Identity Resolution
in Cyber Defense
Shlomo Yona,
Director, Applied Algorithmic Research
F5
January 9, 2018, Big Data Analytics meetup #2
© 2018 F5 Networks
Credit goes to
• Maydan Wienreb
• Nina Freydz-Peleg
Shlomo Yona
s.yona@f5.com
053-7326360
shlomo@mathematic.ai
https://www.linkedin.com/in/shlomoyona
• HQ: Seattle
• Founded 1996, IPO 1999 (FFIV)
• TLV is F5’s 3rd largest R&D center
• Over 4,300 employees
• 69 locations in 36 countries
In this talk
•Context
•The problem
•A solution
•Technical deep dive
•Summary
Monitor
Analyze
Update Rules
Repeat
Security Information and Event Management
Monitor
Analyze
Update Rules
Repeat
Use machines to handle machines
Surface important and interesting attack alerts as clear, explainable and actionable intents of actors
What’s a Risk Engine?
Image credit: M SAFII MAINIAL
[Diagram: Risk Engine. Labels: Identity, ID Tag, Geolocation, Transaction, Threat Score; Load Balancing, DDoS Protection, Mobile Security, Application Security, Data Center Firewall, Access Security, Raw Traffic, Malware Detection, Anti-Fraud, Threat Intelligence]
Unsupervised Stacking of MoE
[Diagram. Inputs: DDoS Protection, Mobile Security, Application Security, Data Center Firewall, Access Security, Raw Traffic, Malware Detection and Anti-Fraud. Models and Data Stores: Application Graph, Identity Graph, Other Data-stores. Components: Experts, Aggregator. Feeds: Attackers, Bots, Reputation, Identities, Transactions, Location. Other labels: Score, APIs, Analytics, Forensics and Drill Down]
Resolve identity Inferring action
Building and utilizing context
Deducing identity’s intent
Approach to solve problems
• An additional security layer
• Abstracted actions
• Identity Resolution
• Behavioral analysis
• Consult many experts, use data science to reduce FP
• Machine Learning to automate analysis, insights
• Trust (GUI) and then Automate (APIs)
• Construct confidence metrics and rank output
• Consume information from multiple sources, open APIs
• Highly Aligned, Loosely Coupled, Rely on Open Source, DevOps, MicroServices, Deploy Often, Use Live Customer Data
Resolve identity Inferring action
Building and utilizing context
Deducing identity’s intent
Identity Resolution
The problem
Unidentified User
Authenticated User
NAT
Same person across devices
Identity Theft
Multiple users on same device
False Positives
Image credit: https://mikewilliamsonvalidation.wordpress.com/2014/12/03/false-positives-in-statistical-process-control/
False Negative
Cookies or IP addresses are NOT people
Identity Graph
Image credit www.signal.co
Our digital identity is defined across devices and by behavior
Image credit liveramp.com
What’s an identity graph?
Example
YONA
122.221.212.2
122.221.212.1
7905402525
5FB1A26E4B
C422AEF54E
B4
USER
#28
Relations and connections
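The speaker notes for this slide say that any two identity keys seen on the same message form a relation. A minimal Python sketch of that rule follows. The function name `build_identity_graph` and the `type:value` key strings are illustrative assumptions, not part of the talk; behavior-based and co-location relations are not modeled.

```python
from collections import defaultdict
from itertools import combinations


def build_identity_graph(messages):
    """Connect every pair of identity keys that appear on the same message.

    messages: iterable of iterables of identity-key strings.
    Returns an undirected adjacency map: key -> set of related keys.
    """
    graph = defaultdict(set)
    for keys in messages:
        # Each unordered pair of distinct keys on one message is a relation.
        for a, b in combinations(sorted(set(keys)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical messages reusing values from the Example slide.
messages = [
    ("user:YONA", "ip:122.221.212.2"),
    ("user:YONA", "ip:122.221.212.1"),
]
graph = build_identity_graph(messages)
print(sorted(graph["user:YONA"]))  # ['ip:122.221.212.1', 'ip:122.221.212.2']
```

The two IP addresses are not directly related here because they never appear on the same message; they are connected only through the user key.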
Source IP address: 2001:888:197d:0:250:fcff:fe23:3879
Cookie: 958a6632d2e07a438a3745621b899fec
Explicit Identity Keys
Browser fingerprint: 14800c8c-2eef-4fc6-a56c-5aa80611bc4c
Network fingerprint: c565a6d3-521a-451a-a0ae-30c3740917a5
Device fingerprint: 94445398-0d09-46f0-87fd-16acee995d01
Derived Identity Keys
Behavior Identity Keys
Co-Location Identity Keys
Deep Dive
Match Scores
Frecency (Frequency-Recency)
Cleanups
Optimizations
Retrieve or Create an actor
Message identity keys: IDa, IDb, IDc
Lookup combinations:
• IDa, IDb, IDc
• IDa, IDb
• IDa, IDc
• IDb, IDc
Support & Resist
Results: Create New Actor or Existing Actor
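The lookup combinations on this slide are, per the speaker notes, the powerset of the message's identity keys minus the subsets with fewer than two members. A small Python sketch of that step is below. The name `key_combinations` is hypothetical, and ranking here uses cardinality only; the per-combination priorities mentioned in the notes are not specified in the talk, so they are left out.

```python
from itertools import combinations


def key_combinations(identity_keys):
    """Return all subsets with at least two keys, largest subsets first."""
    keys = sorted(set(identity_keys))
    combos = []
    # Walk sizes from the full set down to pairs; singletons and the
    # empty set are skipped on purpose.
    for size in range(len(keys), 1, -1):
        combos.extend(combinations(keys, size))
    return combos


for combo in key_combinations(["IDa", "IDb", "IDc"]):
    print(combo)
# ('IDa', 'IDb', 'IDc')
# ('IDa', 'IDb')
# ('IDa', 'IDc')
# ('IDb', 'IDc')
```

Each combination would then be used for an exact-match actor lookup, which is outside the scope of this sketch.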
Match Score
\[ \mathrm{MatchScore} = \frac{\mathrm{SupportWeight}}{\mathrm{SupportWeight} + \mathrm{ResistWeight}} \]
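As a rough illustration, the match score and the threshold decision described in the speaker notes (create a new actor, return the single match, or merge several) could be sketched in Python as follows. The function names, the `(support, resist)` tuple layout, and the handling of a zero denominator are assumptions for illustration only.

```python
def match_score(support_weight, resist_weight):
    """MatchScore = SupportWeight / (SupportWeight + ResistWeight)."""
    total = support_weight + resist_weight
    if total == 0:
        return 0.0  # assumption: no evidence either way counts as no match
    return support_weight / total


def resolve(candidates, threshold):
    """Decide what to do with the actors returned by the lookup.

    candidates: dict actor_id -> (support_weight, resist_weight).
    Returns (decision, actor_ids) where decision is one of
    "create", "existing" or "merge".
    """
    scores = {actor: match_score(s, r) for actor, (s, r) in candidates.items()}
    # Prune results whose match score is below the configured threshold.
    kept = {actor: score for actor, score in scores.items() if score >= threshold}
    if not kept:
        return ("create", [])
    if len(kept) == 1:
        return ("existing", list(kept))
    # More than one actor survived: candidates for a merge, best score first.
    return ("merge", sorted(kept, key=kept.get, reverse=True))


print(match_score(3, 1))                                  # 0.75
print(resolve({"a": (3, 1), "b": (1, 3)}, threshold=0.5))  # ('existing', ['a'])
```

How merges are actually carried out, and how the threshold is calibrated, is not covered by the slides.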
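Support
\[ \mathrm{Support} = f\left(\left|\mathrm{IDKTypes}_{\mathrm{Actor}} \cap \mathrm{IDKTypes}_{\mathrm{Message}}\right|\right) \cdot \sum_{i=1}^{|\mathrm{IDK}|} \frac{\mathrm{ISV}_{\mathrm{Type}(idk_i)}}{\left|\mathrm{AA}(idk_i)\right|} \]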
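A hedged Python sketch of the Support formula: each identity key contributes the initial support value (ISV) of its type divided by the number of actors attached to that key, and the sum is scaled by f of the number of identity key types shared by the actor and the message. The talk does not define f, so the identity function is used purely as a placeholder; the function name, argument layout and sample values are also assumptions.

```python
def support(combination, actor_types, message_types, isv, attached_actors,
            f=lambda n: n):
    """Support for one actor given an identity-key combination.

    combination:     list of (key_type, value) identity keys.
    actor_types:     identity key types attached to the actor.
    message_types:   identity key types present on the message.
    isv:             dict key_type -> initial support value.
    attached_actors: dict (key_type, value) -> set of actor ids, i.e. AA(idk).
    f:               unspecified in the talk; identity is a placeholder.
    """
    shared_types = len(set(actor_types) & set(message_types))
    total = 0.0
    for idk in combination:
        key_type, _value = idk
        actors = attached_actors.get(idk, set())
        if actors:  # a key attached to no actor contributes nothing
            total += isv[key_type] / len(actors)
    return f(shared_types) * total


combination = [("cookie", "c1"), ("ip", "1.2.3.4")]
isv = {"cookie": 4.0, "ip": 1.0}  # made-up values
attached = {
    ("cookie", "c1"): {"actor28"},
    ("ip", "1.2.3.4"): {"actor28", "actor7", "actor9", "actor12"},
}
print(support(combination, {"cookie", "ip"}, {"cookie", "ip"}, isv, attached))
# 2 * (4.0/1 + 1.0/4) = 8.5
```

The example shows the intended effect: a cookie attached to a single actor supports the match far more than an IP address shared by four actors.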
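Resist
\[ \mathrm{Resist} = f\left(\left|\mathrm{IDKTypes}_{\mathrm{Actor}} \cap \mathrm{IDKTypes}_{\mathrm{Message}}\right|\right) \cdot \sum_{i=1}^{|\mathrm{IDK}|} \frac{\mathrm{IRV}_{\mathrm{Type}(idk_i)}}{\left|\mathrm{IDK}^{\mathrm{Actor}}_{\mathrm{Type}(idk_i)}\right|} \]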
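Resist has the same outer structure as Support, but each term divides the initial resist value (IRV) of the key's type by the number of identity key values of that type attached to the actor. The Python sketch below reads the slide's formula as that fraction, which is an interpretation of the slide layout. As before, f is unspecified in the talk and defaults to the identity, and all names and values are illustrative.

```python
def resist(combination, actor_keys, message_types, irv, f=lambda n: n):
    """Resist for one actor given an identity-key combination.

    combination:   list of (key_type, value) identity keys.
    actor_keys:    dict key_type -> set of values attached to the actor.
    message_types: identity key types present on the message.
    irv:           dict key_type -> initial resist value.
    f:             unspecified in the talk; identity is a placeholder.
    """
    shared_types = len(set(actor_keys) & set(message_types))
    total = 0.0
    for key_type, _value in combination:
        count = len(actor_keys.get(key_type, ()))
        if count:  # types the actor has no values for contribute nothing
            total += irv[key_type] / count
    return f(shared_types) * total


combination = [("cookie", "c1"), ("ip", "1.2.3.4")]
actor_keys = {"cookie": {"c1"}, "ip": {"1.2.3.4", "5.6.7.8"}}
irv = {"cookie": 1.0, "ip": 3.0}  # made-up values
print(resist(combination, actor_keys, {"cookie", "ip"}, irv))
# 2 * (1.0/1 + 3.0/2) = 5.0
```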
Frecency
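The speaker notes for this slide say that every identity key carries a TTL and that expired keys are removed by a cleanup that runs periodically and on triggers. A minimal Python sketch of such a cleanup pass is below. The data layout (`last_seen` plus `ttl_seconds` per key), the function name and the sample TTLs are assumptions; the talk does not give a frecency formula, so none is shown.

```python
def cleanup(identity_keys, now):
    """Drop identity keys whose TTL has expired.

    identity_keys: dict key -> (last_seen, ttl_seconds), times in seconds.
    now:           current time in the same units.
    Returns a new dict holding only the keys that are still alive.
    """
    return {
        key: (last_seen, ttl)
        for key, (last_seen, ttl) in identity_keys.items()
        if last_seen + ttl > now
    }


keys = {
    "cookie:c1": (100, 50),    # short-lived key, expires at t=150
    "device:d1": (100, 1000),  # longer-lived key, expires at t=1100
}
print(cleanup(keys, now=200))  # {'device:d1': (100, 1000)}
```

Refreshing `last_seen` whenever a key is observed again would keep frequently and recently seen keys alive, which is one plausible reading of "frecency" here.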
Challenges
Thank You!
s.yona@f5.com
053-7326360
shlomo@mathematic.ai
https://www.linkedin.com/in/shlomoyona

Editor's Notes

  • #5 At F5, our mission is based on the fact that businesses depend on apps. Whether it’s apps that help connect businesses to their customers or apps that help employees do their jobs—we make sure apps are always available and secure, anywhere. The world’s largest enterprises, service providers, financial and educational institutions, government entities, and consumer brands rely on F5 to stay ahead of security, cloud, and mobility trends.
  • #7 Context: a common problem and a common solution; the identity problem; a solution; some technical deep dive; summary.
  • #8 A fundamental property of the domain is traffic that is by all standards benign/legitimate when observed out of context. We don’t only observe a protocol or a message but look at a process, look at behavior. Attacks are multi-stage and constantly evolving. A single transaction/request of an attack is legitimate. Identity: what makes an actor? Even when alerts are triggered there’s a high FP rate. There is too much data and there are too many alerts for the SOC to address. Rules have to be written and maintained. It is hard to tell what is more important and what’s actionable. Cross-product and cross-component collaboration is weak. Updates and upgrades are too slow to deliver.
  • #11 A fundamental property of the domain is traffic that is by all standards benign/legitimate when observed out-of-context. We don’t only observe a protocol or a message but look at a process, look at behavior
  • #12 Incoming feeds can arrive from F5’s products, from 3rd parties, or from any combination of these. We are not (necessarily) talking here about banks, credit cards or e-commerce. So what are we talking about? We’re talking about traffic processing and application usage in general. We are taking a risk engine approach to consolidate various data points (raw, processed, security-related and other) and to put them into a broader context. Given a transaction or item of any sort and a wealth of immediate, historical and predictive information, we evaluate its risk and are able to notify and to explain.
  • #13 Raw traffic, parsed (structured) traffic and signals from security devices are all fed into a dynamic, self-organizing and self-calibrating committee of experts (heuristics). Each expert generates its own anomaly model and view of the world and applies its own heuristics (which can be very naïve, simple, or very complicated) by processing some of the incoming data as well as the dynamic data stores. The plurality of opinions is fed into an aggregator, which determines an intent and a score. This results in: bad actor feeds and other reputation feeds, analytics, forensics, and drill-downs.
  • #18 Users may be unidentified (anonymized, first-timers on the application, malicious, …). Users may be authenticated. We may have the same user across devices, networks, locations, systems, … We may have multiple users on the same device or behind the same IP address, … Even authenticated users may not be who we believe they are, for example due to identity theft.
  • #19 False positives are a serious issue in security systems (think of car alarms…). You mistakenly blame an attack on an actor. Perhaps one actor using the same source IP address or device fingerprint reflects on others with the same identity keys.
  • #20 You miss a malicious actor. Think of the same malicious actor being missed as it changes devices (or just their fingerprints) or moves across networks, …
  • #21 Neither fingerprinting (device, browser, TCP stack, …)
  • #24 Any two identity keys on the same message → a relation. Additional relations may be based on: similar/identical behavior, co-location, census, marketing/landing-page form, …
  • #27 Let’s imagine a message/transaction/API call/… whatever. It contains some identity keys which are raw and part of the protocol. Similarly, these can be concrete details of a person (gender, name, address, credit card number, email address, username+password, …).
  • #28 Some data can be collected, processed and represented as fingerprints that serve as other types of identifiers. We can think of additional things that may be calculated, such as location from source IP address, GPS, or other sources (Wi-Fi, beacons, …).
  • #29 Give example on identical moves (e.g., bot actions, use web scraping or brute force login as examples)
  • #30 Co-location: by GPS, by beaconing, by Wi-Fi, by networks, …
  • #33 Add new user/update user. A message arrives with a set of identity keys. First we look up whether these belong to a known actor or to a new one: (1) Create identity key combinations: the powerset minus subsets with fewer than two members. (2) Rank combinations by "combination strength" (cardinality and priorities: some combinations are more useful than others). (3) Look up actors by combination using exact matching. (4) For each result, calculate its "match" score (this is where we use the "resist" and "support" weights from each identity key). (5) Given a pre-configured match-score threshold, prune results with match scores below the threshold. If no result is left, create a new actor and attach the appropriate identity keys to it. If one result is left, the actor was found: return it. Otherwise more than one actor matched and, depending on scoring, one or more actors are merged into one (additional work should be done on event logs accordingly). Splitting an actor into several actors is currently not an online process; it may involve offline operations for local community detection strategies.
  • #34 The general idea is that Identity Keys have two forces acting on them with respect to Actors. We score higher Identity Keys with larger Support and smaller Resist.
  • #35 For each Identity Key in the combination at hand we retrieve the attached Actors.
  • #36 For each Identity Key Type there is an initial support value (ISV) which is determined by another process. Every Actor has various Identity Keys attached to it. We use AA(idk) to represent the actors attached to an identity key.
  • #37 For each Identity Key Type there is an initial resist value (IRV) which is determined by another process. Every Actor has various Identity Keys attached to it. We use AA(idk) to represent the actors attached to an identity key. |IDK^Actor_Type(idk_i)| is the number of identity key values of the type of idk_i that are attached to the Actor. IRV_Type(idk_i) is the Initial Resist Value of the type of Identity Key i.
  • #38 Every identity key carries a TTL. Some are very short-lived and some last longer. Identity keys whose TTL has expired are removed by a cleanup that is invoked periodically and upon triggers.
  • #39 Data integrity; threshold calibration; hyper-parameter tuning (e.g., TTLs); unsupervised; low tolerance to false positives; explainability; scalability, in particular caching (exact vs. fuzzy); visualization.