G R A P H I S T R Y info@graphistry.com
Scaling Visual Graph Investigations with Math, GPUs, and Experts
GraphThePlanet, San Francisco, 2020
Leo Meyerovich, CEO
@LMeyerov
100X Investigations
• Tech: Graph, viz, GPUs, workflow acceleration
• Users: Analysts, devs, & researchers in security, anti-fraud, networking, …
Graph the planet by solving logs
• 1K – 1M devices
• 1K – 1B users
• All logged: Payments, logins, clicks, ...
• Super rich metadata: IP, time, …
• Stored in many independent DBs/APIs
Graphs reveal:
• Scope
• History & root cause
• Impact
• Patterns & outliers
• …
Three scaling advances for graph-aware investigations
• Math: Hypergraphs, virtual graphs, & ML-driven linking
• Compute: GPUs for everyone!
• Experts: Collaborative low-code automation
Unify all data by modeling logs as graphs
Example log: IP=10.16.0.8; msg=Malware.Object; time=2 Nov 2017 19:32:00 UTC; vendor=FireEye; Product=Web MPS NX
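A log line like this one is just a set of key=value pairs. Turning it into a record makes each column available as a potential graph entity. The sketch below is minimal and assumes that `;` separates fields and never appears inside a value. `parse_kv_log` is a hypothetical helper, not a Graphistry API.

```python
from typing import Dict


def parse_kv_log(line: str) -> Dict[str, str]:
    """Split a 'key=value; key=value' log line into a dict of fields.

    Assumes ';' separates fields and never appears inside a value.
    """
    record: Dict[str, str] = {}
    for fragment in line.split(";"):
        key, sep, value = fragment.strip().partition("=")
        if sep:  # skip empty or malformed fragments that lack '='
            record[key.strip()] = value.strip()
    return record


event = parse_kv_log(
    "IP=10.16.0.8; msg=Malware.Object; time=2 Nov 2017 19:32:00 UTC; "
    "vendor=FireEye; Product=Web MPS NX"
)
print(event["IP"], event["vendor"])  # 10.16.0.8 FireEye
```

Only the first `=` in each fragment splits key from value, so values that themselves contain `=` survive intact.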
Modeling 1/5: Map all logs as hypergraphs
1. Fetch logs (ex: API result)
2. Pick entity columns for nodes
3. Link entities when they appear in the same event (ex: an IP in 2 events)
Simple UI: Column picker for any Splunk, Neo4j, etc. query result
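The hypergraph mapping can be sketched in plain Python. Every row becomes an event node. Every value in a picked entity column becomes a node shared by all events that mention it, so an IP seen in 2 events links them. This is an illustrative stdlib sketch, not PyGraphistry's own `hypergraph` helper. The `<col>::<value>` node-id scheme and the sample rows (users `alice`, `bob`) are assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

Event = Dict[str, Optional[str]]


def hypergraph(events: List[Event], entity_cols: List[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Map event rows to a bipartite event/entity graph.

    Each row becomes an 'event::<i>' node; each non-empty value in a picked
    column becomes a '<col>::<value>' node shared by every row mentioning it.
    """
    nodes: List[str] = []
    seen: Set[str] = set()
    edges: List[Tuple[str, str]] = []

    def add_node(node_id: str) -> None:
        if node_id not in seen:  # keep each node once, in first-seen order
            seen.add(node_id)
            nodes.append(node_id)

    for i, row in enumerate(events):
        event_id = f"event::{i}"
        add_node(event_id)
        for col in entity_cols:
            value = row.get(col)
            if not value:  # skip missing or empty cells
                continue
            entity_id = f"{col}::{value}"
            add_node(entity_id)
            edges.append((event_id, entity_id))
    return nodes, edges


logs = [
    {"IP": "10.16.0.8", "msg": "Malware.Object", "user": "alice"},
    {"IP": "10.16.0.8", "msg": "Login.Failed", "user": "bob"},
]
nodes, edges = hypergraph(logs, ["IP", "user"])
print(len(nodes), len(edges))  # 5 4
```

Node ids are prefixed with the column name. This keeps, say, a hostname and a username that share the same string from collapsing into one node.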
Modeling 2/5: Look across all DBs/APIs with virtual graph queries
Figure: one investigation spanning the Alerts DB (Splunk), the Accounts DB (SQL), and Account Takeover (ZenDesk), with nodes such as 10.0.0.1, Alert, 10.0.0.2, User2, and LMeyer.
Modeling 2/5: Look across all DBs/APIs with virtual graph queries
Figure: the same investigation, with edges labeled by the query that fetches them: search_splunk(x) on the Alerts DB (Splunk) and search_sql(x) on the Accounts DB (SQL). Nodes include 10.0.0.1, Alert, 10.0.0.2, User2, and LMeyer, alongside Account Takeover (ZenDesk).
Materialize on demand: no actual graph DB!
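A virtual graph can be approximated as a breadth-first expansion. Instead of reading a stored graph, it calls per-source lookup functions on demand. In this sketch, `search_splunk` and `search_sql` reuse the slide's names but are hypothetical in-memory stand-ins with made-up data. Real connectors would issue Splunk and SQL queries.

```python
from typing import Callable, Dict, List, Set, Tuple

Connector = Callable[[str], List[str]]

# Hypothetical stand-ins for the slide's search_splunk(x) / search_sql(x);
# a real deployment would query the live systems instead of these dicts.
_ALERTS: Dict[str, List[str]] = {"10.0.0.1": ["Alert"], "Alert": ["10.0.0.1", "10.0.0.2"]}
_ACCOUNTS: Dict[str, List[str]] = {
    "10.0.0.2": ["User2"],
    "User2": ["10.0.0.2", "LMeyer"],
    "LMeyer": ["User2"],
}


def search_splunk(x: str) -> List[str]:
    return _ALERTS.get(x, [])


def search_sql(x: str) -> List[str]:
    return _ACCOUNTS.get(x, [])


def expand(seed: str, connectors: List[Connector], hops: int) -> Tuple[Set[str], Set[Tuple[str, str, str]]]:
    """Breadth-first expansion that calls every connector on each frontier node.

    Nothing is stored ahead of time: an edge exists only once a query returns it,
    and each edge is tagged with the name of the connector that produced it.
    """
    nodes: Set[str] = {seed}
    edges: Set[Tuple[str, str, str]] = set()
    frontier = [seed]
    for _ in range(hops):
        next_frontier: List[str] = []
        for node in frontier:
            for connector in connectors:
                for neighbor in connector(node):
                    edges.add((node, neighbor, connector.__name__))
                    if neighbor not in nodes:
                        nodes.add(neighbor)
                        next_frontier.append(neighbor)
        frontier = next_frontier
    return nodes, edges


nodes, edges = expand("10.0.0.1", [search_splunk, search_sql], hops=4)
print(sorted(nodes))  # ['10.0.0.1', '10.0.0.2', 'Alert', 'LMeyer', 'User2']
```

The `hops` bound keeps query fan-out under control. Each extra hop may trigger one query per connector per newly discovered node.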
Modeling 3/5: Queries are nasty, so generate them with UI + automation!
• Checks more data sources
• Tracks more clues
• In less time
Figure: generated query for 1 Splunk pivot call
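Generating a pivot query is mostly careful string building. The sketch below emits one Splunk search that matches any of several entity values in any of several fields. The index name, field names, and `head` limit are illustrative assumptions. A production generator would also need time bounds and source-specific field mappings.

```python
from typing import Iterable, List


def spl_quote(value: str) -> str:
    """Double-quote a value for SPL, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def splunk_pivot_query(index: str, fields: List[str], values: Iterable[str], limit: int = 1000) -> str:
    """Build one search matching any of the values in any of the fields."""
    unique_values = list(dict.fromkeys(values))  # dedupe, keep first-seen order
    if not unique_values or not fields:
        raise ValueError("need at least one field and one value")
    clauses = [f"{name}={spl_quote(v)}" for v in unique_values for name in fields]
    return f"search index={index} ({' OR '.join(clauses)}) | head {limit}"


print(splunk_pivot_query("alerts", ["src_ip", "dest_ip"], ["10.0.0.1", "10.0.0.2", "10.0.0.1"]))
# search index=alerts (src_ip="10.0.0.1" OR dest_ip="10.0.0.1" OR src_ip="10.0.0.2" OR dest_ip="10.0.0.2") | head 1000
```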
Modeling 4/5: Graph algorithms to highlight events & entities
• Twitter-based mass phishing scam: auto-clusters into 4 different behavioral groups; pumped accounts & messages have high degree and high centrality
• Alerts across the IT perimeter: user clusters inside the company; smart layout splits out perimeter crossings
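Degree centrality is the simplest of the signals called out above: "pumped" accounts and messages touch unusually many edges. This is a minimal stdlib sketch with hypothetical account and message ids. It treats edges as undirected and normalizes by n - 1. It is not the clustering or layout algorithm used in the screenshots.

```python
from collections import Counter
from typing import Dict, List, Tuple


def degree_centrality(edges: List[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree: incident edges / (n - 1), treating edges as undirected."""
    degree: Counter = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    n = len(degree)
    if n < 2:
        return {node: 0.0 for node in degree}
    return {node: count / (n - 1) for node, count in degree.items()}


edges = [("acct1", "msg1"), ("acct1", "msg2"), ("acct1", "msg3"), ("acct2", "msg1")]
scores = degree_centrality(edges)
print(max(scores, key=scores.get), scores["acct1"])  # acct1 0.75
```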
UMAP: ML handles dates, $, counts, … which graphs don’t… (@leland_mcinnes)
Modeling 5/5: … Use ML to infer neighbors & add them!
• Tensorflow + UMAP
• White edges: link by k-NN on the model
• Blue edges: link entities as usual
• Regular graph analytics on the merged graph
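The merge step can be sketched in two parts. First, link each node to its k nearest neighbors in an embedding space. Then union those inferred edges with the usual entity edges, tracking which is which. The 2-D coordinates below are made up and stand in for a model embedding (e.g., UMAP output). Brute-force Euclidean k-NN is an assumption that only suits small inputs.

```python
import math
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def undirected(a: str, b: str) -> Edge:
    """Canonical (smaller, larger) ordering so (a, b) and (b, a) are one edge."""
    return (a, b) if a <= b else (b, a)


def knn_edges(points: Dict[str, Tuple[float, ...]], k: int) -> Set[Edge]:
    """Link each node to its k nearest neighbors by Euclidean distance."""
    edges: Set[Edge] = set()
    for a, pa in points.items():
        # Sort by (distance, name) so ties resolve deterministically.
        ranked = sorted((math.dist(pa, pb), b) for b, pb in points.items() if b != a)
        for _, b in ranked[:k]:
            edges.add(undirected(a, b))
    return edges


def merge_edges(entity_edges: Set[Edge], inferred: Set[Edge]) -> Dict[Edge, str]:
    """Union both edge sets, tagging each edge; an entity link wins over an inferred one."""
    merged = {undirected(a, b): "inferred" for a, b in inferred}
    for a, b in entity_edges:
        merged[undirected(a, b)] = "entity"
    return merged


embedding = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (10.0, 10.0), "d": (10.0, 11.0)}
graph = merge_edges({("c", "b")}, knn_edges(embedding, k=1))
print(sorted(graph.items()))
# [(('a', 'b'), 'inferred'), (('b', 'c'), 'entity'), (('c', 'd'), 'inferred')]
```

The edge tags play the role of the slide's white (inferred) and blue (entity) colors. Downstream graph analytics can then run on the merged edge set unchanged.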
Three scaling advances for graph-aware investigations
• Math: Hypergraphs, virtual graphs, & ML-driven linking
• Compute: GPUs for everyone!
• Experts: Collaborative low-code automation
Scaling viz helps reveal correlations + work through dirty data
Client/Cloud CPU: Moore’s law is dead
Client/Cloud GPU: Steady perf doublings & price drops 🤩
Flipping from “Graphistry is weird sci-fi” to “best & most affordable solution”
GPU Democratization 1/2
• 2014: Graphistry NSF SBIR: GPU dataframes
• 2016/2017: Apache Arrow + Nvidia, BlazingSQL, …
• 2018/2019: RAPIDS: Databricks, Ursa, …
Shared data format, GPU Docker, …
Graphistry: first RAPIDS-native viz stack, and it’s ready!
GPU client <> GPU server: any browser!
GPU Democratization 2/2
Graphistry Cloud: Get an account and go!
• Open graph data network: free!
• Developer embedding API
• Data scientist notebook API
• (AWS price drop: 5X!)
Rest of 2020: Explore more things, more easily!
Three scaling advances for graph-aware investigations
• Math: Hypergraphs, virtual graphs, & ML-inferred edges
• Compute: GPUs for everyone!
• Experts: Collaborative low-code automation
Putting the Team into Blue Team: Collaboration tech
Share configs
• Data schemas generated and shared across the community (ex: “AWS logs settings”)
Automate without the Python & Docker
• Enable regular analysts to automate their investigations via record & replay
• ... => build up a team arsenal that covers all data types and all investigation types
Integrate with other investigation tools
• Embed viz into other apps
• Launch investigation templates from them (ex: User 360)
• Jump from an event/entity back to the original tool / query (ex: Splunk)
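Record & replay can be pictured in two steps. First, save the sequence of pivots an analyst ran. Then rerun that sequence from a new seed entity. The class, pivot names, and lookup tables below are hypothetical illustrations, not Graphistry's template format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Pivot = Callable[[str], List[str]]


@dataclass
class InvestigationTemplate:
    """Records named pivot steps once, then replays them from any seed entity."""

    steps: List[str] = field(default_factory=list)

    def record(self, pivot_name: str) -> None:
        self.steps.append(pivot_name)

    def replay(self, seed: str, pivots: Dict[str, Pivot]) -> Set[str]:
        found: Set[str] = {seed}
        frontier: Set[str] = {seed}
        for name in self.steps:
            results: Set[str] = set()
            for node in frontier:
                results.update(pivots[name](node))
            found |= results
            frontier = results  # each step pivots from the previous step's results
        return found


# Hypothetical lookup tables standing in for real data sources.
IP_TO_USERS = {"10.0.0.2": ["User2"], "10.0.0.7": ["User9"]}
USER_TO_IPS = {"User2": ["10.0.0.2", "10.0.0.9"], "User9": ["10.0.0.7"]}
pivots: Dict[str, Pivot] = {
    "ip_to_users": lambda x: IP_TO_USERS.get(x, []),
    "user_to_ips": lambda x: USER_TO_IPS.get(x, []),
}

template = InvestigationTemplate()
template.record("ip_to_users")  # analyst's first pivot
template.record("user_to_ips")  # analyst's second pivot
print(sorted(template.replay("10.0.0.2", pivots)))  # ['10.0.0.2', '10.0.0.9', 'User2']
```

The template stores only pivot names, not data. This is what would let a teammate replay it against a different alert without touching code.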
Graphistry Cloud: Get an account and go!
• Open graph data network: free!
• Developer embedding API
• Data scientist notebook API
Thanks!
info@graphistry.com
backup
Management perspective: 80/20 rule for covering functional KPIs
80% of DATA
endpoint logs & alerts
user logs & alerts
server logs & alerts
network logs & alerts
service logs & alerts
ticket APIs
…
80% of INCIDENTS
malware
phishing
cloud tenant breach
app server takeover
device theft
offboarding
…
80% of TASKS
high-fidelity quick check
investigative deep dive
mitigation/containment/report
tabletop training
automation
...
It’s overdue to make investigations structured & predictable!
• Incident SLA
• Investigation depth (burnout!)
• Satellite team methodology
• …
Collective automation: Record-and-replay investigation templates!
Figure callout: 2. Auto-expand virtual graph
GPUs unlocking fast data @ scale for every step of your data pipeline
• Client (dedicated): 1 GPU w/ 1+ GB RAM, streaming WebGL graphics
• Network: 1+ MB/s, optimized networking
• Server (shared): 16+ GPUs per node w/ 500GB+ RAM, graph & tabular analytics
• Database: big & fast data pushdown
© 2018 Graphistry, Inc. All rights reserved. Confidential and proprietary information. Do not distribute. info@graphistry.com
Graphs reveal non-local stats on connected data (= all digital logs!)
• Scoping
• Patterns & outliers
• Influence & critical players
• Progression & behavior
Stack: Splunk data lake, Tensorflow categorization, RAPIDS UMAP layout, Graphistry visual analytics
Categories: regular review, potential illicit activity, potential trafficking
Result: 41K reviews => 400 flagged
Graph for correlating entities across events
Graph: Top 5 most suspicious companies, their records, and hits on their metadata
Explainable & key entities *pop*
Correlated macro view better than disconnected alerts & tickets!
DEMO: 1 week of FireEye HX data over 546 IPs & 22 users
Quickly popping insights
• Color by time, data source
• Expand 2 hops
• Expand by community
• Color by rank, betweenness, …
• Visual data cleaning
• Model tuning
100X Compute: GPUs for everyone
What if we could easily compute over full datasets in under a second?
Hunting: finally possible to explore 1M+ events/entities in web UIs!
Ex: Bro/Zeek (secrepo.com)
Faster Speeds, Real-World Benefits
Benchmark stages: cuIO/cuDF (load and data preparation), data conversion, and cuML XGBoost; time in seconds (shorter is better)
• Dataset: 200GB CSV; data prep includes joins and variable transformations
• CPU cluster configuration: CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark
• DGX cluster configuration: 5x DGX-1 on InfiniBand network
• End-to-end times shown (seconds): 8762, 6148, 3925, 3221, 322, 213 (per-bar configuration labels not recoverable)
my_gdf.groupby(['src_ip', 'dest_ip'])['time'].agg(['min', 'max', 'count'])
cuGraph: Multi-GPU PageRank Performance
PageRank portion of the HiBench benchmark suite
HiBench Scale   Vertices      Edges            CSV File (GB)   # of GPUs   PageRank, 3 iterations (secs)
Huge            5,000,000     198,000,000      3               1           1.1
BigData         50,000,000    1,980,000,000    34              3           5.1
BigData x2      100,000,000   4,000,000,000    69              6           9.0
BigData x4      200,000,000   8,000,000,000    146             12          18.2
BigData x8      400,000,000   16,000,000,000   300             16          31.8
G = cugraph.from_cudf_edgelist(my_df, source='src', destination='dst')
ranks = cugraph.pagerank(G)
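For reference, the computation being benchmarked is PageRank's power iteration. Below is a single-machine stdlib sketch with the following assumptions:
• The iteration count defaults to the benchmark's 3 iterations.
• Damping is 0.85.
• Rank from dangling nodes (no out-edges) is spread evenly across all nodes.

The sketch says nothing about how cuGraph distributes the work across GPUs.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85, iterations: int = 3) -> Dict[str, float]:
    """Power-iteration PageRank over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    out_links: Dict[str, List[str]] = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for dst in out_links[v]:
                new_rank[dst] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


ranks = pagerank([("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")])
print(max(ranks, key=ranks.get))  # hub
```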
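from blazingsql import BlazingContext
import graphistry
bc = BlazingContext()
bc.create_table('netflow', netflow_df)
graph = bc.sql("""
  SELECT src_ip, dest_ip,
    sum(bytes) AS total_bytes,
    min(time) AS first_seen,
    max(time) AS last_seen
  FROM netflow
  GROUP BY src_ip, dest_ip
""")
graphistry.bind(source='src_ip', destination='dest_ip').plot(graph)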
BlazingSQL’s C++ engine skips cuDF’s Python Numba JIT… so it’s _great_ for subsecond interactivity!
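The SQL above collapses flow records into one edge per (src_ip, dest_ip) pair. The same aggregation in plain Python makes the semantics explicit. Field names follow the query. The aggregate names (`total_bytes`, `first_seen`, `last_seen`), integer timestamps, and sample flows are assumptions.

```python
from typing import Any, Dict, List, Tuple


def netflow_edges(rows: List[Dict[str, Any]]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Group flows by (src_ip, dest_ip): total bytes plus first/last timestamp."""
    edges: Dict[Tuple[str, str], Dict[str, int]] = {}
    for row in rows:
        key = (row["src_ip"], row["dest_ip"])
        t, b = row["time"], row["bytes"]
        agg = edges.setdefault(key, {"total_bytes": 0, "first_seen": t, "last_seen": t})
        agg["total_bytes"] += b
        agg["first_seen"] = min(agg["first_seen"], t)
        agg["last_seen"] = max(agg["last_seen"], t)
    return edges


flows = [
    {"src_ip": "10.0.0.1", "dest_ip": "10.0.0.2", "bytes": 100, "time": 5},
    {"src_ip": "10.0.0.1", "dest_ip": "10.0.0.2", "bytes": 50, "time": 2},
    {"src_ip": "10.0.0.2", "dest_ip": "10.0.0.3", "bytes": 7, "time": 9},
]
print(netflow_edges(flows)[("10.0.0.1", "10.0.0.2")])
# {'total_bytes': 150, 'first_seen': 2, 'last_seen': 5}
```

Each resulting key is one edge and each value dict holds its edge attributes. This is the shape a graph plotter needs, with far fewer rows than the raw flows.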
Closing remarks: Scaling graph _projects_
Avoid failure to launch by avoiding infra & NIH (not-invented-here):
1d-1mo: Cloud, viz, on-the-fly compute, notebooks, API connectors
3mo-never: Graph DB, Kafka ingest, Hadoop, on-prem, custom analytics, custom UIs
Useful by design: Make the user + problem the #1 driver, not infra
Win ROI politics w/ cupcake principle: Big projects start as small projects
Lower switching costs by augmenting vs. replacing
Everyone is used to the status quo and uninterested in avoidable work.
Start w/ good champions (ideally innovative, influential, technical, & with time), then grow from there
Gartner: “85% of data science projects fail.”

More Related Content

Similar to Scaling graph investigations with Math, GPUs, & Experts

Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018TigerGraph
 
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 DatasetGraph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 DatasetTigerGraph
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTJames Chittenden
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...TigerGraph
 
GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017Joshua Patterson
 
NEW LAUNCH! Graph-based Approaches for Cyber Investigative Analytics Using GP...
NEW LAUNCH! Graph-based Approaches for Cyber Investigative Analytics Using GP...NEW LAUNCH! Graph-based Approaches for Cyber Investigative Analytics Using GP...
NEW LAUNCH! Graph-based Approaches for Cyber Investigative Analytics Using GP...Amazon Web Services
 
DSDT Meetup January 2018
DSDT Meetup January 2018DSDT Meetup January 2018
DSDT Meetup January 2018DSDT_MTL
 
Dsdt meetup-january2018
Dsdt meetup-january2018Dsdt meetup-january2018
Dsdt meetup-january2018JDA Labs MTL
 
Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Sri Ambati
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark DataWorks Summit/Hadoop Summit
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big dataSigmoid
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poliivascucristian
 
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)Amazon Web Services Korea
 
Graphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platformsGraphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platformsGraph-TA
 
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Chun-Yu Tseng
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Databricks
 
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...Amazon Web Services
 
Multiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesMultiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesStratio
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
End-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed REnd-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed RJorge Martinez de Salinas
 

Similar to Scaling graph investigations with Math, GPUs, & Experts (20)

Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018
 
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 DatasetGraph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoT
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
 
GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017
 
NEW LAUNCH! Graph-based Approaches for Cyber Investigative Analytics Using GP...
NEW LAUNCH! Graph-based Approaches for Cyber Investigative Analytics Using GP...NEW LAUNCH! Graph-based Approaches for Cyber Investigative Analytics Using GP...
NEW LAUNCH! Graph-based Approaches for Cyber Investigative Analytics Using GP...
 
DSDT Meetup January 2018
DSDT Meetup January 2018DSDT Meetup January 2018
DSDT Meetup January 2018
 
Dsdt meetup-january2018
Dsdt meetup-january2018Dsdt meetup-january2018
Dsdt meetup-january2018
 
Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poli
 
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)
 
Graphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platformsGraphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platforms
 
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
 
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW...
 
Multiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesMultiplaform Solution for Graph Datasources
Multiplaform Solution for Graph Datasources
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
End-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed REnd-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed R
 

Recently uploaded

Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Recently uploaded (20)

Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Scaling graph investigations with Math, GPUs, & Experts

  • 1. G R A P H I S T R Y info@graphistry.com G R A P H I S T R Y Scaling Visual Graph Investigations with Math, GPUs, and Experts GraphThePlanet, San Francisco, 2020 Leo Meyerovich, CEO @LMeyerov
  • 2. G R A P H I S T R Y info@graphistry.com Tech Security, anti-fraud, networking, … Analysts, devs, & researchers 100X Investigations: Graph, viz, GPUs, workflow acceleration Users
  • 3. G R A P H I S T R Y info@graphistry.com3 Graph the planet by solving logs • 1K – 1M devices • 1K – 1B users • All logged: Payments, logins, clicks, ... • Super rich metadata: IP, time, … • Stored in many independent DBs/APIs GRAPH • Scope • History & root cause • Impact • Patterns & outliers • …
  • 4. G R A P H I S T R Y info@graphistry.com Three scaling advances for graph-aware investigations Math Hypergraphs, virtual graphs, & ML-driven linking Compute GPUs for everyone! Experts Collaborative low-code automation G R A P H I S T R Y
  • 5. G R A P H I S T R Y info@graphistry.com IP=10.16.0.8; msg=Malware.Object; time=2 Nov 2017 19:32:00 UTC; vendor=FireEye; Product=Web MPS NX 5 Unify all data by modeling logs as graphs
  • 6. G R A P H I S T R Y info@graphistry.com Pick entity cols for nodes Linked when same Event event Fetch logs (ex: api result) Modeling 1/5: Map all logs as hypergraphs Simple UI: Column picker for any Splunk, Neo4j, etc. query result IP in 2 events event
  • 7. G R A P H I S T R Y info@graphistry.com Modeling 2/5: Look across all DBs/APIs with virtual graph queries 10.0.0.1 Alert Alerts DB (Splunk) 10.0.0.2 Accounts DB (SQL) 10.0.0.2 User2 Account Takeover (ZenDesk) LM LMeyer
  • 8. G R A P H I S T R Y info@graphistry.com Modeling 2/5: Look across all DBs/APIs with virtual graph queries 10.0.0.1 Alert 10.0.0.2 10.0.0.2 User2 search_splunk(x) LM LMeyer search_splunk(x) search_sql(x) search_sql(x) Alerts DB (Splunk) Accounts DB (SQL) Account Takeover (ZenDesk) Materialize on-demand: no actual graph DB!
  • 9. G R A P H I S T R Y info@graphistry.com Modeling 3/5: Queries are nasty, generate w/ UI + automation! Checks more data sources Tracks more clues In less time Generated query for 1 Splunk pivot call
  • 10. G R A P H I S T R Y info@graphistry.com Modeling 4/5: Graph algorithms to highlight events & entities Auto-clusters into 4 different behavioral groups Pumped accts & messages have high degree, high centrality Twitter-based mass phishing scam Alerts across IT perimeter User clusters inside company Smart layout splits out perimeter crossings
  • 11. G R A P H I S T R Y info@graphistry.com UMAP: ML likes dates, $, counts, … which graphs don’t… @leland_mcinnes
  • 12. G R A P H I S T R Y info@graphistry.com Modeling 5/5: … Use ML to infer neighbors & add them! Tensorflow+UMAP White: Link by k-nn on model Blue: Link entities as usual Regular graph analytics on merged graph
  • 13. G R A P H I S T R Y info@graphistry.com Three scaling advances for graph-aware investigations Math Hypergraphs, virtual graphs, & ML-driven linking Compute GPUs for everyone! Experts Collaborative low-code automation G R A P H I S T R Y
  • 14. G R A P H I S T R Y info@graphistry.com Scaling viz helps reveal correlations + work through dirty data
  • 15. G R A P H I S T R Y info@graphistry.com Client/Cloud CPU: Moore’s law is dead Client/Cloud GPU: Steady perf doublings & price drops 🤩 Flipping from “Graphistry is weird sci-fi” to “best & most affordable solution”
  • 16. G R A P H I S T R Y info@graphistry.com GPU Democratization 1/2 2014 Graphistry NSF: GPU Dataframes SBIR 2016/2017 Apache Arrow + Nvidia, BlazingSQL, … 2018/2019 RAPIDS: Databricks, Ursa, … Shared data format, GPU docker, … Graphistry first RAPIDS- native viz stack: it’s ready! GPU client <>GPU server: any browser!
  • 17. GPU Democratization 2/2. Graphistry Cloud: Get an account and go! • Open graph data network: free! • Developer embedding API • Data scientist notebook API • (AWS price drop: 5X!) Rest of 2020: Explore more things, more easily!
  • 18. Three scaling advances for graph-aware investigations: Math (hypergraphs, virtual graphs, & ML-inferred edges); Compute (GPUs for everyone!); Experts (collaborative low-code automation).
  • 19. Putting the Team into Blue Team: Collaboration tech. Share configs: data schemas generated and shared across the community (e.g., “AWS logs settings”). Automate without the Python & Docker: enable regular analysts to automate their investigations via record & replay, ... => build up a team arsenal that covers all data types and all investigation types. Integrate with other investigation tools: embed viz into other apps; launch investigation templates from them (e.g., User 360); jump from an event/entity to the original tool/query (e.g., Splunk). Explore
  • 20. Graphistry Cloud: Get an account and go! • Open graph data network: free! • Developer embedding API • Data scientist notebook API. Thanks! info@graphistry.com
  • 21. Backup
  • 22. Management perspective: 80/20 rule for covering functional KPIs. 80% of DATA: endpoint, user, server, network, and service logs & alerts; ticket APIs; … 80% of INCIDENTS: malware, phishing, cloud tenant breach, app server takeover, device theft, offboarding, … 80% of TASKS: high-fidelity quick check, investigative deep dive, mitigation/containment/report, tabletop training, automation, ... It is overdue to make investigation structured & predictable! • Incident SLA • Investigation depth (burnout!) • Satellite team methodology • …
  • 23. Collective automation: Record-and-replay investigation templates! (Step 2: auto-expand virtual graph.)
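Record-and-replay can be sketched as an ordered list of named steps that are captured during one investigation and rerun on a new seed. `InvestigationTemplate`, `expand_once`, and the `LINKS` table are hypothetical. In practice, a step would call a data-source connector rather than read a dict.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Step = Callable[[Set[str]], Set[str]]

@dataclass
class InvestigationTemplate:
    """Hypothetical record-and-replay template: an ordered list of named steps."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def run_and_record(self, name: str, step: Step, entities: Set[str]) -> Set[str]:
        # Record the step while the analyst performs it on the current case.
        self.steps.append((name, step))
        return step(entities)

    def replay(self, seed: Set[str]) -> Set[str]:
        # Re-run every recorded step, in order, starting from a new case's seed.
        entities = set(seed)
        for _, step in self.steps:
            entities = step(entities)
        return entities

# Toy lookup standing in for data-source pivots (an assumption, not a real API).
LINKS = {"10.0.0.1": {"User2"}, "10.0.0.9": {"User7"},
         "User2": {"laptop-3"}, "User7": {"laptop-8"}}

def expand_once(entities: Set[str]) -> Set[str]:
    """One hop of virtual graph expansion: add every directly linked entity."""
    out = set(entities)
    for e in entities:
        out |= LINKS.get(e, set())
    return out
```

After recording two pivots on one case with `s = t.run_and_record("ip -> user", expand_once, {"10.0.0.1"})` and `t.run_and_record("user -> device", expand_once, s)`, calling `t.replay({"10.0.0.9"})` returns `{"10.0.0.9", "User7", "laptop-8"}` without the analyst repeating any steps.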
  • 24. GPUs unlocking fast data @ scale for every step of your data pipeline. Diagram labels: 1 GPU w/ 1+ GB RAM; dedicated: 16+ GPUs per node w/ 500GB+ RAM; shared: 1+ MB/s; big & fast data pushdown; database; streaming WebGL graphics; optimized networking; graph & tabular analytics. © 2018 Graphistry, Inc. All rights reserved. Confidential and proprietary information. Do not distribute.
  • 25. Graphs reveal non-local stats on connected data (= all digital logs!): scoping; patterns & outliers; influence & critical players; progression & behavior.
  • 26. Components: RAPIDS UMAP layout, TensorFlow categorization, Graphistry visual analytics, Splunk data lake. Categories: regular review, potential illicit activity, potential trafficking. Result: 41K reviews => 400 flagged.
  • 27. Graph: top 5 most suspicious companies, their records, and hits on their metadata. Explainable: key entities *pop*. Graph for correlating entities across events.
  • 28. A correlated macro view beats disconnected alerts & tickets! DEMO: 1 week of FireEye HX data over 546 IPs & 22 users.
  • 29. Quickly popping insights: color by time or data source; expand 2 hops; expand by community; color by rank, betweenness, …; visual data cleaning; model tuning.
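The "color by betweenness" option assumes a betweenness score for each node. As a sketch, Brandes' algorithm computes unnormalized betweenness for an unweighted, undirected edge list in pure Python. A production tool would use a GPU or library implementation instead.

```python
from collections import defaultdict, deque

def betweenness(edges):
    """Brandes' algorithm: unnormalized betweenness, unweighted undirected graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected path was counted once from each endpoint.
    return {n: c / 2 for n, c in bc.items()}
```

On a star with a hub and three leaves, the hub scores 3.0 (it sits on all three leaf-to-leaf paths), and the leaves score 0.0. This is why brokers between groups stand out when nodes are colored by this score.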
  • 30. 100X compute: GPUs for everyone. What if we could easily compute over full datasets in under a second?
  • 31. Hunting: finally possible to work with 1M+ events/entities in web UIs! Example: Bro/Zeek logs (secrepo.com).
  • 32. Faster Speeds, Real-World Benefits: cuIO/cuDF (load and data preparation) and cuML (XGBoost). Benchmark: 200GB CSV dataset; data prep includes joins and variable transformations. CPU cluster configuration: CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark. DGX cluster configuration: 5x DGX-1 on InfiniBand network. Chart (time in seconds, shorter is better; stages: cuIO/cuDF load and data prep, data conversion, XGBoost): end-to-end times 8762, 6148, 3925, 3221, 322, 213. Code: my_gdf.groupby(['src_ip', 'dest_ip'])['time'].plot()
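The dataframe pattern on this slide, sketched with pandas on toy netflow rows (the data is made up). cuDF aims to mirror the pandas API, so the same `groupby`/`agg` calls should largely carry over to a GPU dataframe, though that is an assumption worth checking option by option. Aggregating, rather than plotting, per `(src_ip, dest_ip)` yields one row per edge, ready for a graph tool.

```python
import pandas as pd

# Toy netflow-style records (made up for illustration).
flows = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"],
    "dest_ip": ["10.0.0.9", "10.0.0.9", "10.0.0.9", "10.0.0.7"],
    "time": pd.to_datetime(["2020-01-01 10:00", "2020-01-01 10:05",
                            "2020-01-01 11:00", "2020-01-01 12:00"]),
    "bytes": [100, 250, 40, 10],
})

# One row per (src_ip, dest_ip) pair: an edge table with per-edge summaries.
edges = (
    flows.groupby(["src_ip", "dest_ip"])
    .agg(events=("time", "count"), first_seen=("time", "min"),
         last_seen=("time", "max"), total_bytes=("bytes", "sum"))
    .reset_index()
)
print(edges)
```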
  • 33. cuGraph Multi-GPU PageRank Performance (PageRank portion of the HiBench benchmark suite):
    HiBench scale | Vertices    | Edges          | CSV file (GB) | # of GPUs | PageRank, 3 iterations (s)
    Huge          | 5,000,000   | 198,000,000    | 3             | 1         | 1.1
    BigData       | 50,000,000  | 1,980,000,000  | 34            | 3         | 5.1
    BigData x2    | 100,000,000 | 4,000,000,000  | 69            | 6         | 9.0
    BigData x4    | 200,000,000 | 8,000,000,000  | 146           | 12        | 18.2
    BigData x8    | 400,000,000 | 16,000,000,000 | 300           | 16        | 31.8
    Code: Graph().add_edges(my_df).pagerank()
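For intuition about what the benchmark computes, here is a minimal power-iteration PageRank on a directed edge list, fixed at 3 iterations to match the table. Damping 0.85 and uniform redistribution of rank from dangling nodes are assumptions. The slide's `Graph().add_edges(my_df).pagerank()` is shorthand; with cuGraph itself, one would build a `cugraph.Graph`, load edges via `from_cudf_edgelist`, and call `cugraph.pagerank`.

```python
def pagerank(edges, damping=0.85, iterations=3):
    """Power-iteration PageRank on a directed edge list.

    Dangling nodes (no out-edges) spread their rank uniformly; runs a fixed
    number of iterations rather than testing for convergence."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        # Teleport term plus the dangling mass, shared equally by all nodes.
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = rank[v] / len(out[v])
                for w in out[v]:
                    new[w] += damping * share
        rank = new
    return rank
```

Total rank stays at 1 after every iteration, so scores remain comparable across graph sizes. On a 3-node cycle, every node stays at 1/3.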
  • 34. graph = netflow_df.sql("""SELECT src_ip, dest_ip, SUM(bytes), MIN(time), MAX(time) GROUP BY src_ip, dest_ip"""); graphistry.plot(graph). BlazingSQL’s C++ skips cuDF’s Python Numba JIT… so it is _great_ for subsecond interactivity!
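The slide's aggregation can be checked on any SQL engine. This sketch uses Python's built-in `sqlite3` in place of BlazingSQL, with a toy `netflow` table standing in for `netflow_df`. Selecting the group keys `src_ip, dest_ip` alongside the aggregates is what makes each result row an edge.

```python
import sqlite3

# In-memory table standing in for netflow_df (toy rows, made up for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE netflow (src_ip TEXT, dest_ip TEXT, bytes INTEGER, time INTEGER)")
conn.executemany(
    "INSERT INTO netflow VALUES (?, ?, ?, ?)",
    [("10.0.0.1", "10.0.0.9", 100, 1),
     ("10.0.0.1", "10.0.0.9", 250, 5),
     ("10.0.0.2", "10.0.0.9", 40, 3)],
)

# Group keys are selected too, so each row is (src, dst, total bytes, first, last).
rows = conn.execute("""
    SELECT src_ip, dest_ip, SUM(bytes), MIN(time), MAX(time)
    FROM netflow
    GROUP BY src_ip, dest_ip
    ORDER BY src_ip, dest_ip
""").fetchall()
print(rows)
```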
  • 35. Closing remarks: Scaling graph _projects_. Avoid failure to launch by avoiding infra & NIH. 1d-1mo: cloud, viz, on-the-fly compute, notebooks, API connectors. 3mo-never: graph DB, Kafka ingest, Hadoop, on-prem, custom analytics, custom UIs. Useful by design: make user + problem the #1 driver, not infra. Win ROI politics with the cupcake principle: big projects start as small projects. Lower switching costs by augmenting rather than replacing: everyone is used to the status quo and uninterested in avoidable work. Start with good champions (ideally innovative, influential, technical, & with time) and grow from there. Gartner: “85% of data science projects fail.”