Scaling Security Threat Detection
with Spark and Databricks
Josh Gillner
Apple Detection Engineering
▪ Protecting Apple’s Systems
▪ Finding & responding to security
threats using log data
▪ Threat research and hunting
^^^ Looking for this guy
Who are we? - Apple Detection Engineering
Which Technologies?
Alert Orchestration
System
CI System
What is a detection?
Detection === Code That Finds Bad Stuff
▪ Get an input dataset(s)
▪ Apply logic
▪ Output
▪ 1 notebook -> 1 job -> 1 detection
What Happens Next?
Analyst Review
Suggestion, Enrichment &
Automated Containment
Alert Orchestration System
Contain
Issue+
This needs to be fast
Standardizing Detections
Problem #1 — Development Overhead
▪ Average time to write, test, and deploy a
basic detection === 1 week
▪ New ideas/week > deployed jobs/week
(unsustainable)
▪ Writing scalatests, preserving test
samples…testing is too cumbersome
▪ > 60% of new code is boilerplate (!!)
Problem #2 — Mo’ Detections, Mo’ Problems
Want to add a cool new
feature to all detections?
Refactor many different
notebooks
Config all over the place in
disparate notebooks
Want to configure multiple
detections at once?
Ongoing tuning and
maintenance?
One-off tuning doesn’t scale
to hundreds of detections
Problem #3 — No Support for Common Patterns
▪ Common enrichments or exclusions
▪ Creating and using statistical
baselines
▪ Write detection test using scalatest
Things People Often Do
(but must write code for)
…everyone implements in
a different way
…fixes/updates must be
applied in 10 places
DetectionKit
Auto-Tuning Alerts
Modular Postprocessing
Automated Enrichments
Complex Exclusions
Notebook-Based CI
Centralized Configuration
Test Generation
Alert Standardization
Future-Looking Abstraction
Multi-Stage Alerting
Modular Investigation Templates
Signal-Based Detections
Rate Limit Failsafes
Preprocessing Transformations
Automated Tagging
Statistics Tables
Entity-Based Deduplication
Asset Attribution
Components
▪ Input
▪ Detection and Alert abstractions
▪ Emitters
▪ Configuration
▪ Tuning
▪ Modular Pre/Postprocessing
▪ Functional Testing
▪ Complex Exclusions
▪ Templatized Investigations
Input
▪ Every detection begins with input loading
▪ Pass in inputs through config object
▪ External control through config
▪ decide spark.read vs .readStream
▪ path, schema, format
▪ no hardcoding -> dynamic input
behavior
▪ Abstracts away details of getting data
^^^ This should not change if
someDataset is a production table
or test sample file
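
A minimal sketch of config-driven input, not DetectionKit's actual API. The names `InputSpec`, `ReadMode`, and `DetectionConfig.input`, as well as both paths, are hypothetical. Plain Scala stands in for Spark here: `Batch` and `Streaming` only mark where real code would choose between `spark.read` and `spark.readStream`.

```scala
object InputSketch {
  // Which Spark entry point a loader would use (assumed mapping).
  sealed trait ReadMode
  case object Batch extends ReadMode     // stands in for spark.read
  case object Streaming extends ReadMode // stands in for spark.readStream

  // Everything needed to locate a dataset lives in config, not in the detection.
  final case class InputSpec(path: String, format: String, mode: ReadMode)

  final case class DetectionConfig(inputs: Map[String, InputSpec]) {
    // Detections ask for an input by name and never see paths or formats.
    def input(name: String): InputSpec =
      inputs.getOrElse(name, sys.error(s"no input configured for '$name'"))
  }

  // Hypothetical production and test configs for the same logical dataset.
  val prod: DetectionConfig = DetectionConfig(Map(
    "someDataset" -> InputSpec("dbfs:/example/prod/some_dataset", "delta", Streaming)))
  val test: DetectionConfig = DetectionConfig(Map(
    "someDataset" -> InputSpec("dbfs:/example/test/some_dataset_sample", "delta", Batch)))

  // The detection-side call is identical under either config.
  def describe(cfg: DetectionConfig): String = {
    val in = cfg.input("someDataset")
    s"${in.mode} ${in.format} ${in.path}"
  }
}
```

Swapping `prod` for `test` changes where the data comes from and how it is read, while the code that asks for `someDataset` stays the same.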
Detection and Alert Abstraction
▪ Logic is described
in the form of a Spark
DataFrame
▪ Supports additional
post-processing
transformation
▪ Basic interface for
consumption by
other code
Detection
val alerts: Map[String, Alert] =
Alert
val modules: ArrayBuffer[Transformer] =
def PostProcessor(input: DataFrame): DataFrame = ???
def df: DataFrame = /* alert logic here */
val config: DetectionConfig
Input and other runtime configs
Test generation
Emitter
▪ Takes output from an Alert and sends it elsewhere
▪ Also schedules the job on the Spark cluster
Alert
MemoryEmitter
FileEmitter
KinesisEmitter
DBFS on AWS S3
In-memory Table
AWS Kinesis
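
One way the emitter idea could look, as a sketch only. The `Emitter` trait and its `emit` method are assumed names, alerts are modelled as plain strings, and a local file stands in for DBFS. Job scheduling and the Kinesis variant are left out.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}
import scala.collection.mutable.ArrayBuffer

object EmitterSketch {
  // An alert is modelled as one string (for example one JSON document).
  trait Emitter {
    def emit(alerts: Seq[String]): Unit
  }

  // Keeps alerts in memory, which is convenient for tests and notebooks.
  final class MemoryEmitter extends Emitter {
    val table: ArrayBuffer[String] = ArrayBuffer.empty[String]
    def emit(alerts: Seq[String]): Unit = table ++= alerts
  }

  // Appends one alert per line to a local file (stand-in for DBFS storage).
  final class FileEmitter(target: Path) extends Emitter {
    def emit(alerts: Seq[String]): Unit = {
      val payload = alerts.map(_ + "\n").mkString.getBytes(StandardCharsets.UTF_8)
      Files.write(target, payload, StandardOpenOption.CREATE, StandardOpenOption.APPEND)
    }
  }
}
```

Because every destination implements the same small interface, the destination can be picked from config without touching detection logic.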
Config Inference
▪ If things can (and should) be changed, move them outside of code
▪ e.g. detection name, description, input dataset, emitter
▪ Where possible, supply sane defaults or infer them
val checkpointLocation: String =
"dbfs:/mnt/defaultbucket/chk/detection/ / / .chk/"
name = "CodeRed: Something Has Happened"
alertName = "JoshsCoolDetection"
version = "1"
DetectionConfigInfer
Config Inheritance
▪ Fine-grained configurability
▪ A Detection can contain
multiple Alerts
▪ Each is individually configurable,
otherwise it inherits the parent
config
Detection
Alert
val config: DetectionConfig
Alert
Alert
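
Inheritance with defaults can be sketched with `Option` fallbacks. The field set, the `AlertConfig` and `Resolved` types, and the default emitter value are assumptions made for illustration. Only the `name`, `alertName`, and `version` keys come from the slides.

```scala
object ConfigSketch {
  // Detection-level settings; the default emitter is an assumed value.
  final case class DetectionConfig(
      name: String,
      emitter: String = "MemoryEmitter",
      version: String = "1")

  // Alert-level overrides: None means "inherit from the parent Detection".
  final case class AlertConfig(
      alertName: String,
      emitter: Option[String] = None,
      version: Option[String] = None)

  // Fully resolved settings for one alert.
  final case class Resolved(name: String, alertName: String, emitter: String, version: String)

  def resolve(parent: DetectionConfig, alert: AlertConfig): Resolved =
    Resolved(
      name = parent.name,
      alertName = alert.alertName,
      emitter = alert.emitter.getOrElse(parent.emitter),
      version = alert.version.getOrElse(parent.version))
}
```

An alert that sets nothing resolves entirely to its parent's values, and one that sets only `emitter` still inherits `version`.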
Modular Pre/PostProcessing
▪ DataFrame -> DataFrame transform
applied to input dataset
▪ Supplied in config
▪ Useful for things like date filtering
without changing detection
Preprocessing
Postprocessing
▪ Mutable Seq of transform functions
inside Detection
▪ Applied sequentially to output
foreachBatch Transformers
▪ Some operations are not stream-safe
▪ Where the crazy stuff happens
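
A sketch of the transform chain, with a plain `Seq` of maps standing in for a Spark DataFrame. `Rows`, `applyAll`, `onOrAfter`, and `tag` are hypothetical names. The `Transformer` and `modules` names follow the slide, but their shapes here are assumed.

```scala
import scala.collection.mutable.ArrayBuffer

object ProcessingSketch {
  // Stand-in for a Spark DataFrame: a sequence of string-keyed rows.
  type Rows = Seq[Map[String, String]]
  type Transformer = Rows => Rows

  // Apply each transform in order, feeding the output of one into the next.
  def applyAll(input: Rows, transforms: Iterable[Transformer]): Rows =
    transforms.foldLeft(input)((rows, f) => f(rows))

  // Preprocessing example: a date filter that config could supply.
  // ISO-8601 dates compare correctly as strings.
  def onOrAfter(minDt: String): Transformer =
    rows => rows.filter(row => row.get("dt").exists(_ >= minDt))

  // Postprocessing example: add a tag column to every output row.
  def tag(label: String): Transformer =
    rows => rows.map(row => row + ("tag" -> label))

  // Mutable sequence of postprocessors held by the detection.
  val modules: ArrayBuffer[Transformer] = ArrayBuffer(tag("reviewed"))
}
```

Since each step has the same `Rows => Rows` shape, a date filter can be added from config without editing the detection itself.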
Manual Tuning Lifecycle
▪ Tuning overhead scales
with number of detections
▪ Feedback loop can take
days while analysts
suffer :(
▪ This needs to be faster…
ideally automated and self-service
The data/
environment
changes
DE tweaks
detection
False positive
alerts
Analyst
requests
tuning pain
Self-Tuning Alerts
Detection
Analyst Review
Alert
Labels
(FP, TP, etc.)
Analyst
Consensus!
Modify Behavior
Alert
Orchestration
System
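
The slides do not say how consensus is computed, so the rule below is purely an assumption: require a minimum number of analyst labels and a minimum share of agreement before behavior changes. All names and thresholds are hypothetical.

```scala
object TuningSketch {
  sealed trait Label
  case object FalsePositive extends Label
  case object TruePositive extends Label

  // Assumed rule: enough votes, and a large enough share of them agree.
  // Keeping minShare above 0.5 makes the winning label unambiguous.
  def consensus(labels: Seq[Label], minVotes: Int, minShare: Double): Option[Label] =
    if (labels.isEmpty || labels.size < minVotes) None
    else {
      val (top, count) =
        labels.groupBy(identity).map { case (l, ls) => (l, ls.size) }.maxBy(_._2)
      if (count.toDouble / labels.size >= minShare) Some(top) else None
    }

  // Assumed behavior change: suppress only when analysts agree on false positive.
  def shouldSuppress(labels: Seq[Label], minVotes: Int = 3, minShare: Double = 0.8): Boolean =
    consensus(labels, minVotes, minShare).contains(FalsePositive)
}
```

Requiring both a vote count and a share keeps a single mislabelled alert from silencing a detection.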
Complex Exclusions
▪ Arbitrary SQL expressions applied
on all results in foreachBatch
▪ Stored in rev-controlled TSV
▪ Integrated into Detection Test
CI…malformed or over-selective
items will fail tests
▪ Preservation of excluded alerts in
a separate table
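
A simplified sketch of TSV-driven exclusions. The real system applies arbitrary SQL expressions. Here a `field equals value` check stands in for them so the example runs without Spark. The three-column TSV layout and all names are assumptions.

```scala
object ExclusionSketch {
  type Row = Map[String, String]

  // One exclusion: match rows where `field` equals `value` (stand-in for SQL).
  final case class Exclusion(id: String, field: String, value: String) {
    def matches(row: Row): Boolean = row.get(field).contains(value)
  }

  // Parse "id<TAB>field<TAB>value" lines. A malformed line yields Left,
  // which a CI test could turn into a failure.
  def parse(tsv: String): Either[String, Seq[Exclusion]] = {
    val parsed = tsv.split("\n").toSeq.filter(_.trim.nonEmpty).map { line =>
      line.split("\t", -1) match {
        case Array(id, field, value) if field.nonEmpty => Right(Exclusion(id, field, value))
        case _                                         => Left(s"malformed exclusion: $line")
      }
    }
    parsed.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(err)
      case None      => Right(parsed.collect { case Right(e) => e })
    }
  }

  // Split results into (kept, excluded); excluded rows are preserved, not dropped.
  def applyExclusions(rows: Seq[Row], exclusions: Seq[Exclusion]): (Seq[Row], Seq[Row]) =
    rows.partition(row => !exclusions.exists(_.matches(row)))
}
```

Returning the excluded rows alongside the kept ones mirrors the idea of writing suppressed alerts to their own table for later review.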
Eventually, detections look like this >>>
So….
Repetitive Investigations…What Happens?
• Analysts run queries
in notebooks to
investigate
• Most of these
queries look the
same, just with
different filters
Analyst Review
Alert Orchestration System
Automated Investigation Templates
▪ Find corresponding
template notebook
▪ Fill it out
▪ Attach to cluster
▪ Execute
Alert Orchestration
System
Workspace API
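
The "fill it out" step could be as simple as placeholder substitution. This sketch covers only that step. The `{{key}}` placeholder syntax and the `example_logs` table are hypothetical, and finding the template, attaching it to a cluster, and executing it are not shown.

```scala
object TemplateSketch {
  // Replace each {{key}} placeholder with the matching alert field.
  // Unknown placeholders are left intact so missing data stays visible.
  def fill(template: String, alert: Map[String, String]): String =
    alert.foldLeft(template) { case (text, (key, value)) =>
      text.replace("{{" + key + "}}", value)
    }
}

// Example: one query template reused across alerts with different filters.
val query = TemplateSketch.fill(
  "SELECT * FROM example_logs WHERE src_ip = '{{src_ip}}' AND dt = '{{dt}}'",
  Map("src_ip" -> "88.88.88.123", "dt" -> "2020-03-15"))
```

Alert fields can contain attacker-controlled text, so a real implementation would need to escape or bind values rather than splice them into a query string.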
This lets us automate useful things like…
Interactive Process Trees in D3
Baselines of Typical Activity
Automated Containment
Machines can find, investigate, and contain issues without humans
Automated Investigation
Alert Orchestration System
ODBC API
• Run substantiating
queries via ODBC
• Render verdict
Contain
Issue
Detection Testing
Why is it so painful?
▪ Preserving/exporting JSON
samples
▪ Local SparkSession isn’t a real
cluster
▪ Development happens in
notebooks, testing happens in
IDE
▪ Brittle to even small changes
to schema, etc
Detection Functional Tests
▪ 85% reduction in test LoC
▪ write and run tests in
notebooks!
▪ use Delta sample files in
dbfs, no more exporting
JSON
▪ scalatest generation using
config and convention
Trait: DetectionTest
^^ this is a complete test ^^
Detection Test CI
Git PR
CI System
Test
Notebooks
Workspace API
/Alerts/Test/PRs/<Git PR number>_<Git commit hash>
Jobs API
Build
Scripts pass/fail
“Testing has never been this fun!!”
— detection engineers, probably
Jobs CI — Why?
▪ Managing hundreds of jobs in Databricks UI
▪ Each job has associated notebook, config, dbfs files
▪ No inventory of which jobs should be running, where
▪ We need job linting >>>
Databricks Stacks!
Deploy/Reconfigure Jobs with Single PR
CI System
Config Linter
Stacks CLI
Jobs Helper
Deploy Job/
Notebooks/Files
Kickstart/Restart
Set Permissions
Cool Things with Jobs CI!
▪ Deploy or reconfigure many
jobs concurrently
▪ Auto job restarts on notebook/
config change
▪ Standardization of retries,
timeout, permissions
▪ Automate alarm creation for
new jobs
^^^ No one likes manually crafting
Stacks JSON — so we generate it
Saving Time with
Automated Historical Insight
Problem #1 — Cyclical Investigations
▪ Alert comes in, analysts spend hours
looking into it
▪ But the same thing happened 3
months ago and was determined to be
benign
▪ Lots of wasted cycles on duplicative
investigations
Problem #2 — Disparate Context
▪ Want to find historical incident
data?
▪ look in many different places
▪ many search UIs, syntaxes
▪ Manual, slow & painful
▪ New analysts won’t have
historical knowledge
Problem #3 — Finding Patterns
Which incidents relate to other
incidents?
Do we see common infrastructure,
actors?
How much work is repeated?
Case #55557
Case #44447
Case #33337
}(Some IP Address)
Solution: Document Recommendations
▪ Collect all incident-related
tickets, correspondence, and
investigations
▪ Normalize them into a Delta
table
▪ Automate suggestion of
related knowledge using our
own corpus of documents
Emails
Tickets
Alerts
Notebooks
Detection Code
Wikis
“Has This Happened Before?” -> Automated
Includes analyst comments and
verdicts
displayHTML suggestions,
clickable links to original document
Automated Suggestions
alert_hash = 112233445
serial = C12345678ABC
src_ip = 88.88.88.123
mime_type = ["text/html"]
dt = 2020-03-15
C12345678ABC
88.88.88.123
Entities
C12345678ABC
88.88.88.123
00:de:ad:00:be:ef
01:8b:ad:00:f0:0d
joshsaccount
joshshostname
Enriched Entities
{
Emails
Tickets
Alerts
Notebooks
Detections
Wikis
Document Search
Suggestion
Anatomy of an Alert
These are not valuable for search! (too
common)
These are good indicators of document
relevance
Entity Tokenization and Enrichment
IP Address
Regex
Domain
Hashes
Accounts
Serials
UDIDs
File Path
Emails
MAC Addresses
Alert Payload
VPN Sessions
Enrichments
DHCP Sessions
Asset Data
Account Data
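
Regex-based entity extraction from an alert payload might look like the sketch below. Only two of the listed entity types are shown, and the patterns are deliberately loose illustrations, not the ones used in the talk.

```scala
import scala.util.matching.Regex

object EntitySketch {
  // Loose patterns for illustration; real extractors would validate more strictly
  // (for example, the IP pattern does not check that each octet is at most 255).
  val patterns: Map[String, Regex] = Map(
    "ip"  -> """\b(?:\d{1,3}\.){3}\d{1,3}\b""".r,
    "mac" -> """\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b""".r
  )

  // Return the distinct matches of each entity type found in the payload.
  def extract(payload: String): Map[String, Set[String]] =
    patterns.map { case (kind, re) => kind -> re.findAllIn(payload).toSet }
}
```

The extracted values are what an enrichment step would then expand, for example from an IP address to the sessions, hosts, and accounts associated with it.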
Suggestion Algorithm
▪ Gather match statistics for each
entity:
▪ historical rarity
▪ document count rarity
▪ doc type distribution
▪ Compute entity weight based on
average ranked percentiles of those
features
▪ More common terms == less
valuable
▪ Return the best n hits by confidence
▪ Not That Expensive™
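
The weighting idea above can be sketched as follows. The three numeric features, the percentile definition, and the summed-weight document score are all assumptions, since the slides name the inputs but not the formulas. Every identifier is hypothetical.

```scala
object SuggestionSketch {
  // Per-entity match statistics; lower numbers mean a rarer entity.
  final case class EntityStats(entity: String, historicalHits: Long, docCount: Long, docTypes: Int)

  // Percentile rank in (0, 1]: the rarest (smallest) value scores 1.0.
  private def rarityPercentile(value: Double, all: Seq[Double]): Double =
    all.count(_ >= value).toDouble / all.size

  // Entity weight = mean of its rarity percentiles across the three features.
  def weights(stats: Seq[EntityStats]): Map[String, Double] = {
    val features: Seq[EntityStats => Double] = Seq(
      (s: EntityStats) => s.historicalHits.toDouble,
      (s: EntityStats) => s.docCount.toDouble,
      (s: EntityStats) => s.docTypes.toDouble)
    stats.map { s =>
      val pcts = features.map(f => rarityPercentile(f(s), stats.map(f)))
      s.entity -> (pcts.sum / pcts.size)
    }.toMap
  }

  // Score each document by the summed weight of the entities it contains,
  // drop documents with no weighted entity, and keep the n best.
  def suggest(docs: Map[String, Set[String]], w: Map[String, Double], n: Int): Seq[(String, Double)] =
    docs.toSeq
      .map { case (doc, entities) => doc -> entities.toSeq.map(e => w.getOrElse(e, 0.0)).sum }
      .filter(_._2 > 0.0)
      .sortBy { case (_, score) => -score }
      .take(n)
}
```

Under this scheme a value such as a common MIME type ranks low on every feature and contributes little, while a rarely seen IP address dominates the score.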
Q & A ?

More Related Content

What's hot

Windows Registry Forensics with Volatility Framework
Windows Registry Forensics with Volatility FrameworkWindows Registry Forensics with Volatility Framework
Windows Registry Forensics with Volatility Framework
Kapil Soni
 
Threat Hunting with Splunk
Threat Hunting with SplunkThreat Hunting with Splunk
Threat Hunting with Splunk
Splunk
 
Owasp A9 USING KNOWN VULNERABLE COMPONENTS IT 6873 presentation
Owasp A9 USING KNOWN VULNERABLE COMPONENTS   IT 6873 presentationOwasp A9 USING KNOWN VULNERABLE COMPONENTS   IT 6873 presentation
Owasp A9 USING KNOWN VULNERABLE COMPONENTS IT 6873 presentation
Derrick Hunter
 
OWASP Top 10 A4 – Insecure Direct Object Reference
OWASP Top 10 A4 – Insecure Direct Object ReferenceOWASP Top 10 A4 – Insecure Direct Object Reference
OWASP Top 10 A4 – Insecure Direct Object Reference
Narudom Roongsiriwong, CISSP
 
OWASP-Web-Security-testing-4.2
OWASP-Web-Security-testing-4.2OWASP-Web-Security-testing-4.2
OWASP-Web-Security-testing-4.2
Massimo Talia
 
Kheirkhabarov24052017_phdays7
Kheirkhabarov24052017_phdays7Kheirkhabarov24052017_phdays7
Kheirkhabarov24052017_phdays7
Teymur Kheirkhabarov
 
Insecure direct object reference (null delhi meet)
Insecure direct object reference (null delhi meet)Insecure direct object reference (null delhi meet)
Insecure direct object reference (null delhi meet)
Abhinav Mishra
 
Development of an Automated Faculty Loading, Room Utilization, Subject ...
Development of an  Automated Faculty Loading,  Room Utilization, Subject     ...Development of an  Automated Faculty Loading,  Room Utilization, Subject     ...
Development of an Automated Faculty Loading, Room Utilization, Subject ...
Dr. Rosemarie Sibbaluca-Guirre
 
Password craking techniques
Password craking techniques Password craking techniques
Password craking techniques
أحلام انصارى
 
Operating Systems: Computer Security
Operating Systems: Computer SecurityOperating Systems: Computer Security
Operating Systems: Computer Security
Damian T. Gordon
 
Top 10 Web Security Vulnerabilities (OWASP Top 10)
Top 10 Web Security Vulnerabilities (OWASP Top 10)Top 10 Web Security Vulnerabilities (OWASP Top 10)
Top 10 Web Security Vulnerabilities (OWASP Top 10)
Brian Huff
 
Hunting for APT in network logs workshop presentation
Hunting for APT in network logs workshop presentationHunting for APT in network logs workshop presentation
Hunting for APT in network logs workshop presentation
OlehLevytskyi1
 
Threat Hunting with Splunk Hands-on
Threat Hunting with Splunk Hands-onThreat Hunting with Splunk Hands-on
Threat Hunting with Splunk Hands-on
Splunk
 
5 BEST PRACTICES FOR A SECURITY OPERATION CENTER (SOC)
5 BEST PRACTICES FOR A SECURITY OPERATION CENTER (SOC)5 BEST PRACTICES FOR A SECURITY OPERATION CENTER (SOC)
5 BEST PRACTICES FOR A SECURITY OPERATION CENTER (SOC)
Vijilan IT Security solutions
 
Building an Analytics - Enabled SOC Breakout Session
Building an Analytics - Enabled SOC Breakout Session Building an Analytics - Enabled SOC Breakout Session
Building an Analytics - Enabled SOC Breakout Session
Splunk
 
FUZZING & SOFTWARE SECURITY TESTING
FUZZING & SOFTWARE SECURITY TESTINGFUZZING & SOFTWARE SECURITY TESTING
FUZZING & SOFTWARE SECURITY TESTING
MuH4f1Z
 
Offzone | Another waf bypass
Offzone | Another waf bypassOffzone | Another waf bypass
Offzone | Another waf bypass
Дмитрий Бумов
 
OWASP Top 10 - 2017
OWASP Top 10 - 2017OWASP Top 10 - 2017
OWASP Top 10 - 2017
HackerOne
 
Visualization for Security
Visualization for SecurityVisualization for Security
Visualization for Security
Raffael Marty
 
SNMP and splunk
SNMP and splunkSNMP and splunk
SNMP and splunk
Ashley Hartge
 

What's hot (20)

Windows Registry Forensics with Volatility Framework
Windows Registry Forensics with Volatility FrameworkWindows Registry Forensics with Volatility Framework
Windows Registry Forensics with Volatility Framework
 
Threat Hunting with Splunk
Threat Hunting with SplunkThreat Hunting with Splunk
Threat Hunting with Splunk
 
Owasp A9 USING KNOWN VULNERABLE COMPONENTS IT 6873 presentation
Owasp A9 USING KNOWN VULNERABLE COMPONENTS   IT 6873 presentationOwasp A9 USING KNOWN VULNERABLE COMPONENTS   IT 6873 presentation
Owasp A9 USING KNOWN VULNERABLE COMPONENTS IT 6873 presentation
 
OWASP Top 10 A4 – Insecure Direct Object Reference
OWASP Top 10 A4 – Insecure Direct Object ReferenceOWASP Top 10 A4 – Insecure Direct Object Reference
OWASP Top 10 A4 – Insecure Direct Object Reference
 
OWASP-Web-Security-testing-4.2
OWASP-Web-Security-testing-4.2OWASP-Web-Security-testing-4.2
OWASP-Web-Security-testing-4.2
 
Kheirkhabarov24052017_phdays7
Kheirkhabarov24052017_phdays7Kheirkhabarov24052017_phdays7
Kheirkhabarov24052017_phdays7
 
Insecure direct object reference (null delhi meet)
Insecure direct object reference (null delhi meet)Insecure direct object reference (null delhi meet)
Insecure direct object reference (null delhi meet)
 
Development of an Automated Faculty Loading, Room Utilization, Subject ...
Development of an  Automated Faculty Loading,  Room Utilization, Subject     ...Development of an  Automated Faculty Loading,  Room Utilization, Subject     ...
Development of an Automated Faculty Loading, Room Utilization, Subject ...
 
Password craking techniques
Password craking techniques Password craking techniques
Password craking techniques
 
Operating Systems: Computer Security
Operating Systems: Computer SecurityOperating Systems: Computer Security
Operating Systems: Computer Security
 
Top 10 Web Security Vulnerabilities (OWASP Top 10)
Top 10 Web Security Vulnerabilities (OWASP Top 10)Top 10 Web Security Vulnerabilities (OWASP Top 10)
Top 10 Web Security Vulnerabilities (OWASP Top 10)
 
Hunting for APT in network logs workshop presentation
Hunting for APT in network logs workshop presentationHunting for APT in network logs workshop presentation
Hunting for APT in network logs workshop presentation
 
Threat Hunting with Splunk Hands-on
Threat Hunting with Splunk Hands-onThreat Hunting with Splunk Hands-on
Threat Hunting with Splunk Hands-on
 
5 BEST PRACTICES FOR A SECURITY OPERATION CENTER (SOC)
5 BEST PRACTICES FOR A SECURITY OPERATION CENTER (SOC)5 BEST PRACTICES FOR A SECURITY OPERATION CENTER (SOC)
5 BEST PRACTICES FOR A SECURITY OPERATION CENTER (SOC)
 
Building an Analytics - Enabled SOC Breakout Session
Building an Analytics - Enabled SOC Breakout Session Building an Analytics - Enabled SOC Breakout Session
Building an Analytics - Enabled SOC Breakout Session
 
FUZZING & SOFTWARE SECURITY TESTING
FUZZING & SOFTWARE SECURITY TESTINGFUZZING & SOFTWARE SECURITY TESTING
FUZZING & SOFTWARE SECURITY TESTING
 
Offzone | Another waf bypass
Offzone | Another waf bypassOffzone | Another waf bypass
Offzone | Another waf bypass
 
OWASP Top 10 - 2017
OWASP Top 10 - 2017OWASP Top 10 - 2017
OWASP Top 10 - 2017
 
Visualization for Security
Visualization for SecurityVisualization for Security
Visualization for Security
 
SNMP and splunk
SNMP and splunkSNMP and splunk
SNMP and splunk
 

Similar to Scaling Security Threat Detection with Apache Spark and Databricks

Testing 101
Testing 101Testing 101
Testing 101
Noam Barkai
 
SELJE_Database_Unit_Testing_Slides.pdf
SELJE_Database_Unit_Testing_Slides.pdfSELJE_Database_Unit_Testing_Slides.pdf
SELJE_Database_Unit_Testing_Slides.pdf
Eric Selje
 
Apache Solr - An Experience Report
Apache Solr - An Experience ReportApache Solr - An Experience Report
Apache Solr - An Experience Report
Netcetera
 
Property based testing - Less is more
Property based testing - Less is moreProperty based testing - Less is more
Property based testing - Less is more
Ho Tien VU
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Databricks
 
Chelberg ptcuser 2010
Chelberg ptcuser 2010Chelberg ptcuser 2010
Chelberg ptcuser 2010
Clay Helberg
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
javier ramirez
 
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
CODE BLUE
 
Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015
lpgauth
 
Intro_2.ppt
Intro_2.pptIntro_2.ppt
Intro_2.ppt
MumitAhmed1
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
SharabiNaif
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
Anonymous9etQKwW
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
rschuppe
 
Illuminate - Performance Analystics driven by Machine Learning
Illuminate - Performance Analystics driven by Machine LearningIlluminate - Performance Analystics driven by Machine Learning
Illuminate - Performance Analystics driven by Machine Learning
jClarity
 
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload DiagnosticsTracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
Databricks
 
ACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics Patterns
Srinath Perera
 
DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming AnalyticsDEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
Sriskandarajah Suhothayan
 
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...
Paris Salesforce Developer Group
 
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
BIOVIA
 

Similar to Scaling Security Threat Detection with Apache Spark and Databricks (20)

Testing 101
Testing 101Testing 101
Testing 101
 
SELJE_Database_Unit_Testing_Slides.pdf
SELJE_Database_Unit_Testing_Slides.pdfSELJE_Database_Unit_Testing_Slides.pdf
SELJE_Database_Unit_Testing_Slides.pdf
 
Apache Solr - An Experience Report
Apache Solr - An Experience ReportApache Solr - An Experience Report
Apache Solr - An Experience Report
 
Property based testing - Less is more
Property based testing - Less is moreProperty based testing - Less is more
Property based testing - Less is more
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
Chelberg ptcuser 2010
Chelberg ptcuser 2010Chelberg ptcuser 2010
Chelberg ptcuser 2010
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
 
Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015
 
Intro_2.ppt
Intro_2.pptIntro_2.ppt
Intro_2.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
 
Illuminate - Performance Analystics driven by Machine Learning
Illuminate - Performance Analystics driven by Machine LearningIlluminate - Performance Analystics driven by Machine Learning
Illuminate - Performance Analystics driven by Machine Learning
 
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload DiagnosticsTracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
 
ACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics Patterns
 
DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming AnalyticsDEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
 
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...
 
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform

Scaling Security Threat Detection with Apache Spark and Databricks

  • 2. Scaling Security Threat Detection with Spark and Databricks Josh Gillner Apple Detection Engineering
  • 3. ▪ Protecting Apple’s Systems ▪ Finding & responding to security threats using log data ▪ Threat research and hunting ^^^ Looking for this guy Who are we? - Apple Detection Engineering
  • 5. What is a detection?
  • 6. Detection === Code That Finds Bad Stuff ▪ Get an input dataset(s) ▪ Apply logic ▪ Output ▪ 1 notebook -> 1 job -> 1 detection
  • 7. What Happens Next? Analyst Review Suggestion, Enrichment & Automated Containment Alert Orchestration System Contain Issue+ This needs to be fast
  • 9. Problem #1 — Development Overhead ▪ Average time to write, test, and deploy a basic detection === 1 week ▪ New ideas/week > deployed jobs/week (unsustainable) ▪ Writing scalatests, preserving test samples…testing is too cumbersome ▪ > 60% of new code is boilerplate (!!)
  • 10. Problem #2 — Mo’ Detections, Mo’ Problems Want to add a cool new feature to all detections? Refactor many different notebooks Config all over the place in disparate notebooks Want to configure multiple detections at once? Ongoing tuning and maintenance? One-off tuning doesn’t scale to hundreds of detections
  • 11. Problem #3 — No Support for Common Patterns ▪ Common enrichments or exclusions ▪ Creating and using statistical baselines ▪ Write detection test using scalatest Things People Often Do (but must write code for) …everyone implements in a different way …fixes/updates must be applied in 10 places
  • 12. DetectionKit Auto-Tuning Alerts Modular Postprocessing Automated Enrichments Complex Exclusions Notebook-Based CI Centralized Configuration Test Generation Alert Standardization Future-Looking Abstraction Multi-Stage Alerting Modular Investigation Templates Signal-Based Detections Rate Limit Failsafes Preprocessing Transformations Automated Tagging Statistics Tables Entity-Based Deduplication Asset Attribution
  • 13. Components ▪ Input ▪ Detection and Alert abstractions ▪ Emitters ▪ Configuration ▪ Tuning ▪ Modular Pre/Postprocessing ▪ Functional Testing ▪ Complex Exclusions ▪ Templatized Investigations
  • 14. Input ▪ All detection begins with input loading ▪ Pass in inputs through config object ▪ External control through config ▪ decide spark.read vs .readStream ▪ path, schema, format ▪ no hardcoding -> dynamic input behavior ▪ Abstracts away details of getting data ^^^ This should not change if someDataset is a production table or test sample file
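
The idea that the config, not the detection, decides how data is loaded can be sketched in plain Scala. This is a minimal illustration only: `InputConfig`, `ReadPlan`, and `InputPlanner` are hypothetical names and not DetectionKit's API, and the sketch models only the batch-versus-stream decision rather than calling Spark.

```scala
// Hypothetical names throughout; this is not DetectionKit's real interface.
final case class InputConfig(
  path: String,
  format: String = "delta",
  streaming: Boolean = false,
  schemaDdl: Option[String] = None
)

// What the loader would do: a batch read (spark.read) or a streaming read (spark.readStream).
sealed trait ReadPlan
final case class BatchRead(path: String, format: String, schemaDdl: Option[String]) extends ReadPlan
final case class StreamRead(path: String, format: String, schemaDdl: Option[String]) extends ReadPlan

object InputPlanner {
  // The detection never chooses between batch and streaming itself.
  // The config does, so a test sample and a production table share one code path.
  def plan(cfg: InputConfig): ReadPlan =
    if (cfg.streaming) StreamRead(cfg.path, cfg.format, cfg.schemaDdl)
    else BatchRead(cfg.path, cfg.format, cfg.schemaDdl)
}
```

Switching a detection from a sample file to a live stream then means changing `path` and `streaming` in config, with no edit to the detection logic.
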
  • 15. Detection and Alert Abstraction ▪ Logic is described in form of Spark DataFrame ▪ Supports additional post-processing transformation ▪ Basic interface for consumption by other code Detection val alerts: Map[String, Alert] = Alert val modules: ArrayBuffer[Transformer] = def PostProcessor(input: DataFrame): DataFrame = ??? def df: DataFrame = /* alert logic here */ val config: DetectionConfig Input and other runtime configs Test generation
  • 16. Emitter ▪ Takes output from an Alert and sends it elsewhere ▪ Also schedules the job in the Spark cluster Alert -> MemoryEmitter (in-memory table), FileEmitter (DBFS on AWS S3), KinesisEmitter (AWS Kinesis)
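
A rough sketch of the emitter idea, assuming a single `emit` method. The trait shape, `EmittedAlert`, and `EmitterDemo` are assumptions made for illustration. Only the name `MemoryEmitter` comes from the slide, and its real signature is not shown there.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical row type; a real alert would be a Spark DataFrame row.
final case class EmittedAlert(alertName: String, fields: Map[String, String])

// Assumed interface: one method that ships a batch of alerts somewhere.
trait Emitter {
  def emit(rows: Seq[EmittedAlert]): Unit
}

// Keeps alerts in memory, which is handy for tests and notebooks.
final class MemoryEmitter extends Emitter {
  private val buffer = ArrayBuffer.empty[EmittedAlert]
  override def emit(rows: Seq[EmittedAlert]): Unit = buffer.appendAll(rows)
  def contents: Seq[EmittedAlert] = buffer.toList
}

object EmitterDemo {
  // The caller depends only on the trait, so the destination can be swapped via config.
  def publish(rows: Seq[EmittedAlert], emitter: Emitter): Int = {
    emitter.emit(rows)
    rows.size
  }
}
```
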
  • 17. Config Inference ▪ If things can (and should) be changed, move it outside of code ▪ eg. detection name, description, input dataset, emitter ▪ Where possible, supply a sane default or infer them val checkpointLocation: String = "dbfs:/mnt/defaultbucket/chk/detection/ / / .chk/" name = "CodeRed: Something Has Happened" alertName = "JoshsCoolDetection" version = "1" DetectionConfigInfer
  • 18. Config Inheritance ▪ Fine-grained configurability ▪ Could be multiple Alerts in same Detection ▪ Individually configurable, otherwise inherit parent config Detection Alert val config: DetectionConfig Alert Alert
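
One way to express "individually configurable, otherwise inherit parent config" is with optional overrides that fall back to the parent. The field set and the names `ParentConfig`, `AlertOverrides`, and `ConfigResolver` are assumptions for this sketch, not the actual `DetectionConfig` layout.

```scala
// Hypothetical config shapes for illustration only.
final case class ParentConfig(name: String, emitter: String, version: String)

// Every field is optional on the alert; None means "inherit from the detection".
final case class AlertOverrides(
  name: Option[String] = None,
  emitter: Option[String] = None,
  version: Option[String] = None
)

object ConfigResolver {
  // Field-by-field fallback: the alert's value wins when present.
  def resolve(parent: ParentConfig, child: AlertOverrides): ParentConfig =
    ParentConfig(
      name = child.name.getOrElse(parent.name),
      emitter = child.emitter.getOrElse(parent.emitter),
      version = child.version.getOrElse(parent.version)
    )
}
```
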
  • 19. Modular Pre/PostProcessing ▪ DataFrame -> DataFrame transform applied to input dataset ▪ Supplied in config ▪ Useful for things like date filtering without changing detection Preprocessing Postprocessing ▪ Mutable Seq of transform functions inside Detection ▪ Applied sequentially to output foreachBatch Transformers ▪ Some operations not stream-safe ▪ Where the crazy stuff happens
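
Applying a sequence of transform functions in order is a left fold. The sketch below uses a plain `Seq[Map[String, String]]` as a stand-in for a Spark DataFrame so it runs without Spark. `PostProcessing`, `since`, and `tag` are hypothetical names.

```scala
object PostProcessing {
  type Row = Map[String, String]
  type Table = Seq[Row] // stand-in for a Spark DataFrame
  type Transformer = Table => Table

  // Example module: keep rows on or after a cutoff date (ISO dates sort lexically).
  def since(cutoff: String): Transformer =
    table => table.filter(row => row.getOrElse("dt", "").compareTo(cutoff) >= 0)

  // Example module: attach a constant tag to every row.
  def tag(value: String): Transformer =
    table => table.map(row => row + ("tag" -> value))

  // Apply each module in registration order, feeding each output to the next.
  def run(modules: Iterable[Transformer], input: Table): Table =
    modules.foldLeft(input)((table, module) => module(table))
}
```

Because `run` accepts any `Iterable`, a mutable `ArrayBuffer[Transformer]` like the one named on the Detection and Alert slide can be passed in directly.
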
  • 20. Manual Tuning Lifecycle ▪ Tuning overhead scales with number of detections ▪ Feedback loop can take days while analysts suffer :( ▪ This needs to be faster… ideally automated and self-service The data/ environment changes DE tweaks detection False positive alerts Analyst requests tuning pain
  • 21. Self-Tuning Alerts Detection Analyst Review Alert Labels (FP, TP, etc.) Analyst Consensus! Modify Behavior Alert Orchestration System
  • 22. Complex Exclusions ▪ Arbitrary SQL expressions applied on all results in foreachBatch ▪ Stored in rev-controlled TSV ▪ Integrated into Detection Test CI…malformed or over-selective items will fail tests ▪ Preservation of excluded alerts in a separate table Eventually, detections look like this >>> So….
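
The exclusion flow (load rules from TSV, reject malformed ones, split results into kept and excluded) can be sketched without Spark. This is a deliberate simplification: the slide describes arbitrary SQL expressions, while this sketch swaps in plain field-equals-value rules, and the three-column TSV layout is an assumption.

```scala
object Exclusions {
  type Row = Map[String, String]

  // Simplified rule: exclude rows where `field` equals `value`.
  // The system on the slide uses arbitrary SQL expressions instead.
  final case class Rule(name: String, field: String, value: String)

  // Parse one TSV line (assumed columns: name, field, value).
  // A malformed line yields Left so a CI test can fail on it.
  def parse(line: String): Either[String, Rule] =
    line.split("\t", -1) match {
      case Array(n, f, v) if n.nonEmpty && f.nonEmpty => Right(Rule(n, f, v))
      case _ => Left(s"malformed exclusion: $line")
    }

  // Returns (kept, excluded); excluded rows are preserved rather than dropped.
  def partitionAlerts(rules: Seq[Rule], rows: Seq[Row]): (Seq[Row], Seq[Row]) = {
    val (excluded, kept) =
      rows.partition(row => rules.exists(rule => row.get(rule.field).contains(rule.value)))
    (kept, excluded)
  }
}
```

Returning the excluded rows alongside the kept ones mirrors the slide's point about preserving excluded alerts in a separate table.
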
  • 23. Repetitive Investigations…What Happens? • Analysts run queries in notebooks to investigate • Most of these queries look the same, just with a different filter Analyst Review Alert Orchestration System
  • 24. Automated Investigation Templates ▪ Find corresponding template notebook ▪ Fill it out ▪ Attach to cluster ▪ Execute Alert Orchestration System Workspace API
  • 25. This lets us automate useful things like… Interactive Process Trees in D3 Baselines of Typical Activity
  • 26. Automated Containment Machines can find, investigate, and contain issues without humans Automated Investigation Alert Orchestration System ODBC API • Run substantiating queries via ODBC • Render verdict Contain Issue
  • 27. Detection Testing Why is it so painful? ▪ Preserving/exporting JSON samples ▪ Local SparkSession isn’t a real cluster ▪ Development happens in notebooks, testing happens in IDE ▪ Brittle to even small changes to schema, etc
  • 28. Detection Functional Tests ▪ 85% reduction in test LoC ▪ write and run tests in notebooks! ▪ use Delta sample files in dbfs, no more exporting JSON ▪ scalatest generation using config and convention Trait: DetectionTest ^^ this is a complete test ^^
  • 29. Detection Test CI Git PR CI System Test Notebooks Workspace API /Alerts/Test/PRs/<Git PR number>_<Git commit hash> Jobs API Build Scripts pass/fail “Testing has never been this fun!!” — detection engineers, probably
  • 30. Jobs CI — Why? ▪ Managing hundreds of jobs in Databricks UI ▪ Each job has associated notebook, config, dbfs files ▪ No inventory of which jobs should be running, where ▪ We need job linting >>>
  • 32. Deploy/Reconfigure Jobs with Single PR CI System Config Linter Stacks CLI Jobs Helper Deploy Job/ Notebooks/Files Kickstart/Restart Set Permissions
  • 33. Cool Things with Jobs CI! ▪ Deploy or reconfigure many jobs concurrently ▪ Auto job restarts on notebook/ config change ▪ Standardization of retries, timeout, permissions ▪ Automate alarm creation for new jobs ^^^ No one likes manually crafting Stacks JSON — so we generate it
  • 34. Saving Time with Automated Historical Insight
  • 35. Problem #1 — Cyclical Investigations ▪ Alert comes in, analysts spend hours looking into it ▪ But the same thing happened 3 months ago and was determined to be benign ▪ Lots of wasted cycles on duplicative investigations
  • 36. Problem #2 — Disparate Context ▪ Want to find historical incident data? ▪ look in many different places ▪ many search UIs, syntaxes ▪ Manual, slow & painful ▪ New analysts won’t have historical knowledge
  • 37. Problem #3 — Finding Patterns Which incidents relate to other incidents? Do we see common infrastructure, actors? How much work is repeated? Case #55557 Case #44447 Case #33337 }(Some IP Address)
  • 38. Solution: Document Recommendations ▪ Collect all incident-related tickets, correspondence, and investigations ▪ Normalize them into a Delta table ▪ Automate suggestion of related knowledge using our own corpus of documents Emails Tickets Alerts Notebooks Detection Code Wikis
  • 39. “Has This Happened Before?” -> Automated Includes analyst comments and verdicts displayHTML suggestions, clickable links to original document
  • 40. Automated Suggestions alert_hash 112233445 serial = C12345678ABC src_ip = 88.88.88.123 mime_type = ["text/html"] dt = 2020-03-15 C12345678ABC 88.88.88.123 Entities C12345678ABC 88.88.88.123 00:de:ad:00:be:ef 01:8b:ad:00:f0:0d joshsaccount joshshostname Enriched Entities { Emails Tickets Alerts Notebooks Detections Wikis Document Search Suggestion
  • 41. Anatomy of an Alert These are not valuable for search! (too common) These are good indicators of document relevance
  • 42. Entity Tokenization and Enrichment IP Address Regex Domain Hashes Accounts Serials UDIDs File Path Emails MAC Addresses Alert Payload VPN Sessions Enrichments DHCP Sessions Asset Data Account Data
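
Regex-based tokenization of an alert payload might look like the sketch below. It covers only two of the listed entity types (IP and MAC addresses). The patterns are illustrative rather than the ones used in the talk, and `EntityTokenizer` and `Entity` are hypothetical names.

```scala
object EntityTokenizer {
  final case class Entity(kind: String, value: String)

  // Loose IPv4 pattern: four dot-separated groups of 1-3 digits (does not check that each is <= 255).
  private val Ipv4 = """\b(?:\d{1,3}\.){3}\d{1,3}\b""".r
  // MAC address: six colon-separated hex pairs.
  private val Mac = """\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b""".r

  // Pull every IP and MAC out of free text, normalise MAC case, and drop duplicates.
  def extract(payload: String): Seq[Entity] = {
    val ips = Ipv4.findAllIn(payload).map(v => Entity("ip", v)).toList
    val macs = Mac.findAllIn(payload).map(v => Entity("mac", v.toLowerCase)).toList
    (ips ++ macs).distinct
  }
}
```
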
  • 43. Suggestion Algorithm ▪ Gather match statistics for each entity: ▪ historical rarity ▪ document count rarity ▪ doc type distribution ▪ Compute entity weight based on average ranked percentiles of those features ▪ More common terms == less valuable ▪ Return the best n hits by confidence ▪ Not That Expensive™
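
The weighting step can be sketched as follows, under several assumptions the slide does not spell out: each feature is reduced to a single count per entity, the percentile rank is the share of other entities that are more common, and a document's confidence is the sum of the weights of the entities it mentions. All names here (`Suggestions`, `EntityStats`, `weights`, `topN`) are hypothetical.

```scala
object Suggestions {
  // Assumed per-entity statistics: how often it was seen historically, how many
  // documents mention it, and how many distinct document types it appears in.
  final case class EntityStats(entity: String, historicalHits: Long, docCount: Long, docTypes: Int)

  // Rarity percentile in [0, 1]: the rarest value gets 1.0, the most common 0.0.
  private def rarityPercentiles(values: Seq[Long]): Seq[Double] = {
    val n = values.size
    if (n <= 1) values.map(_ => 1.0)
    else values.map(v => values.count(_ > v).toDouble / (n - 1))
  }

  // Entity weight = average of its rarity percentiles across the three features.
  def weights(stats: Seq[EntityStats]): Map[String, Double] = {
    val features = Seq(
      rarityPercentiles(stats.map(_.historicalHits)),
      rarityPercentiles(stats.map(_.docCount)),
      rarityPercentiles(stats.map(_.docTypes.toLong))
    )
    stats.indices.map { i =>
      stats(i).entity -> (features.map(f => f(i)).sum / features.size)
    }.toMap
  }

  // docs maps a document id to the entities it mentions.
  // Returns the n best-scoring documents, highest score first, ties broken by id.
  def topN(docs: Map[String, Set[String]], w: Map[String, Double], n: Int): Seq[(String, Double)] =
    docs.toList
      .map { case (id, entities) => (id, entities.toList.map(e => w.getOrElse(e, 0.0)).sum) }
      .filter { case (_, score) => score > 0.0 }
      .sortWith { (a, b) => if (a._2 != b._2) a._2 > b._2 else a._1.compareTo(b._1) < 0 }
      .take(n)
}
```

With these assumptions, a very common token such as a MIME type ends up with weight 0.0 and contributes nothing to a document's score, which matches the "more common terms == less valuable" rule.
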
  • 44. Q & A ?