Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity

  1. Unleashing your Security Practitioners with Data First Architectures: Empower SIEMs Like Splunk with the Lakehouse for Cybersecurity
  2. ©HSBC Group 2021
     George Webster, Global Head of Science and Analytics
     • Focused on large-scale data analytics, the offense mindset, and arguing for budget
     • From the DoD, CIA, academia, and financial services
     • Loves to cook and it shows
     • His picture is from a mandated professional headshot
     Jason Trost, Head of Analytic Engines
     • Focused on developing capabilities for network security, DFIR, and security data science
     • From DoD, start-ups, and financial services
     • Strong meme game
     • No sane bouncer would even accept that picture
     Monzy Merza, VP of Cybersecurity Go-To-Market
     • Focused on serving cybersecurity teams by demystifying cloud scale security analytics
     • From National Labs, Splunk
     • Loves green chili
     • He did the picture himself but it looks pretty good
  3. Legalieeeeeeeez
     The numbers are citable references from publicly available sources, peer-reviewed material, and major publications believed to be reliable, but they have not been independently verified by HSBC or Databricks. The numbers are not from HSBC or Databricks.
     The examples and demonstrations are samples or illustrative representations. They are not original code from HSBC or Databricks.
     We will be discussing concepts and patterns generally; these are not actual implementations from HSBC or Databricks. We are not vouching for or promoting particular tooling. We are using our experience of working with certain tools to explain patterns and principles that can be applied through various means and tools available on the market.
     Photograph is public domain. License info: All photos published on Unsplash are licensed under Creative Commons Zero, which means you can copy, modify, distribute and use the photos for free, including commercial purposes, without asking permission from or providing attribution to the photographer or Unsplash. CREATIVE COMMONS ZERO:
  4. HSBC is a multinational investment and financial services company.
     • Founded in 1865
     • ~64 countries
     • $2.984 trillion total assets
     • 40 million customers
     • 226,000 employees
  5. Cybersecurity Science and Analytics empowers cybersecurity teams to protect the bank by leveraging data and innovative capabilities to create effective and proactive security measures and to enable data-driven business decisions. The office's primary objective is to empower our people, processes, and technology and to enable the analysts of the future.
  6. HOURS VS DAYS
     • 24 hours: average time for an attacker to move from victim A to victim B
     • 200 days: for defenders to detect malicious activity
     • 54 days: to perform an investigation post-detection
     *Industry averages from peer-reviewed sources
  7. SIEMs Aren't Set Up for Success!
     • 100+ security tools
     • Data locked in vendor tools
     • Marginal integrations
     • SOC humans compensating for analytical deficiencies
     [Diagram labels: SIEM, SOC, Patching, EDR, AV, Proxy, IDS, Firewall, Code Scans, Endpoint Agents, Email, DLP, IAM, CMDB, Inventories, Intel, Vuln, Network, etc.]
  8. Cybersecurity is a massive big data problem (Cost and Capability)
     • 50-100 TB/day: Endpoint Detection & Response logs
     • 40-50 TB/day: Network sensors
     • 20+ TB/day: AWS VPC Flow
     • 5-10 TB/day: AWS CloudTrail & CloudTrail Data Events
     • 100-200 TB/day: Total log ingestion
     • x 13 months = 38-79 PB retention
     *Numbers are representative of a large enterprise network
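
The retention figure on this slide is daily ingest multiplied by the retention window. The sketch below reproduces that arithmetic under two assumptions that are not stated on the slide: an average month of 30.4 days and decimal units (1 PB = 1000 TB). Under those assumptions the result lands close to the slide's 38-79 PB range; the small gap at the low end presumably comes from rounding or a different unit convention.

```python
# Rough retention estimate from the slide's daily ingest range.
# Assumptions (not from the slides): 30.4-day average month, 1 PB = 1000 TB.
DAYS_PER_MONTH = 30.4
TB_PER_PB = 1000


def retention_pb(tb_per_day: float, months: int = 13) -> float:
    """Return the storage needed to retain `months` of logs, in PB."""
    return tb_per_day * DAYS_PER_MONTH * months / TB_PER_PB


low = retention_pb(100)   # lower bound of total daily ingest
high = retention_pb(200)  # upper bound of total daily ingest
print(f"{low:.1f} PB to {high:.1f} PB")
```
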
  10. HSBC: Control the Data - Unlock your People
      • Cost effective
      • Data unlocked
      • Enables analytics
      • Empowers people
      [Diagram labels: SIEM, Capabilities, Lakehouse, ETL, Sensors (EDR, AV, etc.), Enrichment (EDR, AV, etc.), Tools (EDR, AV, etc.), Crushing It]
  11. Use Case Deep Dive
  12. Use-case: Threat Detection in DNS Data
      ● DNS logs (~10 TB/day)
      ● Near real-time detection
      ● Use ML, rules, and threat intel enrichments
      ● Send alerts to SIEM
  13. DNS Threat Detection Recipe
      Streaming data
      - Passive DNS
      Enrichment sources
      - Threat intel feed
      - Geo IP
      - Look-alike domains
      Detections
      - Domain Generation Algorithm (DGA) domains
      - Look-alike domain name generation
      Deployment
      - Streaming
      [Diagram: Passive DNS → S3 → Spark Streaming → model scoring → malicious activity / benign]
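
The recipe names DGA detection but the transcript does not describe the model behind it. As a rough illustration of the idea only, the sketch below swaps in a simple character-entropy heuristic in place of a trained model: algorithmically generated labels tend to be long and close to random. The function names and both thresholds are hypothetical and untuned, and only the leftmost DNS label is inspected, which is a simplification.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_generated(domain: str, min_length: int = 12, min_entropy: float = 3.5) -> bool:
    """Flag a domain whose leftmost label is both long and high-entropy.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    label = domain.lower().rstrip(".").split(".")[0]
    return len(label) >= min_length and shannon_entropy(label) >= min_entropy


for name in ("google.com", "xkq7zv9p2lmw4rt8.com"):
    print(name, looks_generated(name))
```

A heuristic like this produces false positives on legitimate long random-looking names (CDN hosts, for example), which is one reason the slides pair detections with enrichment sources such as threat intel feeds.
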
  14. DNS Threat Detection in Near Time
      [Diagram: DNS data (10 TB/day) → AWS S3 → ingest, ETL, normalize → store, enrich, optimize → classify, alert → SIEM (alerts, MBs to GBs/day; alert management, ops workflow), with feedback from the SIEM. SQL Analytics: query, report (queries, results).]
      *Simplified view focused only on SIEM
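
The key design point in this flow is that the full DNS volume stays in cloud storage and only the resulting alerts are forwarded to the SIEM. The sketch below shows that shape in plain Python rather than Spark: normalize each event, enrich it with a threat intel lookup, and emit JSON only for matches. The field names, the `THREAT_INTEL` set, and the sample events are all made up for illustration.

```python
import json
from typing import Iterable, Iterator

# Hypothetical threat intel feed: a set of known-bad domains.
THREAT_INTEL = {"bad-domain.example", "evil.example"}


def normalize(event: dict) -> dict:
    """Lower-case the queried name and strip the trailing root dot."""
    return {**event, "query": event["query"].lower().rstrip(".")}


def enrich(event: dict) -> dict:
    """Attach a flag saying whether the queried name is in the intel feed."""
    return {**event, "intel_match": event["query"] in THREAT_INTEL}


def alerts(events: Iterable[dict]) -> Iterator[str]:
    """Yield JSON alerts for matching events only; everything else is not forwarded."""
    for raw in events:
        event = enrich(normalize(raw))
        if event["intel_match"]:
            yield json.dumps(event, sort_keys=True)


# Made-up sample events.
sample = [
    {"ts": "2021-05-27T10:00:00Z", "client": "10.0.0.5", "query": "Evil.Example."},
    {"ts": "2021-05-27T10:00:01Z", "client": "10.0.0.6", "query": "www.example.org"},
]
forwarded = list(alerts(sample))
print(f"{len(forwarded)} of {len(sample)} events forwarded to the SIEM")
```
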
  15. Benefits of Approach: Scale & Speed
      ● Process ~10 TB of DNS logs/day
      ● Augment SIEM economically
      ● Leverage advanced analytics & ML
      ● Near real-time detection of DNS threats
  16. Large-scale Threat Hunting
      Sift through cybersecurity log data in order to find signs of malicious activity, both current and historical, that have evaded existing security defenses.
      • Explore large amounts of historic logs
      • Correlate activity across log sources
      • Leverage analytics, anomaly detection, ML
      • Repeatable, self-documenting, team-oriented
      At Pace / At Scale
  17. Hypothetical Threat Hunt Operation
      • A new mass supply chain attack is discovered and the details of this activity are made public in a government threat intelligence report. Many details are released, including domain names, IP addresses, file hashes of malware, and detailed lists of tactics, techniques, and procedures (TTPs) observed, but the report claims the activity started one year ago.
      • Threat Hunt Objective
        – Is the adversary in our network now?
        – Was the adversary ever in our network?
      • Scope
        – Timeframe: 12 months
  18. How do we execute this Threat Hunt Operation?
      • The SIEM is where security data lives in most large enterprises
      • But 12 months of EDR and network logs is massive, likely several PBs of data
      • And most SIEMs:
        – are not designed for large and complex historical searching over petabytes of log data
        – don't support many-to-many JOINs very well
        – don't adequately support ML/AI use cases, especially at scale
      • We need a better tool.
  19. Threat Hunting using Spark + Delta Lake
      • Cheap cloud storage + Delta Lake for ingestion of cloud logs, endpoint logs, and network logs (extreme volume logs, >100 TB/day)
      • Easily query and search historic data in Delta Lake
      • Threat hunter develops Databricks Notebooks to codify the hunting operation (queries, results)
      • Elasticity of cloud compute
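
A hunt of this kind largely comes down to joining a list of published indicators (IOCs) against historic logs from several sources. The sketch below shows that join with Python's built-in `sqlite3` standing in for Spark SQL over Delta tables, so it runs anywhere; the table names, columns, and rows are hypothetical, and a real hunt would run similar SQL from a notebook against far larger tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dns_logs (ts TEXT, host TEXT, domain TEXT);
    CREATE TABLE proxy_logs (ts TEXT, host TEXT, dest_ip TEXT);
    CREATE TABLE iocs (kind TEXT, value TEXT);
""")

# Made-up historic log rows and indicators.
conn.executemany("INSERT INTO dns_logs VALUES (?, ?, ?)", [
    ("2020-06-01", "laptop-17", "updates.bad.example"),
    ("2020-06-02", "laptop-22", "www.example.org"),
])
conn.executemany("INSERT INTO proxy_logs VALUES (?, ?, ?)", [
    ("2020-07-15", "server-03", "203.0.113.9"),
    ("2020-07-16", "laptop-22", "198.51.100.4"),
])
conn.executemany("INSERT INTO iocs VALUES (?, ?)", [
    ("domain", "updates.bad.example"),
    ("ip", "203.0.113.9"),
])

# Join each log source against the indicators of the matching kind,
# then combine the hits into one timeline ordered by timestamp.
HUNT_QUERY = """
    SELECT d.ts, d.host, i.kind, i.value
    FROM dns_logs d JOIN iocs i ON i.kind = 'domain' AND i.value = d.domain
    UNION ALL
    SELECT p.ts, p.host, i.kind, i.value
    FROM proxy_logs p JOIN iocs i ON i.kind = 'ip' AND i.value = p.dest_ip
    ORDER BY 1
"""
hits = conn.execute(HUNT_QUERY).fetchall()
for row in hits:
    print(row)
```

Keeping the hunt as a query in a notebook is what makes it repeatable: when a new report is published, only the `iocs` table changes.
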
  20. Benefits of Approach
      Speed
      ● Perform advanced analytics at the pace and speed of the adversary
      ● Hunts are reusable and self-documenting through Notebooks
      ● Anticipate that we can execute 2-3x more hunts per analyst because they are no longer bound by hardware
      Scale
      ● Handle processing all required data, >100 TB/day
      ● Increase online queryable retention from days to many months and PB scale
      ● Anticipate that the scopes of the hunts can be much larger due to increased data retention
  21. Monzy
  22. Demo Time!
      • I am going to show you how
        – Multiple personas can use Databricks
        – You can download the notebook
      • Demos coming up …
        – Detection via DNS in action!
        – DNS recipe code segments
        – Threat hunting in action! Pile of IOCs
        – Splunk integration: query and search results
  23. Splunk Integration
  24. Query Databricks from Splunk UI
  26. Conclusion
  27. Key Takeaways
      • There is a time horizon gulf between attackers and defenders
      • Legacy SIEMs are not good for the petabyte data world
      • Lakehouse architecture is transforming HSBC's cyberdefense
      • These methods unlock your security teams and save your budget
  28. What's next
      Check out the deep dive demos:
      - Detecting cyber criminals using Databricks
      - Multicloud security operations with Splunk + Databricks
      Schedule a hands-on training: email
      Try the DNS Notebook
      Send me a note:
      Send HSBC a note: