Providence Future of Data Meetup - Apache Metron Open Source Cybersecurity Platform


An overview of Apache Metron, an open source platform for ingesting, enriching, triaging, and storing diverse cybersecurity feeds. Metron is built on top of Hadoop and scales horizontally on commodity hardware.

Published in: Data & Analytics

  1. Apache Metron Meetup. Carolyn Duby, Solutions Engineer @ Hortonworks, Apache Metron Subject Matter Expert
  2. Agenda
      Part 1 – Overview of Apache Metron: Challenges with Today’s Security Tools to Combat Cyber Attacks • Introduction to Apache Metron • Personas and Core Themes • Why Apache Metron?
      Part 2 – Metron Architecture: Telemetry Parsing • Enrichment • Threat Intelligence • Alert Triage • Index and Write to Storage • Getting Started
  3. The Good Guys
      Security Practitioner: I have too many tools I need to learn. I don’t have a centralized view of my data. My tools are too expensive. I can’t find enough talent. I can’t keep relying on static rules. I need to discover bad stuff quicker. Most of my alerts are false positives. I have too many manual tasks.
      SOC Manager: Threat landscape too dynamic. More assets/users to manage. Attack surface increases. Legacy techniques don’t work anymore.
      Metron will make it easier and faster to find the real issues I need to act on.
      Metron is a more cost-effective way for my team to deal with the fast-moving threat landscape.
  4. The Bad Guys
      Script Kiddie: My techniques are predictable and known. My attack vectors are also known. You are not the only person I’ve attacked. I brag about what I did or will do. I set off a large number of alerts. I fumble around a lot.
      Metron can take everything that is known about me and check for it in real time.
      Advanced Persistent Threat: I am unique in the way I do things. I live on your network for about 300 days. I know what I am after and I look for it, slowly. Your rules will not detect me; I am too smart. I impersonate a legitimate user, but I don’t act like one.
      Metron can model the historical behavior of whoever I am impersonating and flag me as I try to deviate.
  5. Problems With Existing Tools
      Security Information Management System: I am prohibitively expensive. I have vendor lock-in. I can’t deal with big data. I am not open. I am not extensible enough.
      Legacy Point Tools: I was built for 1995. I am super specialized. I don’t scale horizontally. I have a proprietary format. You need a PhD to operate me.
      Behavioral Analytics Tools: I am mostly vaporware. I was built by a small startup. I was modeled after a data set from 1999. I spam you with false positives.
  6. Apache Metron Vision
      “Apache Metron is a Security Data Analytics Platform (SDAP). As a next generation security analytics framework, it is designed to consume and monitor network traffic and machine data within an enterprise. Apache Metron is extensible and is designed to work at a massive scale. It is not a SIEM but rather the next evolution of a SIEM.”
      Apache Metron provides the following capabilities:
      • Extensible ingest to monitor any telemetry source
      • Extensible enrichment framework for any telemetry stream
      • Hadoop-backed storage for telemetry streams with a customizable retention time for cost-effective archiving
      • Automated real-time indexing of telemetry streams, enabling real-time search
      • Telemetry correlation and SQL query capability for data stored in Hadoop, backed by Hive
      • ODBC/JDBC compatibility and integration with existing analytics tools
  7. Use Case Setup
      • On 4/10, a user named Ethan V at Company Foo submits a security ticket about a potential phishing email.
      • Details provided by Ethan V in the ticket:
        • The email states that a signature is required on a new DocuSign document for a new stock option grant to Ethan from internal Finance employee Sonja Lar.
        • The email contains a link to the DocuSign document.
        • Ethan clicks on the link, and a login page appears.
        • Ethan enters his SSO credentials and submits them.
        • On submission, nothing happens.
        • Ethan calls Sonja, but Sonja says she didn’t send an email.
        • Ethan is worried and files a help desk security ticket.
      • A security ticket is created and assigned to the SOC team.
      • SOC analyst James picks up the case to investigate it.
  8. Systems accessed (diagram): for investigation/context, Maxmind (IP Geo DB), AD (Identity Mgmt.), Asset Mgmt. Inventory, and Soltra (Threat Intel); for threat scope and forensics, the SIEM, FireEye (Email Cloud Security), Cisco IronPort (Email On-Premise Security), and Intermedia (Email Archive); for remediation, Exchange (Primary Email Service), Corp Gmail (Secondary Email Service), and AD & SSO (Identity Provider & SSO).
      Story Unfolding
      • Step 1 Insight: Anomalous event. Corp Gmail was decommissioned in favor of Exchange months ago, and only a few users still use it.
      • Step 2 Insight: The same user cannot be logging in from Ireland and Southern California at the same time.
      • Step 3 Insight: Unauthorized access is occurring from Ireland.
      • Step 4 Insight: Sonja appears to be in Southern California, but someone pretending to be her is logging in from an unidentified asset.
      • Step 5 Insight: Sonja’s account has been compromised. Shut it down and reset Ethan’s credentials. But which other users, like Ethan, are affected?
      “Scope of Threat” Workflow Steps
      • Step 6: Searches the SIEM for FireEye and IronPort email events associated with Sonja. The SIEM doesn’t have that information.
      • Step 6 Result: Needs to log into FireEye and IronPort.
      • Step 6 Insight: The SIEM doesn’t have all the FireEye email events needed to determine scope.
      • Step 7: Logs into FireEye Email Threat Prevention Cloud and IronPort to find all emails sent from Sonja’s account from that malicious IP.
      • Step 7 Result: Has a list of all users the phishing email was sent to and can reset the passwords for all of them.
      • Step 7 Insight: Understands the scope of the threat and can contain it.
      “Forensics” Workflow Steps
      • Step 8: Logs into Cisco IronPort to determine when the attacker first compromised Sonja’s Gmail account.
      • Step 8 Result: On 3/26, a user from Ireland logged into Sonja’s Corp Gmail account.
      • Step 8 Insight: Understands when Sonja’s Gmail account was first compromised.
      • Step 9: Logs into Intermedia, an email archive system, to understand how the account was compromised.
      • Step 9 Result: Sees a set of emails in which the attacker spoofed someone else’s email address, “warmed up” Sonja with a few emails, and then sent an email with a link that Sonja clicked on, which stole
      • Step 9 Insight: Understands how Sonja’s account got
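The Step 2 insight is an “impossible travel” check. The Python sketch below shows one way to express it, assuming each login already carries a latitude and longitude (for example, from a geo enrichment backed by Maxmind). The `Login` type, `is_impossible_travel`, the 1000 km/h threshold, and the Dublin and Los Angeles coordinates standing in for Ireland and Southern California are all hypothetical; this is not a Metron API.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0

@dataclass
class Login:
    user: str
    timestamp: float  # seconds since the epoch
    lat: float        # latitude in degrees, e.g. from a geo enrichment
    lon: float        # longitude in degrees

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def is_impossible_travel(a: Login, b: Login, max_speed_kmh: float = 1000.0) -> bool:
    """True if the same user would have had to travel faster than max_speed_kmh."""
    if a.user != b.user:
        return False
    distance = haversine_km(a.lat, a.lon, b.lat, b.lon)
    hours = abs(b.timestamp - a.timestamp) / 3600.0
    if hours == 0:
        return distance > 0  # simultaneous logins from two different places
    return distance / hours > max_speed_kmh

dublin = Login("sonja", 0, 53.35, -6.26)
los_angeles = Login("sonja", 600, 34.05, -118.24)  # ten minutes later
print(is_impossible_travel(dublin, los_angeles))  # True
```

The threshold is a tunable assumption: roughly airliner speed, so that legitimate travel between distant logins is not flagged.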
  9. Do Investigation, Find Scope and Perform Forensics Using Only Metron
      Diagram: Systems Accessed for Remediation: Exchange (Primary Email Service), Corp Gmail (Secondary Email Service), AD & OKTA (Identity Provider & SSO). Systems Accessed for Investigation/Context: Maxmind (IP Geo DB), AD (Identity Mgmt.), Asset Mgmt. Inventory, Soltra (Threat Intel). Systems Accessed to Determine Scope and for Forensics: FireEye (Email Cloud Security), Cisco IronPort (Email On-Premise Security), Intermedia (Email Archive).
  10. Challenges that Apache Metron Solves
      • 60%: percentage of breaches that happened in minutes
      • 8 months: average time an advanced security breach goes unnoticed
      • $400 million: estimated financial loss in 2015
      • 70%-90%: percentage of malware in breaches that is unique to the organization
      (Source: 2015 Verizon Data Breach Investigations Report)
      • Too many manual steps in different tools make investigations slow and expensive
      • Too expensive to keep data long enough to understand history
      • Too expensive to collect all the desired data to understand context
      • Not sure whether a targeted event can be detected
      • Too many events to review in a timely manner
      • Not enough staff to review events in a timely manner
      • Too long to detect a breach
      • Hackers are getting more sophisticated
  11. Why Metron? SOC Analyst Perspective
      Analyst workflow: looking through alerts 25%, collecting contextual data 25%, formulating a hypothesis 5%, investigating 20%, remediating 15%, updating the workflow 5%, writing the report 5%
      • Alerts relevancy engine • Smarter ML alerts • Centralized alerts console • Enriched with threat intel data • Fully enriched messages • Single-pane-of-glass UI • Centralized real-time search • All logs in one place • Granular access to PCAP • Replay old PCAP against new signatures • Tag behavior for modeling by data scientists • Raw messages used as evidentiary store • Mine investigation history • Asset inventory as an enrichment • User identity as an enrichment • Workflow engine • Ticket clustering
      Everything you need to know in one place.
  12. Why Metron? Data Scientist Perspective
      Data science workflow: formulating a hypothesis 5%, finding data 20%, cleaning data 20%, munging data 20%, visualizing data 20%, modeling data 10%, validating the model 5%
      • All my data is in the same place • Data exposed through a variety of APIs • Standard access control policies • Quickly see what I have • Metron normalizes objects • Partial schema validation on ingest • Tagging on ingest • Automatic data enrichment • Automatic application of class labels • Common Metron objects • Massively parallel computation framework • Reusable Zeppelin dashboards • Real-time search + UI • Integration with Python/R • Integration with analytics tools
      Reducing time from hypothesis to model.
  13. Agenda
      Part 2 – Metron Architecture: Telemetry Parsing • Enrichment • Threat Intelligence • Alert Triage • Index and Write to Storage • Getting Started
  14. Metron Architecture (diagram)
      • Inputs: Performant Network Ingest Probes; Real-Time Enrich/Threat Intel Streams; Telemetry Data Collectors / Other
      • Telemetry Ingest Buffer
      • Real-Time Processing Cyber Security Engine (Cyber Security Stream Processing Pipeline): Telemetry Parsers, Enrichment, Threat Intel, Alert Triage, Indexers & Writers
      • Data Services & Integration Layer
  15. Telemetry Parsing
      • Accept logs
      • Normalize log formats to the common Metron event format
      • Verify incoming data
      [Metron Stream Processing Pipeline: Telemetry Parsing → Enrichment → Threat Intel → Alert Triage → Index & Write]
  16. Log Format to Metron Message Conversion
      ORIGINAL LOG LINE: 1475518070.281 832 TCP_MISS/200 448176 GET - DIRECT/ text/html
      METRON JSON MESSAGE: {"full_hostname":"","code":200,"method":"GET","url":" /af/shoes.html?","source.type":"squid","elapsed":832,"ip_dst_addr":"","original_string":"1475518070.281 832 TCP_MISS/200 448176 GET - DIRECT/ text/html","bytes":448176,"domain_without_subdomains":"","action":"TCP_MISS","ip_src_addr":"","timestamp":1475518070281}
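To make the conversion concrete, here is a minimal Python sketch that splits a squid access-log line in squid's native field order (time, elapsed, client address, action/code, bytes, method, URL, user, hierarchy/peer, content type) into a flat dictionary using the same field names as the Metron message above. It is an illustration, not Metron's own parser. Because the slide's IP addresses were not preserved, the sample line uses documentation addresses (192.0.2.10, 198.51.100.7), and `domain_without_subdomains` is computed naively from the last two host labels.

```python
from urllib.parse import urlsplit

def parse_squid(line: str) -> dict:
    """Normalize one squid access-log line into a flat, Metron-style dict."""
    fields = line.split()
    ts, elapsed, src_ip, action_code, size, method, url, _user, hier, _ctype = fields[:10]
    action, code = action_code.split("/")
    hostname = urlsplit(url).hostname or ""
    return {
        "timestamp": int(round(float(ts) * 1000)),  # seconds -> milliseconds
        "elapsed": int(elapsed),
        "ip_src_addr": src_ip,
        "action": action,
        "code": int(code),
        "bytes": int(size),
        "method": method,
        "url": url,
        "full_hostname": hostname,
        # naive: keeps the last two labels, wrong for suffixes like co.uk
        "domain_without_subdomains": ".".join(hostname.split(".")[-2:]),
        "ip_dst_addr": hier.split("/", 1)[1],  # "DIRECT/<peer ip>" -> peer ip
        "original_string": line,
        "source.type": "squid",
    }

line = ("1475518070.281 832 192.0.2.10 TCP_MISS/200 448176 GET "
        "http://www.aliexpress.com/af/shoes.html? - DIRECT/198.51.100.7 text/html")
msg = parse_squid(line)
print(msg["timestamp"], msg["full_hostname"])  # 1475518070281 www.aliexpress.com
```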
  17. Parsing and Normalizing Topology
      Diagram: Sensor A (native format) → Apache Kafka (Topic A) → Apache Storm (Parser Topology A) → Enriched Metron JSON
      • Each telemetry source has:
        • a Kafka topic with the original event content
        • a Storm topology to normalize events into the common Metron event format
      • All telemetry sources feed into a single enrichment topic
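The one-topic-per-source, single-enrichment-topic layout can be sketched in Python with `collections.deque` objects standing in for Kafka topics and a plain function standing in for a Storm parser topology. The topic names and `run_parser` are hypothetical; a real deployment uses Kafka consumers and producers rather than in-process queues.

```python
from collections import deque
from typing import Callable, Deque, Dict

# Hypothetical in-memory stand-ins for Kafka topics:
# one raw topic per sensor, plus the single shared enrichment topic.
raw_topics: Dict[str, Deque[str]] = {"squid": deque(), "bro": deque()}
enrichment_topic: Deque[dict] = deque()

def run_parser(sensor: str, parse: Callable[[str], dict]) -> int:
    """Drain one sensor's raw topic, normalize each record, and publish it for enrichment."""
    topic = raw_topics[sensor]
    published = 0
    while topic:
        message = parse(topic.popleft())
        message["source.type"] = sensor  # tag the origin so downstream config can key on it
        enrichment_topic.append(message)
        published += 1
    return published

# Usage: two sensors with different native formats feed the same downstream topic.
raw_topics["squid"].append("1475518070.281 832 TCP_MISS/200 448176 GET - DIRECT/ text/html")
raw_topics["bro"].append('{"uid": "hypothetical"}')
run_parser("squid", lambda line: {"original_string": line})
run_parser("bro", lambda line: {"original_string": line})
print([m["source.type"] for m in enrichment_topic])  # ['squid', 'bro']
```

Keeping one parser per source isolates format-specific logic, while the shared downstream topic lets a single enrichment stage handle every source.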
  18. Telemetry Parsing Storm Topology (diagram labels: Spout, Bolt, Parser Name, enrichment)
  19. Telemetry Parser Implementation Options
      • General-purpose parsers: easy to create, no programming required
        • Grok: a regular-expression-based parser extracts Metron event values
        • CSV parser: maps CSV columns to Metron event fields
      • Java
        • High performance for high-throughput sources
        • For complex formats not easily expressed as regular expressions
        • A Java class implements the MessageParser interface
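A general-purpose CSV parser amounts to mapping column positions to field names and rejecting records that do not fit. The following Python sketch assumes a hypothetical column list and uses the standard `csv` module; it illustrates the idea and is not the configuration format of Metron's own CSV parser.

```python
import csv
import io
from typing import Callable, Dict, List

def make_csv_parser(columns: List[str], delimiter: str = ",") -> Callable[[str], Dict[str, str]]:
    """Return a parser that maps positional CSV columns to named event fields."""
    def parse(line: str) -> Dict[str, str]:
        row = next(csv.reader(io.StringIO(line), delimiter=delimiter))
        if len(row) != len(columns):
            # Verify incoming data: a wrong column count means a malformed record.
            raise ValueError(f"expected {len(columns)} columns, got {len(row)}")
        event = dict(zip(columns, row))
        event["original_string"] = line  # keep the raw record as evidence
        return event
    return parse

# Hypothetical firewall-style CSV feed.
parse = make_csv_parser(["timestamp", "ip_src_addr", "ip_dst_addr", "action"])
event = parse('1475518070281,192.0.2.10,198.51.100.7,"ALLOW"')
print(event["action"])  # ALLOW
```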
  20. Physical Architecture (diagram)
      • Sensors A, B, and N emit their native formats to Apache Kafka topics A, B, and N
      • Apache Storm parser topologies A, B, and N normalize them into the Metron format
      • The Enrichment/Threat Intel topology consumes the normalized Metron format and sends events out to the index and HDFS
      • A PCAP probe feeds a PCAP topic; the PCAP topology stores PCAP on HDFS, which the Metron PCAP Service serves
  21. Enrichment
      • Add extra information to the parsed event
      • Add context to the event to save security analyst time
      • Score the event for triage
      [Metron Stream Processing Pipeline: Telemetry Parsing → Enrichment → Threat Intel → Alert Triage → Index & Write]
  22. Squid Parser Message and Enriched Squid Message
      SQUID PARSER MESSAGE: {"full_hostname":"","code":200,"method":"GET","url":"http://www.aliexpress.com/af/shoes.html?","source.type":"squid","elapsed":832,"ip_dst_addr":"","original_string":"1475518070.281 832 TCP_MISS/200 448176 GET - DIRECT/ text/html","bytes":448176,"domain_without_subdomains":"","action":"TCP_MISS","ip_src_addr":"","timestamp":1475518070281}
      ENRICHED SQUID MESSAGE: {"adapter.threatinteladapter.end.ts":"1475595978069","full_hostname":"","code":200,"enrichmentsplitterbolt.splitter.end.ts":"1475595604032","":"Cambridge","enrichments.geo.ip_dst_addr.latitude":"42.3626","enrichmentsplitterbolt.splitter.begin.ts":"1475595604032","":"US","enrichments.geo.ip_dst_addr.locID":"1379","adapter.geoadapter.begin.ts":"1475595604033","enrichments.geo.ip_dst_addr.postalCode":"02142","elapsed":832,"ip_dst_addr":"", …}
  23. Enrichment Topology (diagram labels: enrichments, indexing)
  24. Enrichment Options
      • Geo: add geolocation information for IPs (latitude, longitude, city, country, etc.)
      • Host: add information from the known hosts configuration
      • HBase: threat intelligence information
      • Stellar: apply Stellar expressions to the event, for flexibility and extensibility
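Geo enrichment adds flattened keys of the form `enrichments.geo.<field>.<attribute>`, as in the enriched squid message on slide 22. This Python sketch assumes an in-memory dictionary in place of a real GeoIP database; the IP address is a documentation address, the attribute values are copied from that enriched message, and `GEO_DB` and `geo_enrich` are hypothetical names.

```python
from typing import Dict, Iterable

# Hypothetical in-memory stand-in for a GeoIP database lookup.
GEO_DB: Dict[str, Dict[str, str]] = {
    "198.51.100.7": {"city": "Cambridge", "country": "US",
                     "latitude": "42.3626", "postalCode": "02142"},
}

def geo_enrich(event: dict, fields: Iterable[str]) -> dict:
    """Add flattened enrichments.geo.<field>.<attr> keys for each configured IP field."""
    enriched = dict(event)  # leave the input event untouched
    for field in fields:
        ip = event.get(field)
        for attr, value in GEO_DB.get(ip, {}).items():
            enriched[f"enrichments.geo.{field}.{attr}"] = value
    return enriched

# Field list mirrors "geo": ["ip_dst_addr", "ip_src_addr"] from the squid config.
event = {"ip_dst_addr": "198.51.100.7", "ip_src_addr": "192.0.2.10"}
enriched = geo_enrich(event, ["ip_dst_addr", "ip_src_addr"])
print(enriched["enrichments.geo.ip_dst_addr.city"])  # Cambridge
```

IPs with no database entry (here the source address) simply gain no geo keys.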
  25. Stellar Enrichments
      • DSL for simple computations and transformations on message variables
      • Capabilities:
        • Reference an event field
        • Boolean: and, or, not
        • Real/integer arithmetic: *, /, +, -
        • Comparison: <, >, <=, >=
        • If-else: if var1 < 10 then 'less than 10' else '10 or more'
        • Check that a field exists: exists
        • Functions: MAP_GET, SPLIT, STARTS_WITH, etc.
      • Documentation
  26. Enrichment Config File (excerpt)
      [vagrant@node1 ~]$ cat /usr/metron/0.2.0BETA/config/zookeeper/enrichments/squid.json
      {
        "index": "squid",
        "batchSize": 5,
        "enrichment": {
          "fieldMap": {
            "geo": ["ip_dst_addr", "ip_src_addr"],
            "stellar": {
              "config": {
                "host_info": {
                  "top_level_domain": "DOMAIN_TO_TLD(full_hostname)"
                }
              }
            }
          }
        },
        ...
  27. Event with top_level_domain Stellar Enrichment and Geo Enrichment
      {"adapter.threatinteladapter.end.ts":"1475617327962","full_hostname":"www.aliexpr","code":200,"enrichmentsplitterbolt.splitter.end.ts":"1475617327621","top_level_domain":"com","":"Cambridge", …}
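The `top_level_domain` field comes from the Stellar expression `DOMAIN_TO_TLD(full_hostname)` in the config. The Python sketch below imitates that step with callables standing in for Stellar expressions. Its `domain_to_tld` is a naive, hypothetical stand-in that returns the last DNS label, which suffices for `www.aliexpress.com` but not for multi-label public suffixes such as `co.uk`; it is not Stellar's implementation.

```python
from typing import Callable, Dict

def domain_to_tld(hostname: str) -> str:
    """Naive stand-in for DOMAIN_TO_TLD: the last DNS label, lower-cased."""
    if not hostname:
        return ""
    # A production version may need the Public Suffix List for cases like co.uk.
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()

# Hypothetical mapping: output field -> callable playing the role of a Stellar expression.
STELLAR_LIKE: Dict[str, Callable[[dict], object]] = {
    "top_level_domain": lambda ev: domain_to_tld(ev.get("full_hostname", "")),
}

def stellar_enrich(event: dict) -> dict:
    """Evaluate each expression against the event and add its result as a new field."""
    enriched = dict(event)
    for field, expr in STELLAR_LIKE.items():
        enriched[field] = expr(event)
    return enriched

print(stellar_enrich({"full_hostname": "www.aliexpress.com"})["top_level_domain"])  # com
```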
  28. Threat Intelligence
      • Threat indicators: malicious domain watchlist, malicious IP watchlist, MD5 signatures
      • Triaging
      • Structured Threat Information eXpression (STIX): threat intelligence in machine-readable format; may be exchanged by TAXII
      • Trusted Automated eXchange of Indicator Information (TAXII): describes how threat intelligence is exchanged; an automated, standard interface for exchanging threat intelligence
  29. Enrichment: Threat Intelligence (diagram labels: enrichments, indexing)
  30. Ingesting Threat Intelligence (diagram): Threat Intel Feed → Feed Replicator → TAXII Loader → Metron Threat Intelligence, with TAXII records passed between stages
  31. Accessing Threat Intelligence (config excerpt)
      "enrichment" : {
        "fieldMap" : {
          "stellar" : {
            "config" : {
              "whois_info" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
            }
          }
        },
        ...
      Signature: ENRICHMENT_GET(enrichment_type, key, hbase_table, column_family)
  32. Scoring an Event
      • If alert = true, the event is a threat
      • Calculate one or more risk scores
      • Aggregate all scores to get the event score: SUM, MEAN, MAX, etc.
  33. Scoring Configuration (excerpt)
      "threatIntel" : {
        "fieldMap" : {
          "stellar" : {
            "config" : {
              "is_alert" : "whois_info.home_country != 'US'"
            }
          }
        },
        "fieldToTypeMap" : { },
        "config" : { },
        "triageConfig" : {
          "riskLevelRules" : {
            "whois_info.home_country != 'US' && IN_SUBNET( if IS_IP(ip_src_addr) then ip_src_addr else NULL, '')" : 50.0,
            "IN_SUBNET( if IS_IP(ip_src_addr) then ip_src_addr else NULL, '')" : 20.0,
            "whois_info.home_country != 'US'" : 10.0
          },
          "aggregator" : "MAX",
          "aggregationConfig" : { }
        }
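The scoring configuration evaluates every risk rule, keeps the scores of the rules that match, and combines them with the configured aggregator (MAX here). The Python sketch below mirrors the three rules with ordinary predicates. Because the CIDR in the `IN_SUBNET` calls was not preserved, it assumes a hypothetical `10.0.0.0/8`, and `triage_score` is an illustrative field name rather than the one Metron writes.

```python
import ipaddress
from statistics import mean
from typing import Callable, List, Tuple

WATCHED_SUBNET = "10.0.0.0/8"  # hypothetical: the slide's subnet literal was lost

AGGREGATORS = {"MAX": max, "SUM": sum, "MEAN": mean}

def in_subnet(ip: object, cidr: str) -> bool:
    """Like the IS_IP guard on the slide: non-IP values never match."""
    if not isinstance(ip, str):
        return False
    try:
        return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)
    except ValueError:
        return False

def non_us(ev: dict) -> bool:
    # A missing whois record or country counts as non-US in this sketch.
    return (ev.get("whois_info") or {}).get("home_country") != "US"

def watched_src(ev: dict) -> bool:
    return in_subnet(ev.get("ip_src_addr"), WATCHED_SUBNET)

# (rule, score) pairs mirroring the three riskLevelRules.
RULES: List[Tuple[Callable[[dict], bool], float]] = [
    (lambda ev: non_us(ev) and watched_src(ev), 50.0),
    (watched_src, 20.0),
    (non_us, 10.0),
]

def triage(event: dict, aggregator: str = "MAX") -> dict:
    """Flag the event and aggregate the scores of every matching rule."""
    scored = dict(event)
    scored["is_alert"] = non_us(event)
    hits = [score for rule, score in RULES if rule(event)]
    scored["triage_score"] = float(AGGREGATORS[aggregator](hits)) if hits else 0.0
    return scored

ev = {"ip_src_addr": "10.1.2.3", "whois_info": {"home_country": "IE"}}
print(triage(ev)["triage_score"], triage(ev, "SUM")["triage_score"])  # 50.0 80.0
```

With MAX, the most specific matching rule dominates; SUM instead rewards events that trip several rules at once.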
  34. Model as a Service
      • Security analysis models applied during enrichment and threat intelligence
      • REST microservices implementing a specified interface
      • Machine learning or other models
      • Train models with event history stored in Hadoop
      • Register with a discovery service
      • Referenced in Stellar enrichments: MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY
      • The system load-balances across instances
      • More Information
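A minimal Python sketch of the register-then-discover pattern, assuming a hypothetical in-process `ModelRegistry` that hands out endpoints for a model name in round-robin order. It only illustrates how load can be spread across registered instances; it is not Metron's discovery service, and the model name and URLs are made up.

```python
import itertools
from typing import Dict, Iterator, List

class ModelRegistry:
    """Hypothetical discovery service: model name -> endpoints, served round-robin."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, List[str]] = {}
        self._cycles: Dict[str, Iterator[str]] = {}

    def register(self, model: str, endpoint: str) -> None:
        self._endpoints.setdefault(model, []).append(endpoint)
        # Rebuild the rotation so the new instance starts receiving requests.
        self._cycles[model] = itertools.cycle(list(self._endpoints[model]))

    def get_endpoint(self, model: str) -> str:
        """Loosely analogous to MAAS_GET_ENDPOINT: pick the next instance in turn."""
        if model not in self._cycles:
            raise KeyError(f"no instances registered for {model!r}")
        return next(self._cycles[model])

registry = ModelRegistry()
registry.register("dga", "http://host-a:8080")
registry.register("dga", "http://host-b:8080")
print([registry.get_endpoint("dga") for _ in range(3)])
```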
  35. Model as a Service: Architecture
  36. Indexing and Writing
      • Store events for future reference:
        • Forensics
        • Training machine learning models
        • Reprocessing with new threat indicators
      [Metron Stream Processing Pipeline: Telemetry Parsing → Enrichment → Threat Intel → Alert Triage → Index & Write]
  37. Indexing Architecture (diagram label: indexing)
  38. Indexing
      • Elasticsearch or Solr
      • Store in HDFS and/or Hive
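Writers often send events in batches rather than one at a time, and the squid enrichment config on slide 26 sets `"batchSize": 5`. The Python sketch below buffers events per index and hands each full batch to a callback; in practice the callback might issue a bulk write to Elasticsearch or Solr or append to HDFS. `BatchingWriter` is hypothetical, and a real writer would presumably also flush on a timer so that partial batches are not held indefinitely.

```python
from typing import Callable, Dict, List

class BatchingWriter:
    """Buffer events per index and flush them in batches of batch_size."""

    def __init__(self, batch_size: int, sink: Callable[[str, List[dict]], None]) -> None:
        self.batch_size = batch_size
        self.sink = sink  # e.g. a bulk index call or an HDFS append
        self.buffers: Dict[str, List[dict]] = {}

    def write(self, index: str, event: dict) -> None:
        buf = self.buffers.setdefault(index, [])
        buf.append(event)
        if len(buf) >= self.batch_size:
            self.flush(index)

    def flush(self, index: str) -> None:
        """Send whatever is buffered for this index, even a partial batch."""
        buf = self.buffers.pop(index, [])
        if buf:
            self.sink(index, buf)

sent = []
writer = BatchingWriter(5, lambda index, batch: sent.append((index, len(batch))))
for n in range(12):
    writer.write("squid", {"n": n})
writer.flush("squid")  # drain the final partial batch
print(sent)  # [('squid', 5), ('squid', 5), ('squid', 2)]
```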
  39. Event Analysis and Machine Learning with Spark and Zeppelin
  40. Getting Started
      • Apache Metron site
      • Ask questions on Hortonworks Community Connection
      • Source code
      • Deploy a quick start cluster: deployment/vagrant/quick-dev-platform
  41. Thank You. © Hortonworks Inc. 2011 – 2016. All Rights Reserved.