MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Datasets

MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Datasets, by Yuji Sekiya.

Presented at the APNIC 40 APOPS 1 session, Tue 8 Sep 2015.

MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Datasets

  1. MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Datasets
      Yuji Sekiya <sekiya@wide.ad.jp>, The University of Tokyo, Japan
      www.necoma-project.eu
  2. Multi-layer Threat Analysis
      • Data collection at multiple layers/locations: network devices, servers, user devices
      • Analysis platform (Analysis 1, Analysis 2, Analysis 3): threat analysis (detection) across multiple data sources
      • Victim-side action: filtering, load balancing, isolation
      • Countermeasures for attackers: report to ISP, announce to users, filtering at ISP level, configuration to servers
      • Threat information sharing: among organizations, announce to public
  3. Security Information Pipeline
      • Building a pipeline across diverse activities
        – Data collection (traffic, user behavior, etc.)
        – Threat analysis
        – Human decision
        – Protection (enforcement)
      • Pipeline stages (diagram): Data, Analysis, Human Inputs, Protection
  4. Datasets
      • Sources feeding MATATABI: switch, router, DNS, firewall, SPAM, phishing sites, external information
      • Data types: sFlow, NetFlow, syslog, querylog, pcap, text, URL, SPAM sender
  5. Data Volume
      • N*10 GByte/day; 20 TB over 10 months
      • Traffic sampling, packet dump, e-mail, DNS, web traffic
  6. Goals of MATATABI
      1. Forensics: preserving log data
         • To keep evidence traceable
         • To analyze multi-source data exhaustively
      2. Scalability: should tolerate huge amounts of data
         • To store a huge amount of datasets
         • To process datasets in a reasonable time
      3. Real-time analysis: processing performance
         • Real-time analysis of any dataset, where possible
      4. Uniform programmability
         • Various data formats should be easily accessible
         • Various analysis programs can be used
  7. NECOMA Ecosystem
      • Components, connected through APIs: Infrastructure Data, End Point Data, Analysis Module / Early Warning System, Threat Information Sharing, External Knowledge DB, Crawler, External Resource (web), Resilience Mechanism, Infrastructure Devices, End Point Devices, Collection Probes
      • Flows: get data; put analysis results; get external threat information; get threat information and other results; get threat information; control infrastructure and end point devices; crawl external resources and extract knowledge
      • Reference: Petsas et al., A Trusted Knowledge Management System for Multi-layer Threat Analysis. TRUST '14 (poster session), June 2014
  8. MATATABI Overview
      • 4 components
        1) Storage
        2) Data import/process module
        3) Analysis module
        4) Application Programming Interface (API)
      • (1) Data Storage: Hadoop cluster (HDFS) holding DNS querylog, dns-pcap, sflow, netflow, spam, open resolver, phishing, darknet, topology, endpoint, user behavior, client honeypot
      • (2) Data import: measurement data
      • (3) Analysis Module: DGA Analyzer, DDoS detection, anomaly detection, using Hive/Presto, Thrift, Mahout, RHadoop, hadoop-pcap
      • (4) MATATAPI: API (JSON)
  9. Built by Open-Source Software
      • Actively using open-source software
        – Apache Hadoop (HDFS, MapReduce, etc.)
        – Apache Hive (SQL-like language => distributed jobs)
        – Facebook Presto (distributed SQL engine)
        – Apache Mahout (machine learning library)
        – Apache Thrift (language bindings)
        – Hadoop-pcap (pcap file parser)
      • Issues fixed and software packaged by NECOMA
        – https://github.com/necoma
  10. 1) Storage
      • Storing measured data in the Hadoop Distributed File System (HDFS)
      • Easily scaled out
        – Data access by tools
          · Hive/Presto-db
          · Hadoop-pcap
  11. 2) Data import module
      • Pre-processing measurement data
        – By each dataset
          · Raw data (e.g., pcap)
          · Converting to Hive tables
  12. 3) (Threat) Analysis module
      • Easy to implement
      • Many kinds of analyses
      • Distributed computations (MapReduce)
  13. 4) Application Programming Interface (API)
      • Export analysis results
      • Export the dataset itself (if needed)
      • Implemented with the n6 REST API
      • JSON/CSV/IODEF format
  14. Analysis Modules (Use cases)
      Name                            | Datasets                                           | Frequency | LoC (#lines) | Remark
      ZeuS DGA detector               | DNS pcap, netflow                                  | daily     | 25           | hadoop-pcap
      UDP fragmentation detector      | sflow                                              | daily     | 48           |
      Phishing likelihood calculator  | Phishing URLs, Phishing content                    | 1-shot    | –            | Mahout (RandomForest)
      NTP amplifier detector          | netflow, sflow                                     | daily     | 143          | pyhive, Maxmind GeoIP
                                      | sflow                                              | daily     | 24           |
      DNS amplifier detector          | sflow, open resolver [19]                          | daily     | 37           |
      Anomalous heavy-hitter detector | netflow, sflow                                     | daily     | 106          | pyhive
      DNS anomaly detection           | DNS pcap, whois, malicious/legitimate domain list  | daily     | 57           | hadoop-pcap, Mahout (RandomForest)
      SSL scan detector               | sflow                                              | 1-shot    | 36           |
      DNS failure graph analysis      | DNS pcap                                           | daily     | 159          | pyhive
  15. Analysis Example (1) Finding NTP Amplifiers
      • Make a SQL request with Presto
      • Get IP addresses that send UDP traffic from source port 123 with a packet size of 468
      • Packet size of a monlist reply = 468 bytes
      SELECT sa FROM netflow WHERE sp=123 AND pr='UDP' AND ibyt/ipkt=468 GROUP BY sa
  16. Analysis Example (1) Finding NTP Amplifiers
      presto:default> SELECT sa FROM netflow_wide_rcfile WHERE sp=123 AND pr='UDP' AND ibyt/ipkt=468 AND dt>'20150401' GROUP BY sa;

      Query 20150810_090728_00174_u378i, RUNNING, 10 nodes, 845 splits
      0:11 [ 457M rows, 9.8GB] [41.3M rows/s, 908MB/s] [======>>>>>> ] 14%
           STAGES   ROWS  ROWS/s  BYTES  BYTES/s  QUEUED  RUN  DONE
      0.........R      0       0     0B       0B       0    1     0
        1.......R  1.88K     135  33.2K    2.39K       0    8     0
          2.....R   457M   32.9M   9.8G     723M     622   94   120

      Query 20150810_090728_00174_u378i, RUNNING, 10 nodes, 845 splits
      1:05 [1.63B rows, 37.7GB] [25.2M rows/s, 596MB/s] [===========================>>>>>>>> ] 64%
           STAGES   ROWS  ROWS/s  BYTES  BYTES/s  QUEUED  RUN  DONE
      0.........R      0       0     0B       0B       0    1     0
        1.......R  16.9K     260   299K    4.61K       0    8     0
          2.....R  1.63B   25.1M  37.7G     595M     147  147   542
  17. Analysis Example (1) Finding NTP Amplifiers
      sa
      -----------------
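The filter in the Presto query of Analysis Example (1) can also be read as a plain predicate over flow records. The following is a minimal sketch, not the NECOMA implementation: the `Flow` type and the function name are hypothetical, the field names simply mirror the netflow columns used in the query, and integer division is assumed to match how `ibyt/ipkt` behaves on integer columns.

```python
from typing import Iterable, NamedTuple, Set

MONLIST_REPLY_BYTES = 468  # monlist reply size quoted on the slide


class Flow(NamedTuple):
    """Hypothetical flow record; fields mirror the netflow columns in the query."""
    sa: str    # source address
    sp: int    # source port
    pr: str    # protocol name
    ibyt: int  # bytes in the flow
    ipkt: int  # packets in the flow


def ntp_amplifier_candidates(flows: Iterable[Flow]) -> Set[str]:
    """Return source addresses of UDP/123 flows averaging 468 bytes per packet."""
    return {
        f.sa
        for f in flows
        if f.sp == 123
        and f.pr == "UDP"
        and f.ipkt > 0  # skip empty flows to avoid dividing by zero
        and f.ibyt // f.ipkt == MONLIST_REPLY_BYTES
    }
```

Returning a set plays the role of `GROUP BY sa` in the query: each candidate address appears once, however many matching flows it produced.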
  18. Analysis Example (2) Detecting DNS Amplifier Attacks
      • Diagram labels: Attackers, Spoofed Packets, Open Resolver DNS Server
  19. Analysis Example (2) Detecting DNS Amplifier Attacks
      • Find responses with the RD (Recursion Desired) flag
      • Queries from open resolver servers
      • Attempts at the Water Torture attack
      select src,count(*) from dns_pcaps where dt='20150401' and dns_qr=true and dns_flags like '%rd%' and server='dns1-pcap' group by src;
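The grouping done by the query in Analysis Example (2) can be sketched in a few lines of Python. This is an illustration under assumptions: records are taken to be dictionaries keyed by the dns_pcaps column names from the query, `dns_flags` is assumed to be a string containing flag names such as "rd", and the function name is hypothetical.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def rd_responses_per_source(
    records: Iterable[Mapping[str, Any]], server: str = "dns1-pcap"
) -> Counter:
    """Count DNS responses carrying the RD flag, grouped by source address."""
    counts: Counter = Counter()
    for rec in records:
        is_response = bool(rec["dns_qr"])          # dns_qr=true marks a response
        has_rd = "rd" in rec["dns_flags"]          # same as LIKE '%rd%'
        if rec["server"] == server and is_response and has_rd:
            counts[rec["src"]] += 1
    return counts
```

Calling `rd_responses_per_source(records).most_common(10)` would list the ten sources with the most matching responses, which is one way to rank candidates for closer inspection.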
  20. Analysis Example (3) Detecting DNS Cache Poisoning Attacks
      • Diagram labels: Resolver DNS Server, Query, Authoritative DNS Servers, Attackers, Spoofed Answers
  21. Analysis Example (3) Detecting DNS Cache Poisoning Attacks
      • Normally, # of queries from the resolver server > # of answers to the resolver server
        – Count the number of queries from the resolver server
        – Count the number of answers to the resolver server
      • If not, it is possibly a DDoS or cache poisoning attack against our DNS resolver server
      select floor(ts/60),count(*) from dns_pcaps where dt = '20150401' and dns_qr=false and dns_flags not like '%rd%' and server='ns1-pcap' group by floor(ts/60);
      select floor(ts/60),count(*) from dns_pcaps where dt = '20150401' and dns_qr=true and dns_flags like '%aa%' and server='ns1-pcap' group by floor(ts/60);
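The two per-minute counts in Analysis Example (3) still have to be compared with each other. A minimal sketch of that comparison follows, assuming the timestamps are non-negative seconds and that the caller has already separated resolver queries from authoritative answers. The function names are hypothetical, and flagging only minutes where answers strictly outnumber queries is a choice made for this sketch, not something the slides specify.

```python
from collections import Counter
from typing import Iterable, List


def per_minute(timestamps: Iterable[float]) -> Counter:
    """Bucket timestamps (in seconds) by minute, like floor(ts/60) in the queries."""
    return Counter(int(ts // 60) for ts in timestamps)


def suspicious_minutes(
    query_ts: Iterable[float], answer_ts: Iterable[float]
) -> List[int]:
    """Return the minutes in which answers to the resolver outnumber its queries."""
    queries = per_minute(query_ts)
    answers = per_minute(answer_ts)
    # A Counter returns 0 for missing keys, so minutes without queries still compare.
    return sorted(m for m, n in answers.items() if n > queries[m])
```

A minute that shows up in the result is a candidate for the DDoS or cache poisoning case described above, and would normally be checked against the raw packets before any action is taken.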
  22. Detecting Botnet-Infected Hosts by DGA Queries
      • Domain Generation Algorithm (DGA)
        – Automatically generated domain names used by botnets
        – The names usually change within a short span
        – Difficult to detect botnet hosts by domain name
      • ZeuS-DGA
        – [a-z0-9]{32,48}.(ru|com|biz|info|org|net)
        – Example: f528764d624db129b32c21fbca0cb8d6.com
      • Diagram: the botmaster and the bot each periodically generate the domain list
          001: gh3t852dwps7v47v4139eid62g190bjrs
          002: g22tdk3q8097o97fcs0j46fe0l7wc56us
          003: gj9d611364m0ysceiq0x250fm5u69zq5s
          :
        Generated names: g22tdk3q8097o97fcs0j46fe0l7wc56us.ru; 001.ru, 001.com, 002.ru
  23. Analysis Example (4) Detecting DGA Queries
      • Find queries that match a specific regular expression
      • Some botnet clients generate dynamic, randomized DNS names to contact botnet C&C servers (so-called DGA)
      select src,dns_question from dns_pcaps where regexp_like (dns_question, '[a-z0-9]{32,48}.(ru|com|biz|info|org|net)') AND NOT regexp_like(dns_question, 'xn--') AND dt='20150401';
  24. Analysis Example (4) Detecting DGA Queries
      presto:default> select src,dns_question from dns_pcaps where regexp_like (dns_question, '[a-z0-9]{32,48}.(ru|com|biz|info|org|net)') AND NOT regexp_like(dns_question, 'xn--') AND dt>'20150401';

      Query 20150810_114848_00226_u378i, RUNNING, 11 nodes, 1,435 splits
      1:17 [ 123M rows, 4.15GB] [1.61M rows/s, 55.5MB/s] [ <=> ]
           STAGES   ROWS  ROWS/s  BYTES  BYTES/s  QUEUED  RUN  DONE
      0.........R      0       0     0B       0B       0    1     0
        1.......S   123M   1.61M  4.15G    55.5M    1100  217   117

      Query 20150810_115500_00228_u378i, RUNNING, 11 nodes, 143 splits
      2:22 [87.4M rows, 4.73GB] [ 615K rows/s, 34.1MB/s] [========================================>>] 93%
           STAGES   ROWS  ROWS/s  BYTES  BYTES/s  QUEUED  RUN  DONE
      0.........R      0       0     0B       0B       0    1     0
        1.......R  87.4M    615K  4.73G    34.1M       0    9   133
  25. Analysis Example (4) Detecting DGA Queries
      2001:XXXX:1d8:0:0:0:0:106 | cg79wo20kl92doowfn01oqpo9mdieowv5tyj. 0 IN A
      2001:XXXX:0:1:0:0:0:f     | cg79wo20kl92doowfn01oqpo9mdieowv5tyj.com. 0 IN A
      157.XXX.234.35            | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN A
      133.XXX.127.131           | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN A
      23.XXX.104.44             | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN A
      133.XXX.124.164           | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN A
      157.XXX.234.35            | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN AAAA
      133.XXX.127.131           | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN AAAA
      23.XXX.111.231            | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN AAAA
      133.XXX.124.164           | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN AAAA
      157.XXX.193.67            | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      133.XXX.127.131           | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      173.XXX.59.40             | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      133.XXX.124.164           | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      157.XXX.193.67            | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      133.XXX.127.131           | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      192.XXX.79.30             | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      133.XXX.127.131           | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      185.XXX.155.12            | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      133.XXX.124.164           | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      157.XXX.193.67            | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      133.XXX.127.131           | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      173.XXX.58.45             | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
      133.XXX.124.164           | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
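The ZeuS-DGA match from Analysis Example (4) can be tried offline with Python's `re` module. This is a sketch, not the deck's detector: the function name is hypothetical, and the pattern deviates from the slide in one labeled way, escaping the dot before the TLD so that it matches only a literal ".". As with `regexp_like`, the pattern may match anywhere in the string.

```python
import re

# Slide pattern with one change (an assumption of this sketch): the dot is escaped.
ZEUS_DGA = re.compile(r"[a-z0-9]{32,48}\.(ru|com|biz|info|org|net)")
# Internationalized (punycode) labels are excluded, as in the query.
PUNYCODE = re.compile(r"xn--")


def is_zeus_dga_candidate(dns_question: str) -> bool:
    """Return True if the question text contains a ZeuS-DGA-like name."""
    if PUNYCODE.search(dns_question):
        return False
    return ZEUS_DGA.search(dns_question) is not None
```

For example, `is_zeus_dga_candidate("96e4c3658d4cb4b559057995ae5a382c.com. 0 IN A")` is true for the 32-character label shown in the result listing, while an ordinary name such as `www.example.com.` does not match.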
  26. Movie: Zeus-DGA Analysis
  27. Visualization of Zeus DGA and Botnet
      • 2015/07/01 – 2015/07/05
      • The number of the most active DGA query is 23
      • Related traffic flows from netflow datasets
  28. Visualization: Zeus-DGA Distribution
  29. One of Protection Methods
      • SDN IX (PIX-IE)
      • Programmable IX in Edo: PIX-IE
      • Mitigating and filtering suspicious flows at the IX
      • The IX is a public space in the Internet
      • Before link saturation, an ISP operator can stop DDoS flows
      • Diagram 1 labels: Programmable IX (PIX-IE), ISPs, Victim ISP, Victim Service, Spoofed SRC UDP, Link Saturation, Human Interaction. The operator has to contact each ISP and ask it to filter the DDoS packets …
      • Diagram 2 labels: Programmable IX (PIX-IE), ISPs, Victim ISP, Victim Service, Mitigation (at four points), REST API
  30. Summary and Ongoing Work
      • MATATABI: a platform for threat analysis
      • Exploiting (existing) big data software
      • Data collection to threat knowledge base
      • Toward a security information pipeline
      • Enrichment of analytical results
      • To policy enforcement
      • Real-time analysis
      • Pipeline stages (diagram): Data, Analysis, Human Inputs, Protection
