The Future Of Threat Intelligence Platforms

Talk given at Area41 in June 2018. Video of the talk: https://www.youtube.com/watch?v=FajClTgjcf0

  1. 1. The future of threat intelligence platforms: sharing data with privacy in mind. Presenting: Dr. Paolo Di Prodi, 15/06/2018. © Copyright Fortinet Inc. All rights reserved.
  2. 2. Summary of talk • Definitions: Threat Intelligence Platform and Services • The Cyber Threat Alliance • Data sharing in the alliance • Privacy and anonymity in the alliance • Traditional attacks on anonymity • Differential Privacy Example • Federated ML Example • Homomorphic ML Example • Companies working on differential privacy • Threat Intelligence Platform Roadmap • Q&A
  3. 3. Before we start… • Research background in A.I. and robotics • New to cyber security: about 7 years of experience • I am not a differential privacy expert or cryptographer • Excessive hand waving is forbidden • No mathematicians were harmed during this presentation • No animated slides!
  4. 4. Threat Intelligence Platform (image copyright anomali.com) Functions: • Aggregate intelligence from multiple sources • Curate, normalize, enrich, rank and score data • Integrate with existing security systems • Analyze and share threat intelligence Example vendors: • Fortinet TIS • IBM X-Force • Anomali ThreatStream • Palo Alto Networks AutoFocus • RSA NetWitness Suite • LookingGlass Cyber Solutions • AlienVault Security Management • LogRhythm • FireEye Deployment: on-premises or cloud
  5. 5. Threat Intelligence Services • A TIP is pretty useless without any data! • You can deploy it horizontally within your organization • You can then consume feeds from other providers • Typically you will pay a subscription fee • How do you trust your external partners? • What is the quality of their indicators? [Diagram labels: Threat Intelligence Platform; SIEM; Hunting; APAC; VirusTotal; FireEye; AutoFocus; External Feeds; Internal Feeds; SOC]
  6. 6. Cyber Threat Alliance • It is a not-for-profit organization in which every member must share a minimum amount of intelligence with the alliance • It implies a trustworthy relationship between all members • It provides policies to avoid collusion or competition… of course!
  7. 7. What do members share in the alliance, and how? • Example from the VPNFilter incident, 23 May 2018 • IOCs shared: » C2 domains and IPs of the 1st stage » IPs associated with the 2nd stage » First stage, second stage and third stage hashes » The actual samples » Device models affected • Sharing format: » A document, but typically STIX 1 or STIX 2 files » In future maybe MAEC
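To make the STIX 2 sharing format concrete, here is a minimal sketch that builds one indicator for a C2 domain using only the Python standard library. It assumes the STIX 2.1 indicator layout (`spec_version`, `pattern_type`); the domain `c2.example.com`, the indicator name, and the helper names `stix_timestamp` and `domain_indicator` are hypothetical and are not taken from the talk or from the real VPNFilter IOCs.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(moment):
    """Format a datetime as a STIX-style UTC timestamp with millisecond precision."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


def domain_indicator(domain, name, seen_at):
    """Build an indicator dict for a domain (field layout assumed from STIX 2.1)."""
    now = stix_timestamp(datetime.now(timezone.utc))
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": f"[domain-name:value = '{domain}']",
        "pattern_type": "stix",
        "valid_from": stix_timestamp(seen_at),
    }


# Hypothetical indicator: the domain is a placeholder, not a real IOC.
indicator = domain_indicator(
    "c2.example.com",
    "Hypothetical first-stage C2 domain",
    datetime(2018, 5, 23, tzinfo=timezone.utc),
)
bundle = {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": [indicator]}
print(json.dumps(bundle, indent=2))
```

A production feed would more likely use a dedicated STIX library with schema validation; the plain dictionary only shows the shape of what gets shared.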
  8. 8. Privacy sensitivity & Mitigations [Diagram: data sources placed on a scale from LOW to HIGH privacy sensitivity. Sources shown: PCAP; WEF; External IP; Products and Software; Web & Domains; PE/ELF files; Emails; Sandbox Detonation; Audit Logs; Linux Logs. Sensitive fields listed: IP addresses, Workstation names, Usernames, Passwords, Filenames, URL, IP, MAC, URL, HTTP Auth, FTP, IRC, POP, IMAP, Telnet, NTLM, Kerberos. Note on slide: recent APTs include the target's credentials…]
  9. 9. Where and when to anonymize? [Diagram labels, in slide order: PCAP; WEF; Anonymize via SafePCAP; Anonymize?; IPS sigs; PE/ELF Binaries; Threat Intelligence Platform; AV sigs; QA Monitoring FP/FN; Web Sigs; Anonymize?; Customers; Endpoints; Vendors; Discover more?; False Positives?]
  10. 10. Local Privacy vs Global Privacy [Diagram labels: Randomized Response; Laplace Noise]
  11. 11. Why not just k-anonymity? • Introduced in 1998 by Latanya Sweeney and Pierangela Samarati • Implemented in a public API: Have I Been Pwned? Covered by the media in 2018 • It uses suppression or generalization • It does not include randomization • It is not a good method for high-dimensional data • A weak privacy mechanism • Stronger guarantees with l-diversity and t-closeness • Good for evaluating privacy vulnerability
  12. 12. K-Anonymity for Fortinet AVLOG Devices
      Serial Number  Company Name  Country  Industry       Company Size  Detections
      FG******       McDonald      US       Food           100000+       12345
      FG******       Disney        US       Entertainment  100000+       123
      ….             …..           …..      ….             ….            …..

      Attribute      K-Anon
      SerialNumber   1
      CountryCode    1
      Company Size   77
      Customer Name  1
      Industry       45
      Detections     1
      [Slide annotations next to the table: Not surprising! WOW! Not surprising! Not surprising!]
      A low dimensional table: 6 is not that bad in this case
  13. 13. K-Anonymity for Fortinet AVLOG Devices
      Serial Number  Company Name  Country  Industry       Company Size  Detections
      ***********    ***********   **       Food           100000+       [1000,10000]
      ***********    ***********   **       Entertainment  100000+       [100,1000]
      ….             …..           …..      ….             ….            …..
      [Slide annotations: Suppression (masked columns); Generalization (detection ranges)]
      We lost some information and the detection counts are approximated
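The per-attribute K-Anon numbers and the suppression/generalization step can be sketched as follows. This is an illustrative sketch, not the tooling used for the talk: the four rows are made-up data in the shape of the AVLOG table, and the function names `k_anonymity`, `generalize_count` and `anonymize` are hypothetical. Here k is taken to be the size of the smallest group of rows that share the same values for the chosen columns.

```python
from collections import Counter

# Made-up rows in the shape of the AVLOG table (not real customer data).
ROWS = [
    {"serial": "FG000001", "company": "A", "country": "US", "industry": "Food", "size": "100000+", "detections": 12345},
    {"serial": "FG000002", "company": "B", "country": "US", "industry": "Entertainment", "size": "100000+", "detections": 123},
    {"serial": "FG000003", "company": "C", "country": "US", "industry": "Food", "size": "100000+", "detections": 150},
    {"serial": "FG000004", "company": "D", "country": "US", "industry": "Entertainment", "size": "100000+", "detections": 999},
]


def k_anonymity(rows, columns):
    """Return the size of the smallest group of rows sharing the same values in `columns`."""
    groups = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(groups.values())


def generalize_count(n):
    """Generalization: replace an exact count (n >= 1) with its power-of-ten range."""
    low = 1
    while low * 10 <= n:
        low *= 10
    return f"[{low},{low * 10}]"


def anonymize(row):
    """Suppress the identifying columns and generalize the detection count."""
    return {
        "serial": "*" * len(row["serial"]),
        "company": "*" * len(row["company"]),
        "country": "**",
        "industry": row["industry"],
        "size": row["size"],
        "detections": generalize_count(row["detections"]),
    }


for column in ROWS[0]:
    print(column, k_anonymity(ROWS, [column]))
print(k_anonymity([anonymize(r) for r in ROWS], ["industry", "size"]))
```

A column with k = 1 contains at least one value that singles out a row, which is why serial numbers and exact detection counts are the first candidates for suppression or generalization.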
  14. 14. Known linkage attacks… • Governor William Weld re-identification in 1997 from anonymized medical data + voter registration in Cambridge, MA » 87% of the US population have a unique combination of date of birth, gender and postal code • Netflix challenge released on 2 October 2006 » About 100 million movie ratings by 500k Netflix subscribers, a 10% subset of all Netflix users » User IDs and movie titles (in clear), ratings perturbed » With 8 movie ratings (2 may be wrong) and dates (14 days error): 99% of records were identified! • New York taxi rides public dataset in 2013: » Linking to gossip sites identified 13 celebrities » How much they were tipping etc. • Mobile phone data: » A trace » 3% of total users can be identified via day-long traces » 10% are identified through a 1-week-long trace » 12% are identified through a 2-week-long trace
  15. 15. And of course hacks, leaks and human errors • Hacks/leaks etc.: » Ashley Madison hack in 2015 » LocationSmart, May 2018: leaked location via a demo web service for any AT&T, Sprint, T-Mobile or Verizon device • Redacted document failures: » HSBC on 3 December 2009, bankruptcy forms » TSA in December 2009, operating manual » New York Times May 2010, Snowden Documents » Various FOIA redactions » Enron data sets: credit cards, SSNs etc.
  16. 16. We would like to ask the following questions… • How many members of the alliance have seen a threat? • How many customers were infected? • How good are the indicators provided? • Could we combine individual detections to build better ones? [Techniques named on the slide: Differential Privacy; Federated & Privacy Preserving Machine Learning; Homomorphic Machine Learning]
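  17. 17. Definition of differential privacy (Cynthia Dwork, 2006) • The chance that the noisy released result will be O is nearly the same, whether or not you submit your information • Notation: O is the observation (the released result); R = 1 means "I did submit a survey"; R = 0 means "I did NOT submit a survey" • The probabilities of O in the two cases, A and B, satisfy $A \cong B$ • Formally: $\frac{\mathrm{Prob}(O \mid R = 1)}{\mathrm{Prob}(O \mid R \neq 1)} \le e^{\varepsilon}$ for all O and small $\varepsilon \ge 0$ • Privacy loss: $\ln \frac{\mathrm{Prob}(M(x) \in C)}{\mathrm{Prob}(M(x') \in C)} \le \varepsilon$ • [Diagram labels: Privacy; Accuracy]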
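One common way to satisfy this definition for a counting question such as "how many customers were infected?" is the Laplace mechanism from the global-privacy setting. The sketch below is an illustration under stated assumptions, not the speaker's implementation: it assumes a count with sensitivity 1 (one customer changes the count by at most 1), and the function names are hypothetical. `privacy_loss` evaluates the log-ratio from the definition for two neighbouring counts.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def laplace_pdf(x, mean, scale):
    """Density of the Laplace distribution at x."""
    return math.exp(-abs(x - mean) / scale) / (2 * scale)


def privacy_loss(output, count_with, count_without, epsilon, sensitivity=1.0):
    """ln(Prob(output | you submitted) / Prob(output | you did not submit))."""
    scale = sensitivity / epsilon
    return math.log(laplace_pdf(output, count_with, scale) / laplace_pdf(output, count_without, scale))


rng = random.Random(0)  # fixed seed only to make the demo repeatable
print(private_count(42, epsilon=0.5, rng=rng))
print(privacy_loss(40.0, count_with=42, count_without=41, epsilon=0.5))
```

Smaller values of epsilon add more noise: the released count is less accurate, but the two cases become harder to tell apart, which is the privacy versus accuracy trade-off.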
  18. 18. Differential privacy with Randomized Response
      • Question: » Have you observed APT37 this month?
      • Context: » CTA partners respond individually » A central database is prohibited
      • Protocol: » Flip a coin: if heads, tell the truth » If tails, flip again: say Yes on heads, say No on tails » t = 0.5
      • Problem: » How do we calculate p, the fraction of partners infected?
      • Solution (Y is the observed fraction of Yes answers): » Y = 0.5 * p + 0.25 » p = 2 * Y - 0.5
      Privacy Loss (Bits), where O = 1 means the threat was truly observed and R = 1 means the answer was Yes:
      p      Y      P(O=1|R=1)  P(O=1)  P(O=1|R=0)  PLOSS
      0      0.25   0.00        0       0.00        -1.58
      0.1    0.3    0.25        0.1     0.04        -1.32
      0.2    0.35   0.43        0.2     0.08        -1.10
      0.3    0.4    0.56        0.3     0.13        -0.91
      0.366  0.433  0.63        0.366   0.16        -0.79
      0.4    0.45   0.67        0.4     0.18        -0.74
      0.5    0.5    0.75        0.5     0.25        -0.58
      0.6    0.55   0.82        0.6     0.33        -0.45
      1      0.75   1.00        1       1.00        0.00
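The coin-flip protocol and the estimator p = 2 * Y - 0.5 can be checked with a short simulation. This is a sketch: the population size, the true rate of 30% and the function names are hypothetical choices for illustration. The PLOSS helper assumes the column is log2 of the prior P(O=1) divided by the posterior P(O=1|R=1), which matches the values in the table.

```python
import math
import random


def randomized_response(truth, rng):
    """First flip heads: answer truthfully. Tails: a second flip decides Yes or No."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_p(answers):
    """Invert Y = 0.5 * p + 0.25 to recover the true fraction of Yes."""
    y = sum(answers) / len(answers)
    return 2 * y - 0.5


def posterior_given_yes(p):
    """P(truly observed | answered Yes), by Bayes' rule."""
    return 0.75 * p / (0.5 * p + 0.25)


def privacy_loss_bits(p):
    """log2(prior / posterior) for a Yes answer (assumed definition of PLOSS)."""
    return math.log2(p / posterior_given_yes(p))


rng = random.Random(7)  # fixed seed only to make the demo repeatable
truths = [i < 30_000 for i in range(100_000)]  # hypothetical: 30% truly observed the threat
answers = [randomized_response(t, rng) for t in truths]
print(estimate_p(answers))
print(privacy_loss_bits(0.5))
```

No single answer reveals whether a partner was hit, because any Yes may come from the second coin flip, yet the aggregate rate is still recoverable from Y.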
  19. 19. Federated Learning Example with DNS logs [Diagram labels: Threat Intelligence Platform; Cisco; Palo Alto Networks; Fortinet; Alexa + Cryptolocker; Alexa + Goz; Alexa + NewGoz; Paillier Crypto (since 1999); Sum(Gradients)]
  20. 20. Federated Learning Example with DNS logs
      • Experiment with: » 5147 DNS records » Alexa: 4948 » OpenDNS: 52 » Cryptolocker: 1667 » Goz: 1667 » NewGoz: 1666
      • Train-test split: » 80%-20%
      • Params: » 50 iterations, learning rate = 0.01
      • Paillier key size: 1024
      • Execution with Paillier: » 9.23 s
      • Execution without Paillier: » 0.047 s (about 196x faster)
      Member     Before Precision  Before Recall  After Precision  After Recall
      Cisco      0.84              0.76           0.86             0.86
      Palo Alto  0.86              0.82           0.85             0.85
      Fortinet   0.85              0.8            0.86             0.83
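The Sum(Gradients) step relies on Paillier being additively homomorphic: multiplying ciphertexts adds the underlying plaintexts. The sketch below is a toy illustration, not the code behind the numbers above. It assumes a textbook Paillier scheme with generator g = n + 1, two small Mersenne primes instead of a 1024-bit key (insecure, chosen only so the example runs instantly), a fixed-point encoding for float gradients, and made-up gradient values; all function names are hypothetical.

```python
import math
import secrets

# Toy primes 2^31 - 1 and 2^61 - 1: far too small to be secure, illustration only.
P, Q = 2**31 - 1, 2**61 - 1
SCALE = 10**6  # fixed-point scale used to encode float gradients as integers


def keygen(p=P, q=Q):
    """Return (public modulus n, private key (lam, mu)) for g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, value):
    """Encrypt a float: c = (1 + m*n) * r^n mod n^2 with fresh randomness r."""
    m = round(value * SCALE) % n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    n2 = n * n
    return (1 + m * n) * pow(r, n, n2) % n2


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return c1 * c2 % (n * n)


def decrypt(n, private_key, c):
    """Recover the float, mapping the upper half of the range to negatives."""
    lam, mu = private_key
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    if m > n // 2:
        m -= n
    return m / SCALE


def aggregate(n, encrypted_vectors):
    """Aggregator side: sum gradient vectors coordinate-wise without decrypting."""
    total = encrypted_vectors[0]
    for vec in encrypted_vectors[1:]:
        total = [add_encrypted(n, a, b) for a, b in zip(total, vec)]
    return total


n, private_key = keygen()
# Made-up local gradients for the three members.
local_gradients = {"Cisco": [0.12, -0.40], "Palo Alto": [0.05, 0.10], "Fortinet": [-0.02, 0.25]}
encrypted = [[encrypt(n, g) for g in grads] for grads in local_gradients.values()]
summed = [decrypt(n, private_key, c) for c in aggregate(n, encrypted)]
print(summed)
```

In this sketch the party that aggregates never sees an individual gradient; who holds the private key is a deployment decision that the slide does not specify.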
  21. 21. Homomorphic Machine Learning for email filtering • Cisco trains a logistic regression model • Cisco wants to test it on Fortinet's private emails • Fortinet computes the encrypted scores on its private emails • Cisco decrypts the scores • Cisco measures accuracy [Diagram labels: Threat Intelligence Platform; Cisco; Fortinet; Paillier Crypto (since 1999); Encrypted (Score)]
  22. 22. Homomorphic Machine Learning for emails
      • Experiment with: » 11029 emails » 27% are spam » 73% are ham
      • Train-test split: » 60%-40%
      • Params: » 7997 words
      • Paillier key size: 1024
      Table columns: Member, Classify, Predict, Decrypt, Error, Encrypt (W). Rows as extracted (empty cells were lost, so the column for each value is not marked):
      Cisco: 0.30s, 0.0s, 0.017, 115.32s
      Fortinet (encrypted): 44.63s
      Cisco: 19.25s, 0.017
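The encrypted scoring step can be sketched with the same additive property: raising a Paillier ciphertext to an integer power multiplies the plaintext by that integer, so a linear score over word counts can be computed on encrypted weights. This is a toy sketch under assumptions, not the experiment's code: toy Mersenne primes instead of a 1024-bit key, fixed-point weights, integer word counts as features, and made-up weights; every name here is hypothetical.

```python
import math
import secrets

# Toy Paillier parameters (insecure, illustration only), generator g = n + 1.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # private; needs Python 3.8+
SCALE = 10**4  # fixed-point scale for the model weights


def encrypt(value):
    """Model owner's side: encrypt one weight with fresh randomness."""
    m = round(value * SCALE) % N
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Model owner's side: recover a signed fixed-point value."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    if m > N // 2:
        m -= N
    return m / SCALE


def encrypted_score(enc_weights, enc_bias, word_counts):
    """Data owner's side: compute bias + sum(count_i * w_i) on ciphertexts only."""
    score = enc_bias
    for c, count in zip(enc_weights, word_counts):
        score = score * pow(c, count, N2) % N2
    return score


# Made-up logistic regression weights for a three-word vocabulary.
weights, bias = [1.5, -2.0, 0.25], -0.5
enc_weights, enc_bias = [encrypt(w) for w in weights], encrypt(bias)

# The data owner scores one private email (word counts) without seeing the weights.
enc = encrypted_score(enc_weights, enc_bias, [2, 1, 4])

# The model owner decrypts the score and applies the logistic function.
score = decrypt(enc)
print(score, 1 / (1 + math.exp(-score)))
```

In this arrangement the model owner learns only the score and the data owner never sees the plaintext weights. The exponentiation per feature is also where the extra cost comes from when the vocabulary has thousands of words.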
  23. 23. Who is doing it (right)? Companies that are using differential privacy, federated learning or a mix of those
  24. 24. Google • RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response), published 25 July 2014 • Federated Learning on Gboard, published 6 April 2017
  25. 25. Google • RAPPOR: » Used to detect unwanted software in Google Chrome » Good for histograms and counting • Federated Learning: » Used to train a global model from local models » Implemented for Gboard keyboard suggestions
  26. 26. Apple Local differential privacy (December 2017): • Private Count Mean Sketch • Private Hadamard Count Mean Sketch • Private Sequence Fragment Puzzle
  27. 27. Apple • Discovering popular emojis • Identifying high energy and memory usage in Safari • Discovering new words
  28. 28. Uber: differentially private SQL project called Flex or Chorus. Released: 16 January 2018
  29. 29. Uber Chorus • A query analysis and rewriting framework for general-purpose SQL • It implements Elastic Sensitivity and Sample + Aggregate • It is written in Scala • It supports equijoins and counts for now • It does not yet support sum, average, min, max, etc. • It takes 4 minutes to build via Maven
  30. 30. Microsoft PINQ: 22 June 2009 wPINQ: April 2014
  31. 31. Microsoft PINQ • Based on C# LINQ (.NET) • Aggregations supported: sum, count, average, median • Machine learning: » K-means clustering » Perceptron neural network • Statistics: » Contingency table / cross tab • Example:
      var agent = new PINQAgentBudget(1.0);  // total privacy budget (epsilon)
      var data = new PINQueryable<SearchRecord>(rawData, agent);  // rawData: the unprotected source (name assumed)
      var hosts = from record in data where record.Query == "hxxp://gogle.com/owned.php" group record by record.IPAddress;
      Console.WriteLine("hxxp://gogle.com/owned.php:" + hosts.NoisyCount(1.0));
  32. 32. OpenMined • Open source community: https://www.openmined.org/ • Released around September 2017 • The brainchild of Andrew Trask, ex Google
  33. 33. Federated Learning + Homomorphic Encryption + Blockchain + Smart contract
  34. 34. Conclusion: Roadmap [Diagram labels: Threat Intelligence Platform; Differential Privacy; Federated ML; Homomorphic ML; API]
  35. 35. Questions! • Cyber Threat Alliance: https://www.cyberthreatalliance.org/ • Threat Intelligence Service: https://tis.fortiguard.com/ • Central Threat System (Beta): https://cts.fortiguard.com • My email: paolo.research@fortinet.com
