Needles & Haystacks: Machine Learning powered Threat Detection
Ioan Constantin, Orange Romania
Needles, Haystacks and Algorithms: Using Machine Learning to detect complex threats


Presented by Ioan Constantin in Bucharest, Romania, on November 8-9, 2018, at DefCamp #9.

The videos and other presentations can be found on https://def.camp/archive



  1. Needles & Haystacks: Machine Learning powered Threat Detection. Ioan Constantin, Orange Romania
  2. How do you go about finding a needle in a haystack? It’s simple, actually. You bring a magnet.
  3. Agenda
     1. Complex Threats: A brief walkthrough of some of the new challenges most companies face now with the advent of APTs.
     2. Machine Learning: Beyond buzzwords: how do we improve our existing detection methods and technologies with the context offered by M.L.?
     3. ‘Post-hash’ Context: Threat Intel, Proactive Security, Threat Hunting, SOCMINT, HUMINT, OSINT.
     4. Threat Detection: Are we trying to replace our existing SIEMs, FWs and IPSs? Spoiler: No. We’re building around them.
     5. SIEM much? Are we trying to build yet another Open Source SIEM?
  4. Complex Threats
     • Advanced: uses sophisticated exploits, 0-days and social engineering. Stealthy.
     • Compromised end-points: can infect multiple end-points in large networks, of various types such as smartphones, workstations and applications. Highly dependent on social engineering, surveillance and supply-chain compromise.
     • Persistent: low-and-slow approach to attacks. Targeting is conducted through continuous monitoring. Task-specific rather than opportunistic. The goal is to maintain long-term access.
  5. APTs are a complex species of threats, and we have no definitive guide or ‘how-to’ for detecting or mitigating them. We’re constantly learning from the experience of others and we’re relying on human intelligence to better understand these threats. They work in stages:
     01 Initial compromise: using social engineering, spear phishing, media infestation with zero-day malware.
     02 Establish foothold: create C2-controllable victim end-points in the network, escalate privileges.
     03 Reconnaissance: look around, find other vulnerable victims, gather as much intel as you can.
     04 Lateral movement: expand control to other endpoints, servers and active equipment, harvest data and configs.
     05 Keep a low profile: maintain presence, use stealthy (encrypted) channels to communicate.
     06 Complete mission: keep a door open, exfiltrate data, delete all tracks, break stuff.
  6. Technical Controls used to Protect Against APTs (according to a TrendMicro survey, H1 2017)
     • 83%: Antivirus & Antimalware
     • 77%: Network Technologies (FWs, Routers, Switches)
     • 66%: Zoning Off (Network Segregation)
     • 64%: Log Monitoring / Event Correlation / SIEMs / A.I. / M.L.
     • 37%: Mobile Security Gateways, Mobile Anti-Malware, Mobile Device Management
  7. Challenges in mitigation
     1. Malware variety: There are hundreds of millions of malware variations that can be used in an APT. This makes it challenging to detect.
     2. Traditional security is ineffective: It takes more than signature or rule-based detection. It needs context and some form of automated learning method to expand the context.
     3. Log analysis and log correlation have their limits: Because, of course, they also need context and ‘perspective’.
     4. There’s lots of noise: It’s extremely hard to separate noise from legitimate traffic, and there’s lots of noise being ‘ingested’ by security equipment.
  8. Challenges in mitigation: The average company takes 170 days to detect an advanced threat, 39 days to mitigate and 43 days to recover (according to the Ponemon Institute).
  9. Obligatory timeline slide
     • 1998, Moonlight Maze: Some guys in Russia break into systems at the Pentagon, NASA and the U.S. Department of Energy and exfiltrate thousands of docs. The attack lasted 2 years.
     • 2003, Titan Rain: Attacks on large defense contractors from various Chinese sources.
     • 2009, Ghost Net: Large-scale attack, initiated from phishing e-mails and downloaded trojans.
     • 2010, Stuxnet: Fun fact: there’s chatter out there about a not-so-long-awaited sequel: Stuxnet 2.0.
     • 2011, RSA Attack: Adobe Flash exploit + phishing e-mail = pwned SecurID.
     • 2018, VPNFilter / BlackEnergy: And, of course, the youngest of the bunch…
  10. Solutions? Provide context: Proactive Security; Bug Bounty Programs; Machine Learning, maybe?; Threat Intelligence; OSINT; SOCMINT; HUMINT; Counter surveillance; Threat Hunting.
  11. Machine Learning: “Machine Learning is the science of getting computers to act without being explicitly programmed” (Stanford University, Artificial Intelligence Laboratory). “Machine Learning (…) algorithms can figure out how to perform important tasks by generalizing from examples” (University of Washington).
  12. Machine Learning
     • Learn from available data: using representation, evaluation and optimisation. Use training data.
     • Establish baseline: define normal behaviour, define normal data. Don’t overuse the word ‘normal’ :)
     • Find anomalies: identify deviations from the baseline. Correlate deviations with other input data.
     • Act upon findings: classify deviations as anomalies. Notify / take action.
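
The learn, baseline, find, act loop on slide 12 can be illustrated with a minimal sketch. It assumes the "baseline" is simply the mean and standard deviation of a training window and that a deviation is any value more than three standard deviations away; the slides do not say which statistic or threshold the platform uses, and the login counts below are made up.

```python
from statistics import mean, stdev

def build_baseline(training_counts):
    """Summarise 'normal' behaviour as mean and sample standard deviation."""
    return mean(training_counts), stdev(training_counts)

def find_anomalies(baseline, observations, threshold=3.0):
    """Return (index, value, z_score) for observations far from the baseline."""
    mu, sigma = baseline
    anomalies = []
    for index, value in enumerate(observations):
        z_score = (value - mu) / sigma if sigma else 0.0
        if abs(z_score) > threshold:
            anomalies.append((index, value, round(z_score, 2)))
    return anomalies

# Hypothetical hourly login counts for one host (training window)
training = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15]
baseline = build_baseline(training)

# Hypothetical live observations; the third hour is far above the baseline
live = [14, 13, 97, 12]
flagged = find_anomalies(baseline, live)

# "Act upon findings": here we only print, a real system would alert
for index, value, z_score in flagged:
    print(f"hour {index}: {value} logins (z={z_score}) -> notify analyst")
```

A single-feature z-score is the simplest possible baseline; correlating deviations with other input data, as the slide suggests, would mean combining several such signals before alerting.
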
  13. Available data
     • Network Devices: Active network equipment. Routers, switches, printers, APs and the like. They all generate tons of data.
     • Endpoints: PCs, laptops, smartphones, servers. Metrics, operating system logs, hardware logs, security logs. Interesting, to say the least.
     • Applications: Software clients, client-side and server-side. E-mail clients, web browsers, VPN clients. In all shapes and sizes, from the ubiquitous MS Windows to iOS.
     • Security Products: Networked, in the cloud or on endpoints. Everything from FWs, IPSs, sandboxes, AVs / anti-malware, anti-spam & e-mail security gateways, MDM etc.
     • Cool stuff: Our secret sauce. We generate huge amounts of data from TELCO-specific technologies and equipment. We add it to the mix to better help with context.
  14. Sneak & Peek at our Analytics & Threat Detection Platform
  15. Data sources
     • Pentest Results: Infobyte’s Faraday.
     • Vulnerability Scans: We’re keeping an eye on vulnerable systems that can be used as entry points in larger networks. Rapid7 OpenData / Secrepo / Censys / etc.
     • Threatmap & IoCs: We use data from our own Threatmap service to detect vulnerabilities and malware in compromised websites.
     • Malware analysis: Kaggle / VirusTotal / VXHeaven.
     • Security Data: We gather anonymized statistical data about app usage, website browsing, detected attacks and malware activity from our Managed Security Services.
     • Threat intel feeds: IoCs, hashes, tweets and posts. Criticalstack / MDL / Kiran Bandla’s APTNotes / ISC Suspicious Domains / ThreatMiner / Threat Crowd.
     • Cellular data.
     • Wi-Fi: We operate large Wi-Fi networks for both consumer and business customers.
  16. Architecture: Kafka, Logstash, Elasticsearch, Kibana
  17. Architecture - II: High-level overview of our proposed architecture for our Threat Detection & Analytics Platform
     • Data sources: Full Packet Data, Endpoints & Servers, Cloud Services, Machine & Equipment Logs, Service Bus, Everything Else.
     • Inputs: Beats, Syslog, Cloud APIs, OSINT, HUMINT, SOCMINT.
     • Kafka Servers: Load Balancing, Data Collection, Caching.
     • Logstash Servers: Event Parsing, Event Enrichment, Data Ingestion.
     • ElasticSearch Servers: Machine Learning Analytics, Caching & Short Term Storage, Alerting.
     • Hadoop Servers: Long Term Storage.
     • Kibana Servers: Event View, Custom Dashboards, Reporting.
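
To make the "Event Parsing" and "Event Enrichment" boxes on slide 17 concrete, here is a minimal sketch in plain Python rather than an actual Logstash configuration. The log line format, the field names and the `KNOWN_BAD_DOMAINS` set are all assumptions made for illustration; the slides do not describe the real event schema or enrichment sources.

```python
import json
import re

# Hypothetical syslog-style line format; real sources will differ.
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) (?P<app>[\w-]+): "
    r"src=(?P<src_ip>\S+) dst=(?P<dst_domain>\S+) action=(?P<action>\w+)$"
)

# Stand-in for a threat-intel feed (uses a reserved example domain).
KNOWN_BAD_DOMAINS = {"malware.example"}

def parse_event(line):
    """Event parsing: turn a raw line into a dict of named fields, or None."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None

def enrich_event(event):
    """Event enrichment: add context that the raw line does not carry."""
    enriched = dict(event)
    enriched["known_bad_domain"] = event["dst_domain"] in KNOWN_BAD_DOMAINS
    return enriched

raw = "2018-11-08T10:15:00Z fw01 proxy: src=10.0.0.5 dst=malware.example action=allow"
event = enrich_event(parse_event(raw))

# The enriched document is what would be handed on for indexing and analytics.
print(json.dumps(event, sort_keys=True))
```

Keeping parsing and enrichment as separate steps mirrors the slide: unparseable lines can be dropped or quarantined before any lookup cost is paid.
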
  18. Machine Learning Algorithms: Supervised | Sweet spot | Unsupervised. Supervised Machine Learning relies on feeding the machines labeled data, be it ‘good’ data or ‘bad’ data. This helps create a baseline of expected behaviour in threat detection and, in turn, yields results in detecting anomalies. On the Unsupervised side you have to consider several approaches, such as dimensionality reduction and association rule learning. These approaches are useful in making large data sets easier to analyse or understand. They can be used to reduce the complexity (dimensionality) of the fields of data to look at, or to group things together (clustering).
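
As an illustration of the "group things together (clustering)" idea on the unsupervised side, the sketch below runs a bare-bones k-means over unlabeled points. The choice of k-means, the two features (requests per minute, distinct destination ports) and the data are assumptions for the example; the slides do not name a specific clustering algorithm.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # New centroid = per-axis mean of the cluster; keep the old one if empty
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical per-host features: (requests per minute, distinct destination ports)
points = [(10, 2), (12, 3), (11, 2), (300, 40), (310, 45), (9, 3)]

# Fixed starting centroids keep the example deterministic
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

for centroid, cluster in zip(centroids, clusters):
    print(f"centroid {centroid}: {len(cluster)} hosts")
```

No labels are supplied anywhere: the two groups emerge from the data alone, and it is then up to an analyst (or a supervised step) to decide whether the small, noisy group is malicious.
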
  19. Machine Learning Algorithms
     • Supervised (labeled data): Malware Identification, Spam Detection, Anomaly Detection, Risk (Reputation) Scoring.
     • Unsupervised (unlabeled data): Clustering, Association-Rule Learning, Dimensionality Reduction, Entity Classification, Anomaly Detection, Data learning (exploration).
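
For the supervised column, spam detection from labeled data is the classic example. The sketch below is a tiny naive Bayes classifier with Laplace smoothing; the algorithm choice and the four training messages are assumptions for illustration and are not taken from the talk.

```python
from collections import Counter
from math import log

LABELS = ("spam", "ham")

def train(labeled_messages):
    """Count words per label and messages per label from (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    label_counts = Counter()
    for text, label in labeled_messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability under naive Bayes."""
    vocabulary = set().union(*(word_counts[label] for label in LABELS))
    total_messages = sum(label_counts.values())
    scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        score = log(label_counts[label] / total_messages)
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero the score
            score += log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled training data
training = [
    ("verify your account password now", "spam"),
    ("urgent invoice attached open now", "spam"),
    ("meeting notes attached for review", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
word_counts, label_counts = train(training)

print(classify("urgent verify your password", word_counts, label_counts))
print(classify("team meeting tomorrow", word_counts, label_counts))
```

The contrast with the unsupervised column is the input: here every training message carries a label, and the quality of the classifier depends directly on how good those labels are.
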
  20. (More than) Text Mining: Twitter is a great data source for text mining. This has been done before, and we’re using what we’re learning from both commercial solutions like Sintelix and Bitext and open source projects. Pipeline: Twitter, Scrape Engine / Text Mining, Local Storage, Noise Removal, Feature Extraction, Feature Selection, ElasticSearch, Machine Learning Analytics, Classification.
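
The noise removal, feature extraction and feature selection stages of the text-mining slide can be sketched as follows. The stopword list, the tokenisation rules, the use of document frequency for selection and the sample tweets are all assumptions made for the example; the talk does not specify how each stage is implemented.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "the", "is", "in", "on", "of", "and", "to", "new", "for"}

def remove_noise(text):
    """Noise removal: drop URLs, @mentions, punctuation and stopwords."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"@\w+", " ", text)
    tokens = re.findall(r"[a-z0-9][a-z0-9-]*", text)
    return [token for token in tokens if token not in STOPWORDS]

def extract_features(tokens):
    """Feature extraction: term frequencies for one document."""
    return Counter(tokens)

def select_features(feature_counters, top_k):
    """Feature selection: keep the terms that appear in the most documents."""
    document_frequency = Counter()
    for counter in feature_counters:
        document_frequency.update(counter.keys())
    return [term for term, _ in document_frequency.most_common(top_k)]

# Hypothetical tweets
tweets = [
    "New #ransomware campaign spreading via phishing emails https://example.com/a",
    "@soc_team phishing kit targets bank customers, ransomware payload dropped",
    "Patch released for the VPN appliance",
]
features = [extract_features(remove_noise(tweet)) for tweet in tweets]
selected = select_features(features, top_k=2)
print(selected)
```

The selected terms would then be the columns of the feature matrix handed to the analytics and classification stages.
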
  21. Circling back to APTs
     02 Establish foothold
     • Noise removal: exclude known good traffic patterns and legitimate pre-labeled traffic.
     • Gather logs: and traffic dumps.
     • Known domains: ingest data from known malware domain lists, SOCMINT.
     • Look for anomalies: large numbers of requests, single requests, large DNS queries etc. Malware can be noisy.
     • Visualize, alert, report.
     04 Lateral Movement
     • Gather logs: Windows machines, mostly. Psexec, DCOM, QuarkPwD, RDP, Logon Scripts, WCE, WMIC, WinRM, MS14-058, mimikatz, PWDumpX, timestomp.
     • Analyze logs: for specific event IDs, attribute changes etc.
     • Visualize, alert, report.
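
"Analyze logs for specific event IDs" can be illustrated with a small filter over already-parsed Windows events. The event IDs are standard Windows ones (4624 successful logon, 4688 process creation, 7045 service installed), but the dictionary field names, the list of watched tools and the sample events are hypothetical, and the rules are deliberately naive.

```python
# Windows event IDs used here: 4624 = successful logon, 4688 = process creation,
# 7045 = new service installed. The field names below are hypothetical.
WATCHED_PROCESSES = {"psexec.exe", "wmic.exe", "mimikatz.exe"}
REMOTE_INTERACTIVE_LOGON = 10  # logon type recorded for RDP sessions

def flag_event(event):
    """Return a short reason if the event deserves a closer look, else None."""
    event_id = event.get("event_id")
    if event_id == 4688 and event.get("process_name", "").lower() in WATCHED_PROCESSES:
        return "watched tool executed"
    if event_id == 7045 and event.get("service_name", "").upper() == "PSEXESVC":
        return "PsExec service installed"
    if event_id == 4624 and event.get("logon_type") == REMOTE_INTERACTIVE_LOGON:
        return "RDP logon"
    return None

# Hypothetical parsed events
events = [
    {"host": "ws-17", "event_id": 4688, "process_name": "WMIC.exe"},
    {"host": "srv-02", "event_id": 7045, "service_name": "PSEXESVC"},
    {"host": "srv-02", "event_id": 4624, "logon_type": 3},
    {"host": "dc-01", "event_id": 4624, "logon_type": 10},
]

alerts = []
for event in events:
    reason = flag_event(event)
    if reason is not None:
        alerts.append((event["host"], reason))

for host, reason in alerts:
    print(f"{host}: {reason}")
```

Rules like these flag plenty of legitimate admin activity on their own, which is where the baseline and noise-removal steps from the earlier slides come in.
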
  22. Expected Output
     • Dashboards: Configurable dashboards that can be customized to show both live data and analytics for any number of categories, such as malware detections, C2 server activity, phishing attacks reported through SOCMINT and our own sensors etc.
     • Alerting: Customizable alerts per event, such as live attacks, threshold breaches, social sentiment analysis and (possibly) any delta over a preset baseline.
     • Reporting: Detailed reporting on all incidents, incident categories, sources of attacks, types of threats etc.
  23. Are we building yet another Open-Source SIEM? In short: NO. We’re not looking to replace existing ones. We’re not looking into compliance and automated mitigation. We’re trying to teach our existing open-source log management systems some new tricks so they can help us in detecting complex threats.
  24. In closing, some nice stats (from a Ponemon survey conducted in H1 2018):
     • Savings ($2.5m): Companies using M.L. to detect threats save an estimated US$2.5 million in operating costs.
     • Improves productivity (60%): Companies are positive that deploying M.L.-based security tech improves the productivity of their security personnel.
     • More speed (69%): The most significant benefit of using M.L. for threat analysis is increased speed. 64% say that the most significant advantage is the acceleration in the containment of infected endpoints, devices and hosts.
     • Identify vulnerabilities (60%): Sixty percent of the respondents stated that M.L. identified their application security vulnerabilities.
  25. Thanks :)

