Threat Detection using Hadoop

Published in: Technology, Education

  1. Caspida Inc. Threat Detection Using Hadoop. Karthik Kannan, Founder, CMO
  2. Title: Using Hadoop and Machine Learning to Detect Security Risks and Vulnerabilities, and Predict Breaches in Your Enterprise Environment
  3. Topics
     - Security challenges
     - Today's approaches and their limitations
     - Why is security a Big Data problem?
     - Hadoop and ML in other industries
     - Security with Hadoop and ML
     - Some examples
     - Where do you go from here?
  4. Security: a unique use case that applies horizontally
     - Incident analysis
     - Anomaly detection
     - Queries at scale
     - Predetermined metrics
     Needs to be dynamically self-learning
  5. Today's Security Challenges
     - Target credit card breaches
     - Snowden insider attack
     - RSA security breach
     - Twitter hacking
  6. CIO Survey: Top Concerns
  7. Verizon Data Breach Report 2014: DBIR data shows attackers are getting better and faster at what they do, more quickly than organizations can address the threats. http://www.verizonenterprise.com/DBIR/2014/?utm_source=ContextualAds&utm_medium=ResultLinks&utm_campaign=DBIR2014
  8. Market Data. Courtesy: Mary Meeker, KPCB, Internet Trends Report, 2014
  9. Current Methods Fail: limited scale, manual, not dynamic
     - Signatures
     - Rules
     - Malware detection
  10. Why is Security a Big Data Problem?
      - Variety of security events: new sources, new relationships, new entities
      - Analysis sophistication: dynamic correlations, sequences, non-contiguous patterns
      - Context (time): months and years, not days
      Good reference/reading:
  11. The Right Tools for the Right Purpose
      - Firewall, IPS, malware, AV: protecting the perimeter and defending against known attacks (signatures)
      - Cloud security: discovering unauthorized use of SaaS/cloud apps and policy enablement for Shadow IT
      - SIEM: SIEMs collect data, use extensive human-generated rules, rely on manual analysis and provide static alerts
      Lacking dynamic, self-learning methods that are needed to detect sophisticated attacks?
  12. Security in the Technology Evolution [figure: attack sophistication plotted against time; tick labels 1990, 2000, 2010, 2013]
      - Mobile ("There is an App for Everything!"): SMS, Phone, MMS, IM, Mobile App Stores, Mobile Device Mgmt (MDM), Mobile App Mgmt (MAM)
      - Cloud: SaaS Monitoring, SaaS Encryption, Web Mail, CRM/ERP, SaaS Apps (Salesforce, …), Custom Apps/TestDev Clouds
      - Desktop: Password Hashing, Antivirus, Anti-Malware SW, OS security layering, OS-level Sandboxing, Disk Encryption, Productivity Apps/Development/Test
      - Enterprise: Firewalls, Multi-Factor Authentication, IDS, Antivirus, Malware Sandboxing, Threat Feeds, SIEM, VPN, Corporate Email, Finance Apps, Corporate Storage/Filers, Collaboration Tools/ECM, Cloud Apps
      - Attackers and attack types: Application-specific Attacks (Facebook wall, Browser), DDoS (Zombies etc.), Password Guessing, Filesystems / DBs Misconfigurations, Viruses, Malware/Spyware, Keyloggers, Sniffing, Governments, Special Interest Groups, Polymorphic, APT, Botnets, Web App Attacks (XSS, etc.), Phishing
  13. Stages of an attack: (1) Research, (2) Infiltrate, (3) Capture, (4) Exfiltrate, (5) Marketplace. 86% of enterprises focus on step 2 only. Studies show that companies save up to $4M/year when they have security intelligence systems that focus on all stages.
  14. ML + Statistical Models [diagram layers: Visualization, Models, Data Lake]
      - Standard models: K-means, Random Forest, Nearest-neighbor, Gaussian, Bayesian, etc.
      - Custom models: user patterns/behavior, time-oriented, data-attribute-specific, SaaS, mobile
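As a rough illustration of the "Gaussian" entry among the standard models, the sketch below fits a mean and standard deviation to a user's historical activity and flags new observations that fall far from that baseline. The function name, the sample counts, and the threshold of 3 standard deviations are all assumptions made for illustration; none of them come from the slides.

```python
from statistics import mean, stdev


def gaussian_outliers(baseline, observed, threshold=3.0):
    """Return (value, z_score) pairs for observations far from the baseline.

    Assumes the baseline is roughly normally distributed; the threshold
    of 3.0 standard deviations is an illustrative default.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A flat baseline has no spread, so any deviation is an outlier.
        return [(x, float("inf")) for x in observed if x != mu]
    return [
        (x, (x - mu) / sigma)
        for x in observed
        if abs(x - mu) / sigma > threshold
    ]


# Hypothetical data: files downloaded per day by one user.
baseline_downloads = [12, 15, 11, 14, 13, 12, 16, 14]
todays_downloads = [13, 15, 48]

outliers = gaussian_outliers(baseline_downloads, todays_downloads)
flagged_values = [value for value, _z in outliers]
```

A per-user baseline like this is one simple way to approach the "user patterns/behavior" custom models; a real deployment would likely need per-feature baselines and a way to handle seasonality.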
  15. Algorithms
      - Time Series Analysis: good when dealing with time series
        - Linear Regression
        - Parametric (ARIMA/FARIMA)
        - Forecasting: Holt-Winters
      - Classification Models: good for finding which categories things fall under
        - Logistic Regression
        - Decision Trees
        - Decision Tables
        - Neural Networks
        - K-Nearest Neighbors
        - Ensemble Models (Random Forests)
      - Grouping Models: used for finding global patterns at scale
        - K-Means Clustering
        - Random graph walks
      - Inference Models: important when trying to infer the value of a feature from a context
        - Association Rules
        - Bayesian Networks
      - Simplification Models: important when we need to decrease the number of features analyzed
        - Principal Component Analysis (PCA)
        - Low-Rank Approximation
        - Singular Value Decomposition (SVD)
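To make the "Grouping Models" entry concrete, here is a minimal sketch of K-means clustering (Lloyd's algorithm) in plain Python. The feature choice (logins per hour, MB sent out), the sample hosts, and the deterministic initialization from the first k points are assumptions for illustration only; at Hadoop scale one would normally use a distributed implementation instead.

```python
from math import dist  # available in Python 3.8+


def k_means(points, k, iterations=20):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignment


# Hypothetical per-host features: (logins per hour, MB sent out).
hosts = [(2, 10), (3, 12), (2, 11), (40, 900), (42, 950), (3, 9)]
centroids, assignment = k_means(hosts, k=2)
```

In this toy data the two high-volume hosts end up in their own cluster. Because the features are on very different scales, normalizing them before clustering would usually be advisable.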
  16. Data Sources: Information Value Pyramid
      - Application Logs: lower volume; concentration of information; no need to decipher semantics of information; top-down view with correlation on important signals
      - Generic System Logs: OS logs on system events, processes' health; need additional deciphering of information
      - Network Packets, L7: high volume of source data; can capture malware code for analysis; problems with encrypted traffic
      - Network Packets, L2-L4: high volume of source data; analysis only based on signatures and packet statistics
  17. Advanced Persistent Threat (APT) Kill Chain
      - Phishing and Zero Day Attack: a handful of users are targeted by phishing attacks
      - Back Door: the user downloads the malware, which finds a back door to access the system
      - Lateral Movement: the attacker attempts to move to other systems and accounts by elevating privileges accordingly
      - Data Gathering: data is gathered from different systems and staged for exfiltration
      - Exfiltrate: data is sent out via multiple channels (encrypted over FTP, DNS back channels, etc.)
  18. Ideal Hadoop-based solution [diagram: Data Sources, Data Lake, Data Science]
  19. Machine Learning in Industries
      - eCommerce: identify shopper behavior and predict buying patterns, inventory planning, recommendations
        - AggregateKnowledge
        - RichRelevance
        - Amazon
      - AdTech: identify mobile/online users, model their preferences, and render appropriate advertisements to the right audience
        - AdMob (Google)
        - MoPub (Twitter)
        - Efficient Frontier (Adobe)
  20. Types of Security Analytics
      - Breach
        - Phishing attack
        - DDoS attack
        - Watering hole attack
      - Exploitation
        - Lateral movement
        - Domain account misuse
      - Exfiltration
        - Privileged data leakage
        - Anomalous login activity
      - Debilitation
        - App or DB server load/activity patterns
        - Web server patterns
      - Monitoring
        - Metrics management
  21. Data Sources & Analysis
      Source: information obtained
      - (1) Web server: incoming and outgoing traffic, IP addresses, times, session durations
      - (2) Domain controllers: user IDs accessing specific IP addresses, times, durations
      - (3) IAM servers: apps, servers, other protected services users are accessing, times, durations
      - (4) Content servers: detailed transactional histories, customer account data, ACLs
      - (5) Messaging server events: email stats, attachment info, external communications (IPs, frequencies)
      + correlations: across time and events, to produce a network of related users, apps, servers and other critical services that may be affected by threats
      + machine learning algorithms: dynamic models driving automatic insights into malicious, external, APT, SaaS, mobile or network threats in a repeatable fashion
      + search/queries: to sharpen insights and threat intelligence by drilling down into a desired dimension such as time window, geography, criticality, etc.
  22. Anatomy of an attack [figure: six numbered steps linking IPs, users and services]
      - Identification of suspicious IP originations, destinations
      - IP addresses, geo-spatial information collection
      - Network of correlations for suspected IPs: which users are accessing them the most?
      - Identification of suspicious users
      - Correlations of suspected users with apps, databases and other sensitive services
      - Timeline of malicious behavior, e.g., sending emails or communicating with CnC
      Sample IP/location table: 200.55.12.68 Brazil; 58.202.85.1 China; 220.12.98.41 US/SC; 119.56.128.25 China; …
      Node labels in the figure: UID1 to UID9, Svr1, Svr2, App1, App2, DB1, DB2, FTP1, IP1, IP2, Actions
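The correlation chain on this slide (suspected IPs, then the users contacting them most, then the services those users touch) could be sketched roughly as follows. The event tuples, function names and sample data are hypothetical; the node labels simply reuse the style of the figure (UID8, IP1, DB2 and so on) and do not reproduce its actual edges.

```python
from collections import Counter, defaultdict


def rank_users_by_suspect_contacts(ip_events, suspect_ips):
    """Count, per user, how often they contacted a suspected IP."""
    counts = Counter(uid for uid, ip in ip_events if ip in suspect_ips)
    return counts.most_common()  # most frequent contacts first


def resources_touched(resource_events, suspect_users):
    """Map each suspected user to the sorted resources they accessed."""
    touched = defaultdict(set)
    for uid, resource in resource_events:
        if uid in suspect_users:
            touched[uid].add(resource)
    return {uid: sorted(resources) for uid, resources in touched.items()}


# Hypothetical (user, IP) and (user, resource) access events.
ip_events = [
    ("UID8", "IP1"), ("UID8", "IP1"), ("UID8", "IP2"),
    ("UID1", "IP2"), ("UID3", "IP9"),
]
resource_events = [
    ("UID8", "DB2"), ("UID8", "FTP1"), ("UID1", "App1"), ("UID3", "Svr1"),
]

ranked = rank_users_by_suspect_contacts(ip_events, suspect_ips={"IP1", "IP2"})
suspects = {uid for uid, _count in ranked}
exposure = resources_touched(resource_events, suspects)
```

Here `ranked` answers "which users are accessing the suspected IPs the most?" and `exposure` lists the apps, databases and servers that would need a closer look for each of those users.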
  23. Network traffic: behavioral threat models
      - Sources (network traffic: PCAP, NetFlow): switches, routers, firewalls*, IPSs*, web gateways*, proxy servers*, any other network device* (* optional)
      - Traffic monitoring & analysis
        - Which IP is communicating with which external or internal destination
        - Traffic volume, frequency
        - Correlate with IAM (for user ID to IP mapping)
        - Max traffic contributors: users, apps, IP addresses
        - Correlate with Web server (for URL traffic analysis per user)
        - Correlate with Messaging server (for email source/recipient analysis)
        - Correlate with Firewall (for external traffic analysis per IP, user)
        - Correlate with App, DB servers (for internal app transactional analysis)
      - Threat intelligence: external threat feeds (e.g., bad IP address list)
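Two of the analyses listed on this slide, finding the top traffic contributors via a user-to-IP mapping and matching destinations against a bad-IP feed, can be sketched as below. The flow record layout, the mapping table and the feed contents are assumptions for illustration; the external addresses come from the ranges reserved for documentation, not from any real threat list.

```python
from collections import Counter

# Hypothetical flow records: (source IP, destination IP, bytes transferred).
flows = [
    ("10.0.0.5", "203.0.113.7", 120_000),
    ("10.0.0.5", "10.0.1.20", 4_000),
    ("10.0.0.7", "198.51.100.23", 900_000),
    ("10.0.0.7", "10.0.1.20", 2_500),
    ("10.0.0.9", "10.0.1.21", 50_000),
]
# Assumed IAM-derived mapping from internal IP to user ID.
ip_to_user = {"10.0.0.5": "uid1", "10.0.0.7": "uid8"}
# Assumed external threat feed of known-bad destination IPs.
bad_ips = {"203.0.113.7", "198.51.100.23"}


def resolve_user(src_ip, ip_to_user):
    """Fall back to a labeled IP when IAM has no user for the address."""
    return ip_to_user.get(src_ip, "unknown:" + src_ip)


def top_talkers(flows, ip_to_user, n=3):
    """Return the n users (or unmapped IPs) that sent the most bytes."""
    totals = Counter()
    for src, _dst, nbytes in flows:
        totals[resolve_user(src, ip_to_user)] += nbytes
    return totals.most_common(n)


def threat_feed_hits(flows, ip_to_user, bad_ips):
    """Group flows to known-bad destinations by the originating user."""
    hits = {}
    for src, dst, nbytes in flows:
        if dst in bad_ips:
            hits.setdefault(resolve_user(src, ip_to_user), []).append((dst, nbytes))
    return hits


talkers = top_talkers(flows, ip_to_user)
hits = threat_feed_hits(flows, ip_to_user, bad_ips)
```

The same join against web server, messaging or firewall logs would follow the pattern of `resolve_user`: enrich each record with an identity first, then aggregate.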
  24. Examples
      - Ground-speed violations: detect user logins that are geographically spaced apart but fall within seconds/minutes of each other
      - Lateral movement: accounts moving from one server/device to another to explore and list content at each location before deciding which to exfiltrate
      - Domain admin creations: automatic creation of admin accounts by a spurious account, e.g., r00t, adm1n, etc.
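The ground-speed violation check can be sketched as follows: for each user, compare consecutive logins and flag pairs whose implied travel speed is physically implausible. The `Login` record, the 1000 km/h speed limit and the 50 km minimum distance are assumptions chosen for illustration, and the coordinates are approximate city locations.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 1000.0  # assumed limit, roughly airliner cruising speed
MIN_DISTANCE_KM = 50.0  # assumed floor, ignores jitter within one metro area


@dataclass(frozen=True)
class Login:
    user: str
    when: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def ground_speed_violations(logins, max_speed_kmh=MAX_SPEED_KMH, min_km=MIN_DISTANCE_KM):
    """Return (earlier, later) login pairs per user that imply impossible travel."""
    by_user = {}
    for login in logins:
        by_user.setdefault(login.user, []).append(login)

    violations = []
    for events in by_user.values():
        events.sort(key=lambda e: e.when)
        for prev, curr in zip(events, events[1:]):
            km = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
            if km < min_km:
                continue
            hours = (curr.when - prev.when).total_seconds() / 3600.0
            # Simultaneous logins from distant places are always violations.
            if hours == 0 or km / hours > max_speed_kmh:
                violations.append((prev, curr))
    return violations


sample_logins = [
    Login("uid1", datetime(2014, 6, 3, 9, 0), 37.77, -122.42),  # San Francisco
    Login("uid1", datetime(2014, 6, 3, 9, 5), -23.55, -46.63),  # Sao Paulo, 5 minutes later
    Login("uid2", datetime(2014, 6, 3, 9, 0), 40.71, -74.01),   # New York
    Login("uid2", datetime(2014, 6, 3, 17, 0), 51.51, -0.13),   # London, 8 hours later
]
flagged = ground_speed_violations(sample_logins)
```

With this sample, only the first user's pair is flagged; the second user's trip is fast but still plausible by air. In practice the coordinates would come from an IP geolocation lookup, which is itself imprecise, so VPN exits and mobile carriers would need allow-listing.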
  25. Where do you start?
      - Need a data lake
      - Analytics
        - ML
        - Statistical
      - Actions
  26. Thank you!
