© 2013 IBM Corporation, May 14, 2013
Big Data for CyberSecurity
Anand Ranganathan
Research Staff Member, TJ Watson Research Center
<arangana@us.ibm.com>
Agenda
• Cyber Threats
• IBM Big Data Suite
• Big Data Analytics for CyberSecurity
– Monitor network behaviors to detect known and unknown cyber-threats in enterprises
– Detect Denial of Service attacks in large ISPs
– Detect data leakage from organizations
Cyber-Threats Are Becoming More Sophisticated
2011: Year of the Targeted Attack
Source: IBM X-Force® Research 2011 Trend and Risk Report
[Figure: 2011 Sampling of Security Incidents by Attack Type, Time and Impact. Incidents are plotted by month (Jan to Dec) with a legend of attack types (SQL Injection, URL Tampering, Spear Phishing, 3rd Party Software, DDoS, SecureID, Trojan Software, Unknown), and each circle is labeled with the targeted sector (e.g. Online Gaming, Central Government, Defense, Banking, Consumer Electronics, IT Security). The size of each circle estimates the relative impact of the breach in terms of cost to the business.]
Conjecture of relative breach impact is based on publicly disclosed information regarding leaked records and financial losses.
2012: The explosion of breaches continues!
Source: IBM X-Force® Research 2012 Trend and Risk Report
[Figure: 2012 Sampling of Security Incidents by Attack Type, Time and Impact]
Conjecture of relative breach impact is based on publicly disclosed information regarding leaked records and financial losses.
Common Cyber Security Risks and Potential Impacts
Common Security Risks
• Denial of Service - An attack that prevents or impairs the use of networks, systems, or applications by exhausting resources
• Malware infection - A virus, worm, Trojan horse, or other code-based malicious entity that successfully infects a host
• Targeted, advanced attack - Also known as an advanced persistent threat (APT), designed to remain undetected
• Loss or theft of technology (laptops, memory sticks, PDAs) that contains sensitive data; inadvertent disclosure of data
• Defacement - A person gains logical or physical access without permission and defaces a Web application
Potential Impacts
• Loss of customers
• Impact to brand
• Sensitive data disclosure
• Stolen intellectual property
• Loss of data or productivity
• Threats to personal and national security
Botnets
• Botnet: a network of compromised computers (bots) controlled by a botmaster, ranging in size from hundreds to millions of hosts
• Purpose: denial of service attacks, spam delivery, stealing credentials and data, compromising control systems, etc.
• Hosts are infected by downloads from malicious websites, emailed executables, memory sticks, PDF files, …
• Bots receive updates and commands from the Command and Control (C&C) node, and these communications are becoming more sophisticated
Botnet Communication
Bots need to communicate:
• Bots receive updates and commands from the C&C node
• They use a command and control structure over IRC, HTTP, SSL, Twitter, IM, or custom-built channels
• Botnet communications are becoming more sophisticated and harder to track
– Peer-to-peer, distributed vs. hierarchical control structure
– Fast fluxing, domain name generation
[Figure: centralized C&C topology vs. peer-to-peer (P2P) topology]
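Domain name generation means the bots and their controller compute the same rendezvous domains independently, so blocking one name does not cut the channel. The toy sketch below illustrates only that idea, under stated assumptions: the seed, the SHA-256 derivation, the 12-letter labels, and the reserved `.example` TLD are hypothetical choices and do not reproduce any real malware family's algorithm.

```python
import hashlib
from datetime import date


def generate_domains(seed: str, day: date, count: int = 5, tld: str = ".example") -> list[str]:
    """Toy domain generation algorithm (hypothetical, for illustration only).

    Bots and the botmaster run the same function with the same seed, so both
    derive the same rendezvous domains for a given day without exchanging them.
    """
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map each of the first 12 hex digits (0-f) to a letter a-p
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    day = date(2013, 5, 14)
    bot_view = generate_domains("hypothetical-seed", day)
    botmaster_view = generate_domains("hypothetical-seed", day)
    print(bot_view == botmaster_view)  # True: both sides agree without communicating
    print(bot_view)
```

Because the list changes every day, blocklisting yesterday's names does not help; one common defensive angle is to watch for hosts issuing many lookups for random-looking names that do not resolve.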
A Typical Threat Example
[Diagram: a spammer's e-mail reaches the victim's mail client and the victim follows a link (<click>); a malicious web server sends or reflects exploit code along with the web page, and malware is installed on the victim's machine. The malware looks up the C&C/updater through the Domain Name Server, contacts the updater by IP address, and is then remotely controlled by the Command & Control server to execute tasks such as sending spam.]
A Typical Threat Example
[Diagram: the infection chain (spam e-mail, link to a malicious web server, exploit code and malware installation, C&C/updater lookup through DNS, contact by IP address, remote control by the Command & Control server), annotated with four monitoring points: a) Monitor DNS, b) Monitor NetFlow, c) Monitor port & protocol usage, d) Monitor web traffic]
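Monitoring points a) and b) become more useful when correlated. One heuristic, assumed here rather than taken from the deck, is to flag outbound flows to external IP addresses that the source host never resolved through DNS, since malware may contact its updater directly by IP address. The record types, the one-hour window, and the `10.` internal prefix are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsAnswer:
    host: str    # internal host that issued the query
    domain: str
    ips: tuple   # IP addresses returned in the answer
    ts: float    # timestamp in seconds


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    dst_port: int
    ts: float


def direct_ip_contacts(answers, flows, window=3600.0, internal_prefix="10."):
    """Return outbound flows to external IPs that the source host did not
    resolve via DNS within the preceding `window` seconds (hypothetical heuristic)."""
    # Merge both streams in time order; on equal timestamps, DNS answers come first
    events = sorted(
        [(a.ts, 0, a) for a in answers] + [(f.ts, 1, f) for f in flows],
        key=lambda e: (e[0], e[1]),
    )
    last_resolved = {}  # (host, ip) -> time of the most recent DNS answer
    suspicious = []
    for ts, kind, ev in events:
        if kind == 0:
            for ip in ev.ips:
                last_resolved[(ev.host, ip)] = ts
        elif not ev.dst.startswith(internal_prefix):
            seen = last_resolved.get((ev.src, ev.dst))
            if seen is None or ts - seen > window:
                suspicious.append(ev)
    return suspicious


if __name__ == "__main__":
    answers = [DnsAnswer("10.0.0.5", "updates.example.com", ("203.0.113.10",), 100.0)]
    flows = [
        Flow("10.0.0.5", "203.0.113.10", 443, 105.0),   # resolved first: not flagged
        Flow("10.0.0.5", "198.51.100.7", 8080, 110.0),  # never resolved: flagged
        Flow("10.0.0.6", "10.0.0.1", 53, 111.0),        # internal: ignored
    ]
    print(direct_ip_contacts(answers, flows))
```

Legitimate software also connects by IP address (hard-coded NTP or update servers, for instance), so this produces candidates for further correlation rather than verdicts.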
Typical Solution Architecture
[Diagram: data sources (DNS, NetFlow, …) stream into the System S data fabric, which runs over a transport layer and operating system on heterogeneous hardware (x86 boxes, x86 blades, Cell blades, FPGA blades). Unsupervised real-time analytics and supervised learning feed dashboarding / visualization.]
1) Real-time results (tickets, monitoring)
2) Collect results + evidence
3) Trends, history
4) Adapted analytics models
• Cybersecurity analytics
• Real-time processing of massive data streams
• Advanced data mining and trend analytics
• New and incremental model learning
PureData System for Analytics, BigInsights
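The numbered loop on this slide (real-time results, evidence collection, trends and history, adapted models) can be pictured with a minimal sketch. It assumes a per-host metric such as distinct domains queried per hour: a baseline is learned offline from history, a streaming scorer flags large deviations, and an adaptation step folds new observations back into the history. The function names, the metric, and the 3-sigma threshold are hypothetical; this is not the Streams or BigInsights API.

```python
from statistics import mean, pstdev


def learn_baselines(history, min_std=1.0):
    """Offline step (data at rest): per-host mean and standard deviation of a metric."""
    return {
        host: (mean(vals), max(pstdev(vals), min_std))
        for host, vals in history.items()
        if vals
    }


def score_stream(events, baselines, k=3.0):
    """Real-time step: yield (host, value, z) for observations more than k standard
    deviations above that host's baseline. Hosts without a baseline are skipped."""
    for host, value in events:
        if host not in baselines:
            continue
        mu, sigma = baselines[host]
        z = (value - mu) / sigma
        if z > k:
            yield host, value, z


def adapt(history, events, confirmed_bad=frozenset(), max_len=30):
    """Feedback step: append new observations to the history, skipping hosts
    confirmed as compromised, and keep only the most recent max_len values."""
    for host, value in events:
        if host in confirmed_bad:
            continue
        vals = history.setdefault(host, [])
        vals.append(value)
        del vals[:-max_len]
    return history


if __name__ == "__main__":
    # Hypothetical hourly counts of distinct domains queried per host
    history = {"10.0.0.5": [40, 42, 38, 41, 39], "10.0.0.6": [100, 110, 90, 105, 95]}
    hourly = [("10.0.0.5", 300), ("10.0.0.6", 112), ("10.0.0.7", 5)]
    alerts = list(score_stream(hourly, learn_baselines(history)))
    print([(h, v) for h, v, _ in alerts])  # [('10.0.0.5', 300)]
    adapt(history, hourly, confirmed_bad={"10.0.0.5"})
```

Excluding confirmed incidents before retraining is one way to keep the evidence from step 2 from teaching the model that an attack is normal.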
IBM Confidential © 2012 IBM Corporation
IBM Big Data Suite
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance
IBM InfoSphere Streams
A Platform for Real Time Analytics on BIG Data
• Millions of events per second
• Microsecond latency
• Traditional / non-traditional data sources
• Real time delivery
• Powerful analytics
Application areas: algorithmic trading, telco churn prediction, smart grid, cyber security, government / law enforcement, ICU monitoring, environment monitoring
Volume: terabytes per second, petabytes per day
Variety: all kinds of data, all kinds of analytics
Velocity: insights in microseconds
Agility: dynamically responsive, rapid application development
How Streams Works
• Continuous ingestion
• Continuous analysis
Achieve scale:
– by partitioning applications into components
– by distributing across stream-connected hardware nodes
The infrastructure provides services for:
– scheduling analytics across hardware nodes
– establishing streaming connectivity
– …
[Diagram: a graph of analytic operators (Transform, Filter, Classify, Correlate, Annotate) spread across hardware nodes]
Where appropriate, elements can be “fused” together for lower communication latencies.
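The operator graph can be pictured with plain Python generators, each standing in for one stream operator. This is a hedged analogy, not SPL or the Streams runtime: unfused operators in Streams may run in separate processes or on separate nodes connected by streams, whereas `fuse` below simply chains the generators in one process, which is the effect fusion aims for. Correlate is omitted for brevity, and the operator bodies, the `corp.example` zone, the 300-second TTL rule, and the owner table are hypothetical.

```python
from collections.abc import Callable, Iterable, Iterator

Record = dict
Operator = Callable[[Iterable[Record]], Iterator[Record]]

INTERNAL_ZONE = "corp.example"         # hypothetical internal DNS zone
OWNERS = {"10.0.0.5": "alice-laptop"}  # hypothetical IP -> owner table


def transform(stream):
    # Normalize domain names: lower case, no trailing dot
    for r in stream:
        yield {**r, "domain": r["domain"].lower().rstrip(".")}


def filter_internal(stream):
    # Drop lookups for the organization's own zone
    for r in stream:
        d = r["domain"]
        if d != INTERNAL_ZONE and not d.endswith("." + INTERNAL_ZONE):
            yield r


def classify(stream):
    # Toy rule: a TTL under 300 seconds marks the record as suspect
    for r in stream:
        yield {**r, "label": "suspect" if r["ttl"] < 300 else "normal"}


def annotate(stream):
    # Attach the owner of the querying host, if known
    for r in stream:
        yield {**r, "owner": OWNERS.get(r["host"], "unknown")}


def fuse(*ops: Operator) -> Operator:
    """Chain operators into a single callable running in one process,
    so records pass between stages without any transport in between."""
    def fused(stream: Iterable[Record]) -> Iterator[Record]:
        for op in ops:
            stream = op(stream)
        return iter(stream)
    return fused


if __name__ == "__main__":
    records = [
        {"host": "10.0.0.5", "domain": "Bad.Example.NET.", "ttl": 60},
        {"host": "10.0.0.6", "domain": "www.corp.example", "ttl": 3600},
        {"host": "10.0.0.7", "domain": "news.example.org", "ttl": 3600},
    ]
    pipeline = fuse(transform, filter_internal, classify, annotate)
    for rec in pipeline(records):
        print(rec)
```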
Security Appliances (Firewalls, IDS, IPS, SIEMs) vs Big Data
Capability | IBM QRadar Security Intelligence Platform | IBM Big Data Platform (InfoSphere BigInsights, Streams and PureData for Analytics)
Security use cases | Turnkey | Custom
User interface | All-in-one console | Purpose-built applications
Data sources | 450+ preconfigured (and growing) | Everything else
Data volume | 100+ terabyte range | Petabyte range
Real-time analysis | Seconds | Milliseconds
Analytics | Pre-built, primarily rule-based | Custom, learning
Required expertise | Average: security practitioners | Skilled: data scientists and analysts
Organizations have a growing need to identify and protect
against threats by building insights from broader and
larger data sets
Traditional Security Analytics
[Diagram: conventional setup. The monitored network, with its DNS and DHCP servers, connects to the rest of the world through a firewall and an inline IDS/IPS; each device detects signatures within its individual data stream.]
Streaming Analytics
[Diagram: real-time streaming analytics setup. The monitored network connects to the rest of the world (Internet) through a firewall and an inline IDS/IPS. These devices and the DNS and DHCP servers still detect signatures within individual data streams, and their data (IDS/IPS alerts, …) also feeds a Real-Time Cyber Security Analytics engine.]
Real-Time Cyber Security Analytics:
• Detects behaviors by correlating across diverse and massive data streams via analytics in motion
• Uses models learnt offline with analytics on data at rest
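The contrast between the two setups can be made concrete: instead of matching a signature inside one stream, a correlator looks for the same host showing up in several streams. The sketch assumes each upstream detector already emits `(host, source, timestamp)` signals; the class name, the ten-minute window, and the two-source rule are hypothetical parameters, not values from the deck.

```python
from collections import defaultdict, deque


class CrossStreamCorrelator:
    """Flag a host once signals from at least `min_sources` distinct data
    streams (e.g. DNS, NetFlow, IDS) fall within a sliding time window."""

    def __init__(self, window: float = 600.0, min_sources: int = 2):
        self.window = window
        self.min_sources = min_sources
        self.recent = defaultdict(deque)  # host -> deque of (timestamp, source)

    def observe(self, host: str, source: str, ts: float):
        """Record one signal; return the sorted list of sources if the host now
        meets the threshold, else None. Assumes signals arrive roughly in time order."""
        q = self.recent[host]
        q.append((ts, source))
        # Evict signals that have fallen out of the window
        while q and ts - q[0][0] > self.window:
            q.popleft()
        sources = {s for _, s in q}
        return sorted(sources) if len(sources) >= self.min_sources else None


if __name__ == "__main__":
    c = CrossStreamCorrelator()
    print(c.observe("10.0.0.5", "dns", 0.0))       # None: one stream only
    print(c.observe("10.0.0.5", "netflow", 20.0))  # ['dns', 'netflow']
```

As written, the correlator re-alerts on every new signal while the condition holds; a real deployment would presumably deduplicate alerts per host.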
Streaming Analytics for Fast-flux Botnets
[Diagram: DNS response records feed several parallel FastFlux Analytics operators, which emit candidate names/IPs with confidence values to an Aggregator. The aggregator outputs suspected fast-flux domain names, which are joined with DNS queries (including the internal querying host's IP address), and suspected fast-flux IP addresses. Further joins with DHCP traffic (IP → MAC → system/owner), host logs, IPS alerts, NetFlow, … produce fast-fluxing bot alerts.]
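Each FastFlux Analytics operator assigns candidates a confidence value. The slide does not spell out the features, so the sketch below assumes commonly cited fast-flux indicators: many distinct A-record IPs per name, short TTLs, and addresses scattered across many networks (approximated here by distinct /16 prefixes rather than autonomous systems). The point weights and thresholds are hypothetical.

```python
import ipaddress
from collections import defaultdict


def fast_flux_candidates(responses, min_ips=10, max_ttl=300, min_prefixes=5, threshold=6):
    """Score domains from DNS response records given as (domain, ip, ttl) tuples.

    Points (hypothetical weights): 4 for many distinct IPs, 3 for a short TTL,
    3 for IPs spread over many /16 networks. Returns {domain: confidence in [0, 1]}
    for domains reaching `threshold` points.
    """
    ips = defaultdict(set)
    min_ttl = {}
    for domain, ip, ttl in responses:
        ips[domain].add(ip)
        min_ttl[domain] = min(ttl, min_ttl.get(domain, ttl))
    candidates = {}
    for domain, addrs in ips.items():
        # Approximate network diversity by distinct /16 prefixes (IPv4 assumed)
        prefixes = {ipaddress.ip_network(f"{a}/16", strict=False) for a in addrs}
        points = 0
        if len(addrs) >= min_ips:
            points += 4
        if min_ttl[domain] <= max_ttl:
            points += 3
        if len(prefixes) >= min_prefixes:
            points += 3
        if points >= threshold:
            candidates[domain] = points / 10
    return candidates


if __name__ == "__main__":
    responses = [("flux.example", f"198.{i}.0.1", 120) for i in range(12)]
    responses += [("cdn.example", f"203.0.113.{i}", 60) for i in range(12)]
    responses += [("www.example.org", "192.0.2.80", 86400)]
    print(fast_flux_candidates(responses))  # {'flux.example': 1.0, 'cdn.example': 0.7}
```

The `cdn.example` result shows why a confidence value alone is not enough: content delivery networks also use many IPs and short TTLs, which is one reason the pipeline on the slide joins candidates with further evidence before raising bot alerts.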
Use Case 2 - Detect Distributed Denial of Service Attacks in ISPs
• DDoS attacks are often launched by botnets to flood a target server
• They often use techniques that amplify the flooding
– E.g. DNS amplification attacks
• They are very hard to detect and prevent in time
– Need to monitor hundreds of Gbps
– Need to monitor millions of DNS requests per second
• Use InfoSphere Streams to run analytics that detect DDoS attacks
– Look for anomalies in DNS server requests
– Scale to Internet-level traffic rates
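One simple way to look for anomalies in DNS server requests is to compare each interval's request count with an exponentially weighted baseline. This sketch is an assumption, not the method used in the deck: the smoothing factor, the 4-sigma threshold, and the warm-up length are hypothetical, and at ISP scale one such detector would presumably run per key (for example per DNS server or per destination) inside a streaming engine rather than as a single Python object.

```python
import math


class EwmaRateDetector:
    """Flag intervals whose DNS request count spikes far above an exponentially
    weighted moving baseline (one-sided: only increases are flagged)."""

    def __init__(self, alpha: float = 0.1, k: float = 4.0, warmup: int = 5):
        self.alpha = alpha    # smoothing factor for the baseline
        self.k = k            # alert threshold in standard deviations
        self.warmup = warmup  # intervals observed before alerting starts
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, count: float) -> bool:
        self.n += 1
        if self.mean is None:
            self.mean = float(count)
            return False
        diff = count - self.mean
        anomalous = self.n > self.warmup and diff > self.k * max(math.sqrt(self.var), 1.0)
        if not anomalous:
            # Only normal intervals update the baseline (incremental EWMA mean and
            # variance), so a flood does not quietly become the new normal
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        return anomalous


if __name__ == "__main__":
    det = EwmaRateDetector()
    per_second = [1000, 1010, 990, 1005, 995, 1002, 998, 50000, 1001]
    print([det.update(c) for c in per_second])
```

Freezing the baseline during alerts is a trade-off: it resists poisoning by an attack, but a legitimate, permanent rise in traffic would keep alerting until the detector is reset or retrained.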
Use Case 3 - Detect Data-Leakage from Organizations
• Determine what information employees (or bots) are sending out of the company
– Look at all information flowing out of the company to the outside world
– Determine whether it contains any confidential or sensitive information
• Monitor what information employees (or bots) are seeing or accessing
– Determine whether they are accessing sensitive information (even if they have the rights to access it)
– Determine whether their access patterns are suddenly changing
• E.g. an employee who suddenly accesses much more information than they (or others in the same role) typically access may intend to sell this information or leave the company
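The last sub-point compares an employee's current access volume with what they, or peers in the same role, typically access. A minimal sketch of that comparison follows, assuming daily access counts have already been extracted from access logs; the function name, the median baselines, the factor of 3, and the record layout are hypothetical, and a real deployment would also weigh how sensitive the accessed content is.

```python
from statistics import median


def access_spikes(today, history, role_of, factor=3.0):
    """Flag users whose access count today exceeds `factor` times the larger of
    (a) their own typical daily count and (b) the typical count of peers in the
    same role. `history` maps user -> list of past daily counts."""
    typical = {user: median(counts) for user, counts in history.items() if counts}
    flagged = []
    for user, count in today.items():
        role = role_of.get(user)
        own = typical.get(user, 0)
        peers = [typical[p] for p in typical if p != user and role_of.get(p) == role]
        role_typical = median(peers) if peers else own
        baseline = max(own, role_typical, 1)  # floor of 1 avoids a zero baseline
        if count > factor * baseline:
            flagged.append(user)
    return sorted(flagged)


if __name__ == "__main__":
    history = {"alice": [20, 25, 22], "bob": [30, 28, 35], "carol": [18, 21, 19], "dave": [200, 210, 190]}
    role_of = {"alice": "analyst", "bob": "analyst", "carol": "analyst", "dave": "admin"}
    today = {"alice": 400, "bob": 33, "carol": 60, "dave": 220}
    print(access_spikes(today, history, role_of))  # ['alice']
```

Using the larger of the two baselines means a user is flagged only when the jump is unusual both for them and for their role, which matches the slide's example.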
DNS Amplification Attack
Key characteristics:
1) Targeted attack victimizing hosts and servers
2) The DNS service provider becomes a participant and is unavailable during the attack
3) Attack attribution is hard
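The amplification comes from the size ratio between query and response: with illustrative numbers, a 64-byte query that elicits a 3,000-byte response multiplies the attacker's bandwidth by 3000 / 64 ≈ 47. Because the attacker spoofs the victim's address, a DNS server sees the victim as a heavy "client" receiving highly amplified traffic. The sketch below computes that per-client ratio; the record layout, the thresholds, and the use of the ANY-query share as an extra signal are assumptions for illustration.

```python
from collections import defaultdict


def amplification_suspects(records, min_queries=100, min_factor=10.0):
    """records: (client_ip, qtype, query_bytes, response_bytes) tuples observed at
    a DNS server. In an amplification attack the client address is the spoofed
    victim, so heavy, highly amplified traffic toward one address is suspicious.
    Returns {client_ip: (queries, amplification_factor, share_of_ANY_queries)}."""
    queries = defaultdict(int)
    q_bytes = defaultdict(int)
    r_bytes = defaultdict(int)
    any_queries = defaultdict(int)
    for client, qtype, qb, rb in records:
        queries[client] += 1
        q_bytes[client] += qb
        r_bytes[client] += rb
        if qtype == "ANY":
            any_queries[client] += 1
    suspects = {}
    for client, n in queries.items():
        factor = r_bytes[client] / q_bytes[client] if q_bytes[client] else 0.0
        if n >= min_queries and factor >= min_factor:
            suspects[client] = (n, round(factor, 1), any_queries[client] / n)
    return suspects


if __name__ == "__main__":
    records = [("203.0.113.7", "ANY", 64, 3000)] * 500   # spoofed victim address
    records += [("198.51.100.20", "A", 40, 120)] * 50     # ordinary client
    print(amplification_suspects(records))  # {'203.0.113.7': (500, 46.9, 1.0)}
```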

Editor's Notes

  • #5 This slide shows a timeline of events during the first half of 2011: a series of attacks against major organizations, many of which we consider operationally competent. It is not surprising that some of these organizations were breached. We also relate each incident to its attack vector as best we understand it from what has been publicly disclosed, and we give a conjecture about the financial impact of each breach. That is a rough estimate based on public disclosures, so the numbers should not be relied on, but it is as good as we can do based on what we know.
  • #6 The Open Security Foundation reported a 40% increase in 2012 in breach events involving the loss, theft, or exposure of personally identifiable information.
  • #13 Key Points:
    – Integrate v3: the point is to have one platform to manage all of the data. There is no point in having separate silos of data, each creating separate silos of insight. From the customer's point of view (a solution point of view), big data has to be bigger than just one technology.
    – Analyze v3: a very important point. We see big data as a viable place to analyze and store data; the new technology is not just a pre-processor to get data into a structured data warehouse for analysis. This is a significant area of value add by IBM, and the game has changed: unlike DBs/SQL, the market is asking who gets the better answer, so the sophistication and accuracy of the analytics matter.
    – Visualization: we need to bring big data to the users, and the spreadsheet metaphor is the key to doing so.
    – Development: we need sophisticated development tools for the engines, and across them, to enable the market to develop analytic applications.
    – Workload optimization: improvements upon open source for efficient processing and storage.
    – Security and Governance: many are rushing into big data like the Wild West, but there is sensitive data that needs to be protected and retention policies that need to be determined. All of the maturity of governance in the structured world can benefit the big data world.
  • #24 What we are monitoring: more than 12,000 systems. We have about 12,000 unique MAC addresses in our database, and we can only obtain MAC addresses for part of the systems we monitor (mostly systems using DHCP), since we do not yet connect to the infrastructure that assigns fixed IP addresses. We added ARP monitoring to correlate static IP addresses with their MAC addresses, but we see only part of the ARP traffic because the taps are located at the network boundaries. To give you an idea of scale, we track about 200,000-600,000 unique domain names per day and 20K to 120K unique domain names per hour.