This document discusses using machine data for security insights through big data analytics. It begins by explaining the importance of understanding the types of machine data, such as firewall, IPS, and Windows logs, and the context around that data. It then discusses where insights can come from, such as anomalies, trends, and deficiencies. Measures for detecting common attacks are proposed. The document outlines a security analysis life cycle of detecting incidents, verifying them, and reacting. It concludes by providing examples of implementing such a system using various open source and commercial tools.
2. WhoAmI
● Lazy Blogger
– Japan, Security, FOSS, Politics, Christian
– http://narudomr.blogspot.com
● 5 Years In Log Analysis
● Consultant, OWASP Thailand Chapter
● Head of IT Security, Kiatnakin Bank PLC (KKP)
● narudom.roongsiriwong@owasp.org
3. Objective
● Lay the foundation of Big Data analytics, using information security scenarios as examples
● Share practical analytics from my experience
● Show how to acquire each component to fulfill operational requirements
4. Agenda
● Know Your Machine Data
● Know Your Context
● Look for Insight
● Identify Measure
● Security Analysis Life Cycle
● Implementation
6. Know Your Machine Data
● Types of Data
● Information from Each Data Type
● Size of Data
– Bytes per Event
– Numbers of Events per Second, Minute, Hour, Day,
Month
– Percentage of Each Data Type Compared to Total
Data Size
Time Series
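The sizing questions on this slide reduce to simple arithmetic. A minimal sketch in Python, assuming made-up event sizes and rates (every number and source name below is a placeholder for illustration, not a measurement):

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class LogSource:
    name: str
    bytes_per_event: int    # average size of one event
    events_per_second: int  # average sustained rate

    def bytes_per_day(self):
        return self.bytes_per_event * self.events_per_second * SECONDS_PER_DAY


def volume_report(sources):
    """Return {name: (bytes_per_day, percent_of_total)} for each source."""
    total = sum(s.bytes_per_day() for s in sources)
    return {
        s.name: (s.bytes_per_day(), 100.0 * s.bytes_per_day() / total)
        for s in sources
    }


# Hypothetical figures for illustration only.
sources = [
    LogSource("firewall", bytes_per_event=200, events_per_second=500),
    LogSource("ips", bytes_per_event=400, events_per_second=50),
    LogSource("windows", bytes_per_event=800, events_per_second=25),
]

report = volume_report(sources)
for name, (per_day, pct) in report.items():
    print(f"{name:10s} {per_day / 1e9:6.2f} GB/day {pct:5.1f}%")
```

Replacing the placeholder rates with counts measured per second, minute, hour, day, and month gives the time series view of the same figures.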
7. Know Your Machine Data: Firewall
● Types of Data
– Access Control Log (Accepted/Denied Log)
– Administrative Activity Log
– System Status Log
– Other Next Generation Firewall Logs: IDS, SIP, Connection Built/Teardown
● Information from Each Type of Data
– Access Control Log: Start Time, Action, Source
IP/Port, Destination IP/Port, Protocol, etc.
– Administrative Activity Log: Time, User, Action,
Result, etc.
8. Cisco ASA: Built/Teardown Log
Apr 29 2013 12:59:50: %ASA-6-305011: Built dynamic TCP translation from
inside:X.X.3.42/4952 to outside:X.X.X.130/12834
Apr 29 2013 12:59:50: %ASA-6-302013: Built outbound TCP connection 89743274 for
outside:X.X.X.43/443 (X.X.X.43/443) to inside:X.X.3.42/4952 (X.X.X.130/12834)
Apr 29 2013 12:59:50: %ASA-6-305011: Built dynamic UDP translation from
inside:X.X.1.35/52925 to outside:X.X.X.130/25882
Apr 29 2013 12:59:50: %ASA-6-302015: Built outbound UDP connection 89743275 for
outside:X.X.X.222/53 (X.X.X.222/53) to inside:X.X.1.35/52925 (X.X.X.130/25882)
Apr 29 2013 12:59:50: %ASA-6-305012: Teardown dynamic UDP translation from
inside:X.X.1.24/63322 to outside:X.X.X.130/59309 duration 0:00:30
Apr 29 2013 12:59:50: %ASA-6-305011: Built dynamic TCP translation from
inside:X.X.3.42/4953 to outside:X.X.X.130/45392
Apr 29 2013 12:59:50: %ASA-6-302013: Built outbound TCP connection 89743276 for
outside:X.X.X.1/80 (X.X.X.1/80) to inside:X.X.3.42/4953 (X.X.X.130/45392)
Apr 29 2013 12:59:50: %ASA-6-302016: Teardown UDP connection 89743275 for
outside:X.X.X.222/53 to inside:X.X.1.35/52925 duration 0:00:00 bytes 140
Apr 29 2013 12:59:50: %ASA-6-305011: Built dynamic TCP translation from
inside:X.X.3.42/4954 to outside:X.X.X.130/10879
Apr 29 2013 12:59:50: %ASA-6-302013: Built outbound TCP connection 89743277 for
outside:X.X.X.17/80 (X.X.X.17/80) to inside:X.X.3.42/4954 (X.X.X.130/10879)
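To analyze lines like these, they first have to be split into fields. The following is a rough sketch in Python that handles only the "Built outbound" connection lines shown above (message IDs 302013 and 302015). The regular expression and the field names (`dst_ip`, `src_ip`, and so on) are my own assumptions for illustration, not an official Cisco schema, and each wrapped log entry is assumed to have been rejoined into a single line.

```python
import re
from datetime import datetime

# Matches only "Built outbound TCP/UDP connection" lines, as in the sample.
BUILT_RE = re.compile(
    r"^(?P<timestamp>\w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}): "
    r"%ASA-(?P<severity>\d)-(?P<msg_id>\d{6}): "
    r"Built outbound (?P<protocol>TCP|UDP) connection (?P<conn_id>\d+) for "
    r"(?P<dst_if>\w+):(?P<dst_ip>[^/\s]+)/(?P<dst_port>\d+) \([^)]*\) to "
    r"(?P<src_if>\w+):(?P<src_ip>[^/\s]+)/(?P<src_port>\d+) \([^)]*\)"
)


def parse_built(line):
    """Return a dict of fields, or None if the line is not a Built line."""
    match = BUILT_RE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["timestamp"] = datetime.strptime(
        event["timestamp"], "%b %d %Y %H:%M:%S"
    )
    for key in ("conn_id", "dst_port", "src_port"):
        event[key] = int(event[key])
    return event


tcp_line = (
    "Apr 29 2013 12:59:50: %ASA-6-302013: Built outbound TCP connection "
    "89743274 for outside:X.X.X.43/443 (X.X.X.43/443) to "
    "inside:X.X.3.42/4952 (X.X.X.130/12834)"
)
teardown_line = (
    "Apr 29 2013 12:59:50: %ASA-6-302016: Teardown UDP connection 89743275 "
    "for outside:X.X.X.222/53 to inside:X.X.1.35/52925 duration 0:00:00 "
    "bytes 140"
)

event = parse_built(tcp_line)
print(event["src_ip"], "->", event["dst_ip"], event["dst_port"])
print(parse_built(teardown_line))  # not a Built line, so None
```

Teardown and translation messages would need their own patterns; keeping one pattern per message ID makes it easier to see which lines are silently dropped.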
9. Cisco ASA: Access Log Intelligence
Translate IP Address to Domain User
10. Know Your Machine Data: IPS/IDS
● Type of Data
– IPS Event: Blocked, Alert
– Packet Acquisition (PCAP)
– Contextual Information (Intelligence)
– System Status Log
● Information from Each Type of Data
– IPS Event: Source IP/Port, Destination IP/Port,
Name of Matched Rule, etc.
– Packet Acquisition: Raw Data or Payload
– Contextual Information: IP to Domain, IP to User,
Application Detection, etc.
11. Cisco FirePower (SourceFire): eStreamer
● The Cisco Event Streamer (also known as eStreamer) allows you to stream
FireSIGHT System intrusion, discovery, and connection data from the Cisco
Defense Center or managed device to external client applications.
● Provides more intelligent information than IPS/IDS alert logs.
12. Know Your Machine Data: Windows
● Type of Data
– Security
– System
– Application
● Information from Each Type of Data
– Time Generated, Time Written, Event ID, Event
Type, User, Computer, Keyword
– Windows Server 2003 vs 2008 Event IDs
– EVT vs EVTX
13. Know Your Machine Data: Web Server
● Type of Data
– Access Log
– Error Log
● Information from Each Type of Data
– Access Log: Client IP, User ID, Finished Time,
Request Method, URL, HTTP Version, Status Code,
Returned Size
– Error Log: Time, Log Level, Client IP Address, Error
Message
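The access log fields listed above correspond closely to the Common Log Format used by many web servers. A minimal parsing sketch in Python, assuming that format; the sample line and the field names are invented for illustration:

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
CLF_RE = re.compile(
    r'^(?P<client_ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<http_version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_access_line(line):
    """Return a dict of access log fields, or None if the line does not match."""
    match = CLF_RE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["time"] = datetime.strptime(event["time"], "%d/%b/%Y:%H:%M:%S %z")
    event["status"] = int(event["status"])
    # A size of "-" means no body was returned.
    event["size"] = 0 if event["size"] == "-" else int(event["size"])
    return event


# Made-up sample line.
sample = (
    '192.0.2.10 - alice [10/Oct/2014:13:55:36 +0700] '
    '"GET /index.html HTTP/1.1" 200 2326'
)
event = parse_access_line(sample)
print(event["client_ip"], event["method"], event["url"], event["status"])
```

Servers configured with a different log format (for example, with referrer and user agent appended) need the pattern extended accordingly.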
16. What Is Context?
Context is the information surrounding the
information. Without context, information can be
misinterpreted.
● Context may be information of your
environment.
● Information of context is normally constant,
rarely changed.
22. What is Insight?
The capacity to gain an accurate and deep intuitive
understanding of a person or thing
23. Where Does Insight Come From?
● The best insights tend to come from sources that
can be categorized
● Insight Channels
– Anomalies: Deviations from the norm
– Confluence: Macro trend intersection
– Frustrations: Deficiencies in the system
– Orthodoxies: Question conventional beliefs
– Extremities: Learn from the behaviors of leading or
laggard actors
– Voyages: Learn how your stakeholders live, work, and
behave
– Analogies: Borrow from other industries or organizations
Harvard Business Review, November 2014 Issue
28. Extremities
Learn from the behaviors of leading or laggard actors
Analyze Traffic from Russia
How about the missing actors?
29. Voyages
Learn how your stakeholders live, work, and behave
Sometimes it is hard to figure out why a data set seems strange until you see what is going on in the field.
30. Analogies
Borrow from other industries or organizations
● Knowledge from the others
– Other Industries
– Other organizations in the same industry
● Forms of Knowledge
– Standard
– Best Practice
– Security Pattern: A packaged, reusable solution to a recurrent problem that embodies the experience and knowledge of many security designers.
– Analysis or Research Papers
– Methodologies or Algorithms
34. Reconnaissance (Footprinting)
● Objective – Collect as much information as possible about the target
– DNS Servers
– IP Ranges
– Administrative Contacts
– Problems revealed by administrators
● Methods
– Gather information from search engines, forums, internet databases (whois, ripe, arin, apnic)
– Use tools – PING, whois, Traceroute, DIG, nslookup, Sam Spade
● No log source affected
35. Enumeration & Fingerprinting
● Objective
– Specific targets determined
– Identification of Services / open ports
– Operating System Enumeration
● Methods
– Banner grabbing
– Responses to various protocol (ICMP & TCP) commands
– Port / Service Scans – TCP Connect, TCP SYN, TCP FIN, etc.
– Tools – Nmap, FScan, Hping, Firewalk, netcat, tcpdump, ssh, telnet, SNMP Scanner
37. Identification of Vulnerabilities
● Objective: Finding target vulnerabilities
– Insecure Configuration
– Weak passwords
– Unpatched vulnerabilities in services, Operating systems, applications
– Possible Vulnerabilities in Services, Operating Systems
– Insecure programming
– Weak Access Control
● Methods
– Unpatched / Possible Vulnerabilities – Tools, Vulnerability information Websites
– Weak Passwords – Default Passwords, Brute force, Social Engineering, Listening to Traffic
– Insecure Programming – SQL Injection, Listening to Traffic
– Weak Access Control – Using the Application Logic, SQL Injection
38. Identification Detection
● Primary log sources
– IPS/IDS alert logs
– OS security logs
– Web server access logs
● Secondary log sources
– Host-Based IDS
– Web Application Firewall
– Database Firewall
39. Attack – Exploit the Vulnerabilities
● Network Infrastructure Attacks
– Exploit network equipment
– Weaknesses in TCP / IP, NetBIOS
– Flooding the network to cause DOS
● Operating System Attacks
– Attacking Authentication Systems
– Exploiting Protocol Implementations
– Exploiting Insecure configuration
– Breaking File-System Security
● Application Specific Attacks
– Exploiting implementations of HTTP, SMTP protocols
– Gaining access to application Databases
– SQL Injection
– Spamming
40. Attack Detection
● Network Infrastructure Attacks
– Firewall logs: access, administration and system status
– IPS/IDS logs: alert and system status
● Operating System Attacks
– IPS/IDS alert logs
– OS security logs
– Special Security S/W logs – Host-Based IDS
● Application Specific Attacks
– Web server logs – access and error
– IPS/IDS alert logs
– Special Security Device & S/W logs – Host-Based IDS,
Web Application Firewall, Database Firewall
41. Gaining Access
● After successful exploitation, attempt to access the target
● Techniques
– Password eavesdropping
– File share brute forcing
– Password file grab
– Buffer overflows
42. Gaining Access Detection
Technique | Detection from Log Sources
Password eavesdropping, Buffer overflows | None
File share brute forcing, Password file grab | OS file audit logs (not installed by default; Linux's auditd, for example); Special Security S/W logs – Host-Based IDS
43. Escalating Privileges
● If only user-level access was obtained in the last step, the attacker will now seek to gain complete control of the system
● Techniques
– Password cracking
– Known exploits
● Detection: Privileged User Creation or Login
– OS security logs
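As one possible way to look for privileged user creation or login in Windows security logs, the sketch below filters already-parsed events by Event ID. The IDs are the Windows Server 2008 and later numbering; the event dictionaries, the `ignore_users` default, and the sample records are assumptions made up for illustration.

```python
# Windows Server 2008+ security Event IDs related to privileged accounts.
WATCHED_EVENT_IDS = {
    4720: "user account created",
    4728: "member added to a security-enabled global group",
    4732: "member added to a security-enabled local group",
    4672: "special privileges assigned to new logon",
}


def privileged_activity(events, ignore_users=frozenset({"SYSTEM"})):
    """Return (computer, user, event_id, label) for each watched event.

    `events` is an iterable of dicts with "event_id", "user", "computer".
    Accounts in `ignore_users` are skipped to cut down routine noise.
    """
    hits = []
    for event in events:
        label = WATCHED_EVENT_IDS.get(event["event_id"])
        if label is None or event["user"] in ignore_users:
            continue
        hits.append((event["computer"], event["user"], event["event_id"], label))
    return hits


# Made-up records for illustration.
events = [
    {"event_id": 4624, "user": "alice", "computer": "PC-01"},
    {"event_id": 4672, "user": "SYSTEM", "computer": "PC-01"},
    {"event_id": 4720, "user": "bob", "computer": "SRV-02"},
    {"event_id": 4732, "user": "bob", "computer": "SRV-02"},
]

hits = privileged_activity(events)
for computer, user, event_id, label in hits:
    print(f"{computer} {user} {event_id}: {label}")
```

Event 4672 in particular fires for every administrative logon, so in practice it probably needs to be compared against a list of accounts that are expected to hold those privileges.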
44. Covering Tracks
● Objective: After a successful compromise, hide this fact from system administrators.
● Techniques
– Clear logs
– Hide tools
● Detection: Log service stopped, log file deleted or changed without authorization
– OS security logs***
– Special Security S/W logs – Host-Based IDS
45. Creating Back Doors
● Objective: Ensure that privileged access is easily regained.
● Techniques
– Create rogue user accounts
– Schedule batch jobs
– Infect startup files
– Plant remote control services
– Install monitoring mechanisms
– Replace apps with Trojans
● Detection
– OS security logs***
– OS file audit logs***
– Special Security S/W logs – Host-Based IDS
46. Measure for Host Scanning*
● Context
– We have a firewall separating the Internet and the internal network
– We have an IP network x.x.x.x/26 (64 IP addresses)
● Attack Pattern
– An attacker uses one source IP to try to connect to many destination IPs from the Internet.
● Possible Measure
– Accepted/denied access control log entries from the firewall show one source IP connecting to more than 20 destination IP addresses in one minute
*For example only; the most effective way is to deploy an IDS that monitors the firewall's Internet-facing interface
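The host scanning measure can be sketched in a few lines of Python. This version assumes the firewall access control log has already been parsed into (timestamp, source IP, destination IP) tuples and uses fixed one-minute buckets; the function name, the tuple layout, and the sample traffic are all invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def detect_host_scans(events, max_targets=20):
    """Flag sources that touch more than `max_targets` distinct
    destination IPs within one calendar minute.

    `events` is an iterable of (timestamp, src_ip, dst_ip) tuples taken
    from firewall accepted/denied records.
    """
    targets = defaultdict(set)
    for timestamp, src_ip, dst_ip in events:
        minute = timestamp.replace(second=0, microsecond=0)
        targets[(src_ip, minute)].add(dst_ip)
    return {
        key: len(dst_ips)
        for key, dst_ips in targets.items()
        if len(dst_ips) > max_targets
    }


# Made-up traffic: one source sweeps 25 addresses, another touches 3.
start = datetime(2013, 4, 29, 12, 59, 0)
events = [
    (start + timedelta(seconds=i), "203.0.113.7", f"198.51.100.{i + 1}")
    for i in range(25)
]
events += [
    (start + timedelta(seconds=i), "203.0.113.99", f"198.51.100.{i + 1}")
    for i in range(3)
]

alerts = detect_host_scans(events)
for (src_ip, minute), count in alerts.items():
    print(f"{minute:%H:%M} {src_ip} contacted {count} hosts")
```

Fixed buckets can miss a scan that straddles a minute boundary; a sliding window would be stricter at the cost of more bookkeeping.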
47. Measure for Port Scanning*
● Context
– We have a firewall separating the Internet and the internal network
● Attack Pattern
– An attacker uses one source IP to try to connect to one destination IP on various ports from the Internet.
● Possible Measure
– Accepted/denied access control log entries from the firewall show one source IP connecting to one destination IP address on more than 20 different ports in one minute
*For example only; the most effective way is to deploy an IDS that monitors the firewall's Internet-facing interface
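A comparable sketch for the port scanning measure, again in Python and again assuming pre-parsed firewall records. Here each record is a dict with hypothetical keys `time`, `src_ip`, `dst_ip`, and `dst_port`, and the sample traffic is made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def detect_port_scans(events, max_ports=20):
    """Flag (source, destination) pairs where the source tries more than
    `max_ports` distinct destination ports within one calendar minute."""
    ports = defaultdict(set)
    for event in events:
        minute = event["time"].replace(second=0, microsecond=0)
        key = (event["src_ip"], event["dst_ip"], minute)
        ports[key].add(event["dst_port"])
    return [
        {"src_ip": src, "dst_ip": dst, "minute": minute, "ports": len(seen)}
        for (src, dst, minute), seen in ports.items()
        if len(seen) > max_ports
    ]


start = datetime(2013, 4, 29, 13, 0, 0)
# Made-up scanner: 30 different ports on one host within 30 seconds.
events = [
    {"time": start + timedelta(seconds=i), "src_ip": "203.0.113.7",
     "dst_ip": "198.51.100.10", "dst_port": 1 + i}
    for i in range(30)
]
# Made-up normal client: many connections, but always to the same port.
events += [
    {"time": start + timedelta(seconds=i), "src_ip": "203.0.113.50",
     "dst_ip": "198.51.100.10", "dst_port": 443}
    for i in range(30)
]

alerts = detect_port_scans(events)
for alert in alerts:
    print(alert["src_ip"], "->", alert["dst_ip"], alert["ports"], "ports")
```

Counting distinct ports rather than raw events is what keeps the busy but well-behaved client out of the result.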
48. Measure for Centralized HTTP Botnet
[Diagram: the bot master sends commands to an HTTP C&C server; the bots in the botnet check the server for new commands and receive them.]
49. Measure for Centralized HTTP Botnet
● Context
– We have a firewall separating the Internet and the internal network; outbound traffic is allowed only on ports 80 & 443
● Attack Pattern
– The bots connect to the C&C server periodically to get new commands from the bot master.
– The instructions to the bots tend to be short. Command packets are typically small, 1 KB or even less
● Possible Measure
– Accepted log entries from the firewall to one destination IP address with byte-in size less than 1 KB, for 3 or more events per hour.
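The botnet measure can be prototyped the same way. The Python sketch below assumes parsed firewall records with hypothetical keys `time`, `action`, `src_ip`, `dst_ip`, and `bytes_in`. Grouping by source as well as destination is my own assumption; dropping `src_ip` from the key gives the per-destination reading of the measure. The sample traffic is invented.

```python
from collections import Counter
from datetime import datetime, timedelta


def detect_beacons(events, max_bytes=1024, min_events=3):
    """Count small accepted connections per (src, dst, hour) and return
    the groups with at least `min_events` of them."""
    counts = Counter()
    for event in events:
        if event["action"] != "accept" or event["bytes_in"] >= max_bytes:
            continue
        hour = event["time"].replace(minute=0, second=0, microsecond=0)
        counts[(event["src_ip"], event["dst_ip"], hour)] += 1
    return {key: n for key, n in counts.items() if n >= min_events}


start = datetime(2014, 11, 1, 9, 0, 0)
# Made-up bot: polls every 10 minutes and receives a ~300 byte reply.
events = [
    {"time": start + timedelta(minutes=10 * i), "action": "accept",
     "src_ip": "10.0.0.5", "dst_ip": "203.0.113.80", "bytes_in": 300}
    for i in range(6)
]
# Made-up browsing: same frequency, but large responses.
events += [
    {"time": start + timedelta(minutes=10 * i), "action": "accept",
     "src_ip": "10.0.0.6", "dst_ip": "198.51.100.20", "bytes_in": 50000}
    for i in range(6)
]
# Made-up one-off: small responses, but only twice in the hour.
events += [
    {"time": start + timedelta(minutes=5 * i), "action": "accept",
     "src_ip": "10.0.0.7", "dst_ip": "198.51.100.30", "bytes_in": 200}
    for i in range(2)
]

alerts = detect_beacons(events)
for (src_ip, dst_ip, hour), n in alerts.items():
    print(f"{hour:%H:%M} {src_ip} -> {dst_ip}: {n} small responses")
```

Legitimate software that polls small endpoints will probably match as well, so the output is better treated as a candidate list to verify than as a verdict.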
55. Implementation
● E = Event Generator
● C = Collection
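● D = Data Storage with Indexes
● A = Analysis Tool
● K = Knowledge Base
● R = Reaction & Reporting
[Diagram: architecture showing five event generators (E), two collectors (C), data storage (D), analysis tool (A), knowledge base (K), and reaction & reporting (R)]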
56. Event Generator
● Sensor
– IDS
– Any system providing logs
– Agents
● Poller
– SNMP
57. Collection + Data Storage with Indexes
● Collection
– Gather information from different sensors
– Filter
– Parse useful information (tag or normalize)
– Aggregate
● Data Storage with Indexes
– Store raw or formatted data with index
58. Analysis + Knowledge Base
● Analysis
– Analyze events stored in data storage
– Correlation algorithms, false-positive message detection, mathematical representation
● Knowledge Base
– Context Information
– Intrusion Path
– System Model
– Security Policy
59. Reaction and Reporting
● Subjective Concept
– Dashboard
– Report
– Security Policy Enforcement Strategy
– Legal Constraints
– Contractual SLAs
60. Solution#1
Component | Implementation
Collection | SYSLOG Daemon; Bash script with grep+sed+awk
Data Storage with Indexes | CSV Files
Analysis Tool | Microsoft Excel
Knowledge Base | Microsoft Excel
Reaction & Reporting | Microsoft Excel
● The Good: Low Cost
● The Bad: Automation only for collection
● The Ugly: Analysis once a day
61. Solution#2
Component | Implementation
Collection | Windows Service (In-house)
Data Storage with Indexes | MS SQL
Analysis Tool | Windows Client Application (In-house)
Knowledge Base | MS SQL
Reaction & Reporting | Windows Client Application (In-house)
● The Good: Built-in security surveillance process
● The Bad: Unable to handle more than 1 GB/day, and some information is lost in normalization
● The Ugly: Searching for a specific event using grep on raw data is 10 or more times faster than querying the database
62. Solution#3
Component | Implementation
Collection | Splunk Forwarder
Data Storage with Indexes | Splunk Indexer
Analysis Tool | Splunk Search Head
Knowledge Base | Splunk built-in tables; RDBMS in the future
Reaction & Reporting | Splunk Search Head
● The Good: Scalable
● The Bad: Expensive!!!
63. Useful Skills
● Data Interpretation
– Network, System, Application
– Information Security Knowledge
● Search Skill
● Regular Expression
Editor's Notes
Tell about source code uploaded to BitBucket.
Case: Share printer as Administrator on a domain member windows client.