
Log Mining: Beyond Log Analysis

From anton_chuvakin, 7 months ago

The presentation will describe methods for discovering interesting…

Slideshow transcript

Slide 1: Security Log Mining: Beyond Log Analysis. Anton Chuvakin, Ph.D., GCIA, GCIH, GCFA. Last presented on March 9, 2007, at IT Underground, Prague, Czech Republic

Slide 2: Goals • Learn or refresh your knowledge about log analysis for security • Learn about novel techniques of log analysis via data mining • Get you to think of using them in your environment

Slide 3: Outline: Log Mining (LM) • Logs and Log Analysis Overview – What logs? – Why analyze logs? – Why NOT analyze logs? – How people usually do it • Log Mining – Knowledge discovery and data mining brief – Mining of different types of logs • Results – Examples of using the above methods • Tools – How one can build tools to do it

Slide 4: Definitions • Log = a record of whatever activity occurs on an information system • Also: alert, “event”, alarm, message, record, etc. …standard definitions are coming soon!

Slide 5: Log Analysis: What • Log Data Sources – IDS – Firewalls/IPS – Anti-malware – Proxies – Network infrastructure – Servers – Databases – Applications • Log Analysis Process – Generate – Collect – Aggregate – Normalize – Alert – Store – Summarize, baseline – Make conclusions – Act on them!

Slide 6: Log Analysis: Why • Situational awareness and new threat discovery – Unique perspective from combined logs • Getting more value out of the network and security infrastructures – Get more than you paid for! • Extracting what is really actionable automatically • Measuring security (metrics, trends, etc.) • Compliance and regulations (oh, my!) • Incident response (last, but not least!)

Slide 7: Log Analysis: Why NOT or Log Analysis Challenges • “Real hackers don’t get logged!”  • Why bother? No, really … • Too much data (>X0 GB per day) • Too hard to do • No tools “that do it for you” – Or: tools too expensive • What logs? We turned them off 

Slide 8: Log Analysis Basics: How Common approaches to the “log problem”: • Manual – ‘Tail’, ‘more’, etc • Filtering – Positive and negative (“Artificial ignorance”) • Summarization and reports • Simple visualization – “…worth a thousand words?” • Correlation – Rule-based and other
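
The negative filtering ("artificial ignorance") approach from Slide 8 can be sketched in a few lines of Python. This is only an illustration: the patterns and sample lines are hypothetical, and a real "known boring" list has to be built up for each environment.

```python
import re

# Hypothetical "known boring" patterns; a real list grows site by site.
BORING_PATTERNS = [
    re.compile(r"session opened for user \w+"),
    re.compile(r"session closed for user \w+"),
]

def artificial_ignorance(lines, boring=BORING_PATTERNS):
    """Return only the lines that match none of the known-boring patterns."""
    return [line for line in lines if not any(p.search(line) for p in boring)]

# Made-up sample lines, for illustration only.
sample = [
    "sshd: session opened for user alice",
    "sshd: session closed for user alice",
    "kernel: eth0: promiscuous mode enabled",
]
leftover = artificial_ignorance(sample)
for line in leftover:
    print(line)
```

Whatever survives the filter is what a human looks at; anything that turns out to be routine becomes a new pattern in the list.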

Slide 9: Log Analysis Basics: When Timing requirements for analysis • Real-time fallacy: “we have to have it when?” – “A day later vs never” question • Would you rather catch an intrusion a day after … or a month after … CNN talks about it – Daily in-depth analysis • Log management vs alert management: different challenges – When filtering and event correlation are not enough • Some data just doesn’t mean much in real-time

Slide 10: KDD and DM • Introducing data mining… • Definitions and background terms: – Data Mining (DM) and Knowledge Discovery in Databases (KDD) • DM = “Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data”

Slide 11: Brief on Some DM Techniques From DM to LM: • Deviation analysis – Baselines and deviations • Classification – Organize data by class to know it • Clustering – How things are grouped together • Association Rule Discovery – Relationship finding • Outlier Detection – What stands out

Slide 12: KDL and LM • Log Mining (LM) and Knowledge Discovery in Logs (KDL) • Is “log mining” a marketing buzzword? Not yet! • Why “mine the logs”? – New types of analysis – More human-like pattern recognition – Prediction? Probably not! – Dealing with sparse data • Towards “replacing” humans (not really…) – Offloading conclusion generation to machines – “Better than junior analysts”

Slide 13: Preliminary Requirements Mostly the same as for simpler log analysis, but with some added factors: • Centralized – To look in just one place • Normalized – To look across the data sources • Quickly accessible storage – To be used by the mining tools

Slide 14: Log Data from DM Perspective Common fields in logs: • Time • Source • Destination • Protocol • Port(s) • User name • Event/attack type • Bytes exchanged

Slide 15: Log Data from DM Perspective But are logs really data? Looks like /broken/ English to me… %PIX-2-214001: Terminating manager session from 146.127.55.2 on interface inside. Reason: incoming encrypted data (18998 bytes) longer than 12453 bytes %PIX-3-109016: Downloaded authorization access-list 101 not found for user sunilp Text mining techniques might also come in handy
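
Before any mining, "broken English" messages like the PIX lines on Slide 15 have to be turned into fields. Below is a minimal sketch that assumes the `%PIX-<severity>-<message id>: <text>` layout visible in those two lines; the function name and the record keys are made up for illustration.

```python
import re

# Layout assumed from the slide's samples: %PIX-<severity>-<message id>: <text>
PIX_RE = re.compile(r"%PIX-(?P<severity>\d)-(?P<msg_id>\d+): (?P<text>.*)")
# Loose dotted-quad matcher; it does not validate octet ranges.
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def parse_pix(line):
    """Split a PIX message into severity, message id, text and any IPs found."""
    m = PIX_RE.search(line)
    if m is None:
        return None
    record = m.groupdict()
    record["severity"] = int(record["severity"])
    record["ips"] = IP_RE.findall(record["text"])
    return record

line = ("%PIX-2-214001: Terminating manager session from 146.127.55.2 "
        "on interface inside. Reason: incoming encrypted data (18998 bytes) "
        "longer than 12453 bytes")
record = parse_pix(line)
print(record["severity"], record["msg_id"], record["ips"])
```

The free-text remainder still needs per-message handling (or text mining, as the slide suggests), but severity, message id and addresses are already enough for counting and baselining.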

Slide 16: Example: Jumbled Mess of SAP Application Logs |22:01:40|BTC| 7|000|DDIC | |LC2|Systemerror when executing external command DB6_DATA_COLLECTOR on gneisenau () |22:02:32|BTC| 7|000|DDIC | |R49|Communication error, CPIC return code 020, SAP return code 456 |22:02:32|BTC| 7|000|DDIC | |R5A|> Conversation ID: 38910614 |22:02:32|BTC| 7|000|DDIC | |R64|> CPI-C function: CMSEND(SAP) |22:02:32|BTC| 7|000|DDIC | |LC2|Systemerror when
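
The "jumbled mess" on Slide 16 is pipe-delimited, so a first normalization pass can be a plain split. In the sketch below the column names are guesses made for illustration (the slide does not label the columns), and the parser only assumes the eight-column shape seen in the sample lines.

```python
# Column names are assumptions; the slide shows the values but not the headers.
FIELDS = ["time", "wp_type", "wp_no", "client", "user", "tcode", "msg_no", "text"]

def parse_sap_line(line):
    """Split one pipe-delimited log line into a dict, or return None if it does not fit."""
    # Limit the number of splits so that any '|' inside the message text is preserved.
    parts = line.split("|", len(FIELDS))
    if len(parts) != len(FIELDS) + 1 or parts[0].strip():
        return None
    return dict(zip(FIELDS, (p.strip() for p in parts[1:])))

rec = parse_sap_line(
    "|22:02:32|BTC| 7|000|DDIC | |R49|Communication error, "
    "CPIC return code 020, SAP return code 456"
)
print(rec["time"], rec["msg_no"], rec["text"])
```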

Slide 17: What Do We “Mine” for? • How about for something interesting? • One research paper defines “interesting” thus: – Unexpected to user (aka not “normal”, not routine) – Actionable (we can and/or should do something about it) • Examples: – Compromised/infected system – Successful attack – Insider abuse and data theft – Other data leaks, intentional and not – Covert channel/hidden backdoor communication – Increase in probing – Mysterious system crash

Slide 18: Simple Example • Too many attack types from a single IP address • Right next to known vulnerability scanners • External IP address • Conclusion: potentially dangerous attacker
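
The first and third conditions of this simple example (many attack types from one source, and the source being external) are easy to script. The sketch below assumes events have already been normalized to hypothetical `(source_ip, attack_type)` pairs and uses an arbitrary threshold; the "right next to known vulnerability scanners" check is left out because it needs site-specific knowledge.

```python
from collections import defaultdict
import ipaddress

def suspicious_sources(events, min_types=3):
    """Return external source IPs seen with at least min_types distinct attack types."""
    types_by_src = defaultdict(set)
    for src, attack_type in events:
        types_by_src[src].add(attack_type)
    return sorted(
        src for src, types in types_by_src.items()
        if len(types) >= min_types
        and not ipaddress.ip_address(src).is_private  # crude "external" test
    )

# Made-up events for illustration.
events = [
    ("146.127.55.2", "port scan"),
    ("146.127.55.2", "sql injection"),
    ("146.127.55.2", "directory traversal"),
    ("10.0.0.7", "port scan"),
    ("10.0.0.7", "sql injection"),
    ("10.0.0.7", "directory traversal"),
]
flagged = suspicious_sources(events)
print(flagged)
```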

Slide 19: Deeper into interesting - I Approaches to finding interesting stuff in logs without knowing what we look for specifically: • Rare things – Is compromise rare in your environment?  • Different things – Is today “just another day” … or not? • “Out of character” things – It always does it… but not today? • Weird-looking things
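
"Rare things" is the easiest of these to try: count how often each event type occurs and look at the bottom of the list. This is a minimal sketch with a hypothetical cut-off; what counts as rare depends on the volume and the time window.

```python
from collections import Counter

def rare_events(event_types, max_count=1):
    """Return the event types seen at most max_count times, sorted by name."""
    counts = Counter(event_types)
    return sorted(t for t, n in counts.items() if n <= max_count)

# Made-up event stream for illustration.
stream = ["login", "login", "logout", "login", "logout", "useradd", "login"]
rare = rare_events(stream)
print(rare)
```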

Slide 20: Example 1: Can You Guess What Happened?! [Figure: Destination Port 1D Baseline]

Slide 21: Example 2: Can You Guess What Happened?!

Slide 22: Example 3: Can You Guess What Happened?!

Slide 23: Deeper into interesting - II • Things going in an unusual direction – Your web server is now a web client - to “hack.kz” • Top things and Bottom Things – And them changing places! • Strange combinations of uninteresting things – A nice connection to a web server – A nice configuration change – A nice user creation – SO, is it NICE?
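
The "web server is now a web client" case can be checked mechanically once connection logs are normalized. The sketch below assumes hypothetical `(source, destination, destination_port)` tuples and a hand-maintained set of server addresses; it illustrates the idea and is not a complete detector.

```python
def servers_acting_as_clients(connections, servers, client_ports=(80, 443)):
    """Return known servers that opened web connections to hosts outside the server set."""
    return sorted({
        src for src, dst, dport in connections
        if src in servers and dst not in servers and dport in client_ports
    })

# Made-up data for illustration.
servers = {"10.0.0.5"}
connections = [
    ("10.0.0.9", "10.0.0.5", 80),      # ordinary client -> web server
    ("10.0.0.5", "146.127.55.2", 80),  # web server behaving like a client
]
odd = servers_acting_as_clients(connections, servers)
print(odd)
```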

Slide 24: Example 4: Can You Guess What Happened?!

Slide 25: Example 5: Can You Guess What Happened?!

Slide 26: Deeper into interesting - III • Counts of an otherwise uninteresting thing – Pings? Connections to port 80? Error 404s? • Ratios of otherwise uninteresting things – Login failures / login successes? – Inbound / outbound connections? • Frequencies of things – Frequent becoming rare – and vice versa! • Time series behaving badly – Traffic overall grows, but traffic vs system X slows
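
Ratio analysis needs nothing more than two counters per period. The following sketch compares today's login failure/success ratio with a baseline period; the threshold factor and the function names are assumptions chosen for illustration.

```python
def failure_ratio(failures, successes):
    """Login failures per successful login; infinite if there were only failures."""
    if successes == 0:
        return float("inf") if failures else 0.0
    return failures / successes

def ratio_jumped(baseline, today, factor=3.0):
    """baseline and today are (failures, successes) pairs for comparable periods."""
    return failure_ratio(*today) > factor * failure_ratio(*baseline)

print(ratio_jumped((5, 100), (40, 100)))
print(ratio_jumped((5, 100), (6, 100)))
```

The same two functions work for any other pair of counts mentioned on the slide, such as inbound versus outbound connections.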

Slide 27: Example 6: Can You Guess What Happened?!

Slide 28: More Examples • Structure of examples: – What was discovered? – What really happened? – How did we discover “the truth”? • All examples are from the tools prototyped and tested by the author … – Deviations and snapshot comparisons for firewall traffic – Scan detection from firewall data – Event rarity across system logs – “Rich” event sequences in mixed logs – Ratio analysis for logins and status codes – Pattern recognition and rule mining – Local to global trend comparisons in logs

Slide 29: Simple Example Revisited

Slide 30: Example 7: Can You Guess What Happened?!

Slide 31: Example 8: Fun Port Metrics

Slide 32: Example 9: Can You Guess What Happened?!

Slide 33: Real-life Usage • A busy analyst comes in the morning…gets coffee • Remembers that he needs to monitor security in addition to 1,576,903 other tasks  • Looks at a combination report showing “What is Interesting Today?” • Investigates some of the items, takes action, etc • Tells the system not to bother him with the rest in the future • Goes for more coffee and drowns in the sea of other tasks 

Slide 34: How YOU can do it - I? • First, collect logs and events – Syslog-NG to some SQL – AANVAL, OSSIM – ACID/BASE – Syslog2SQL – Custom log-to-SQL system (not that hard) – Whatever SQL log and event store (commercial, open-source, home-grown)
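
To show that a custom log-to-SQL store is "not that hard", here is a minimal sketch using Python's built-in `sqlite3` module. The table layout is an assumption that simply mirrors the common fields listed on Slide 14; a production store would need indexes, retention and a real database server.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) an event store; the schema is an illustrative assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               ts TEXT, src TEXT, dst TEXT, proto TEXT,
               dport INTEGER, username TEXT, event_type TEXT, bytes INTEGER
           )"""
    )
    return conn

def store_events(conn, rows):
    """Insert already-normalized events, one 8-tuple per row."""
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = open_store()
store_events(conn, [
    ("2007-03-09T10:00:00", "146.127.55.2", "10.0.0.7", "tcp", 22,
     "root", "login_failure", 0),
])
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)
```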

Slide 35: How YOU can do it - II? • Second, plan what to baseline – Network • Port access, system access, protocols, event types – System • Login/logout success/failure, process starts, configuration changes – Application or database • Data access type, user, data changes, client, etc
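
As an illustration of baselining port access, the sketch below compares today's destination ports with a baseline period and reports ports that are new or that grew by more than a chosen factor. The factor, the labels and the input shape (one port number per connection) are assumptions, and the two periods should be of comparable length.

```python
from collections import Counter

def port_deviations(baseline_ports, today_ports, factor=2.0):
    """Map each deviating port to 'new' or 'increased' relative to the baseline."""
    base = Counter(baseline_ports)
    today = Counter(today_ports)
    report = {}
    for port, n in today.items():
        if port not in base:
            report[port] = "new"
        elif n > factor * base[port]:
            report[port] = "increased"
    return report

# Made-up port lists for illustration.
report = port_deviations(
    baseline_ports=[80, 80, 443, 22],
    today_ports=[80, 80, 80, 80, 80, 6667, 443],
)
print(report)
```

The same comparison applies to the other baseline candidates on the slide (event types, logins, configuration changes) by swapping the port numbers for the relevant key.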

Slide 36: How YOU can do it - III? • Third, script the analysis techniques you liked – Perl with SQL access modules – Python – a new fave of those who know! – PHP

Slide 37: How YOU can do it - IV? • Fourth, act on the results – Mitigate, block, disable, fire, slice-n-dice • Fifth, automate as needed – More data, more tools, more results…

Slide 38: Conclusion LM and KDL… • …are a cool, new way of looking at log data • …actually work • …can help where common analysis methods fail • …are not that hard • …can be applied to different kinds of data: database logs, application logs, etc.

Slide 39: Take These Home with You!! • Look at your logs! You’d be happy you started now and not tomorrow (*) • Simple analysis is incredibly useful, but it only goes so far • “Complicated” analysis really isn’t that complicated and can be done “on the cheap”

Slide 40: Thank You for Coming!

Slide 41: Feedback? Q&A? Anton Chuvakin, Ph.D., GCIA, GCIH, GCFA Chief Logging Evangelist LogLogic, Inc anton@chuvakin.org http://www.chuvakin.org See www.info-secure.org for my papers, books, reviews and other security resources; www.securitywarrior.org for “Security Warrior” book (2004)