Slideshow transcript
Slide 1: Security Log Mining Beyond Log Analysis Anton Chuvakin, Ph.D., GCIA, GCIH, GCFA Security Log Mining Last presented on March 9, 2007 IT Underground Prague, Czech Republic
Slide 2: Goals • Learn or refresh your knowledge about log analysis for security • Learn about novel techniques of log analysis via data mining • Get you to think of using them in your environment
Slide 3: Outline: Log Mining (LM) • Logs and Log Analysis Overview – What logs? – Why analyze logs? – Why NOT analyze logs? – How people usually do it • Log Mining – Knowledge discovery and data mining brief – Mining of different types of logs • Results – Examples of using the above methods • Tools – How one can build tools to do it
Slide 4: Definitions • Log = record of any activity occurring on an information system • Also: alert, “event”, alarm, message, record, etc. …standard definitions are coming soon!
Slide 5: Log Analysis: What • Log Data Sources – IDS – Firewalls/IPS – Anti-malware – Proxies – Network infrastructure – Servers – Databases – Applications • Log Analysis Process – Generate – Collect – Aggregate – Normalize – Alert – Store – Summarize, baseline – Make conclusions – Act on them!
Slide 6: Log Analysis: Why • Situational awareness and new threat discovery – Unique perspective from combined logs • Getting more value out of the network and security infrastructures – Get more than you paid for! • Extracting what is really actionable automatically • Measuring security (metrics, trends, etc) • Compliance and regulations (oh, my!) • Incident response (last, but not least!)
Slide 7: Log Analysis: Why NOT or Log Analysis Challenges • “Real hackers don’t get logged!” • Why bother? No, really … • Too much data (>X0 GB per day) • Too hard to do • No tools “that do it for you” – Or: tools too expensive • What logs? We turned them off
Slide 8: Log Analysis Basics: How Common approaches to the “log problem”: • Manual – ‘Tail’, ‘more’, etc • Filtering – Positive and negative (“Artificial ignorance”) • Summarization and reports • Simple visualization – “…worth a thousand words?” • Correlation – Rule-based and other
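The negative filtering ("artificial ignorance") approach on this slide can be sketched in a few lines of Python: discard everything already known to be boring and review whatever is left. This is only an illustrative sketch; the patterns and sample lines below are hypothetical, and a real ignore list would be tuned to each environment.

```python
import re

# Hypothetical "known boring" patterns; a real list is built up per site.
IGNORE_PATTERNS = [
    re.compile(r"session opened for user \w+"),
    re.compile(r"CRON\[\d+\]"),
]

def artificial_ignorance(lines, ignore=IGNORE_PATTERNS):
    """Return only the lines that match none of the known-boring patterns."""
    return [ln for ln in lines if not any(p.search(ln) for p in ignore)]

# Made-up sample log lines for illustration.
sample = [
    "sshd[311]: session opened for user alice",
    "CRON[4512]: (root) CMD (run-parts /etc/cron.hourly)",
    "kernel: eth0: promiscuous mode enabled",
]
leftover = artificial_ignorance(sample)
```

Only the unexpected kernel message survives the filter, which is the point: the analyst reads the remainder, not the whole log.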
Slide 9: Log Analysis Basics: When Timing requirements for analysis • Real-time fallacy: “we have to have it when?” – “A day later vs never” question • Would you rather catch an intrusion a day after … or a month after … CNN talks about it – Daily in-depth analysis • Log management vs alert management: different challenges – When filtering and event correlation are not enough • Some data just doesn’t mean much in real-time
Slide 10: KDD and DM • Introducing data mining… • Definitions and background terms: – Data Mining (DM) and Knowledge Discovery in Databases (KDD) • DM = “Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data”
Slide 11: Brief on Some DM Techniques From DM to LM: • Deviation analysis – Baselines and deviations • Classification – Organize data by class to know it • Clustering – How things are grouped together • Association Rule Discovery – Relationship finding • Outlier Detection – What stands out
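As one possible illustration of the first technique, deviation analysis, the sketch below compares today's count against a baseline of earlier daily counts and flags it when it lies more than a chosen number of standard deviations from the mean. The threshold of 3 and the sample counts are assumptions made for the example, not values from the talk.

```python
from statistics import mean, stdev

def deviates(baseline, today, threshold=3.0):
    """True if today's count is more than `threshold` standard deviations
    away from the mean of the baseline counts."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return today != mu
    return abs(today - mu) / sigma > threshold

# Hypothetical daily counts of denied firewall connections.
history = [120, 131, 118, 125, 122, 129, 124]
quiet_day = deviates(history, 127)
noisy_day = deviates(history, 410)
```

With these numbers a day with 127 denies is within the normal range, while 410 denies is flagged.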
Slide 12: KDL and LM • Log Mining (LM) and Knowledge Discovery in Logs (KDL) • Is “log mining” a marketing buzzword? Not yet! • Why “mine the logs”? – New types of analysis – More human-like pattern recognition – Prediction? Probably not! – Dealing with sparse data • Towards “replacing” humans (not really…) – Offloading conclusion generation to machines – “Better than junior analysts”
Slide 13: Preliminary Requirements Mostly the same as for simpler log analysis, but with some added factors: • Centralized – To look in just one place • Normalized – To look across the data sources • Quickly accessible storage – To be used by the mining tools
Slide 14: Log Data from DM Perspective Common fields in logs: • Time • Source • Destination • Protocol • Port(s) • User name • Event/attack type • Bytes exchanged
Slide 15: Log Data from DM Perspective But are logs really data? Looks like /broken/ English to me… %PIX-2-214001: Terminating manager session from 146.127.55.2 on interface inside. Reason: incoming encrypted data (18998 bytes) longer than 12453 bytes %PIX-3-109016: Downloaded authorization access-list 101 not found for user sunilp Text mining techniques might also come in handy
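Before such messages can be mined they have to be turned into fields. A minimal sketch, assuming only the visible `%PIX-<severity>-<message id>: <text>` layout of the two sample lines above; the field names in the returned dictionary are made up for illustration, and the IP pattern is deliberately loose.

```python
import re

# Prefix layout taken from the sample lines: %PIX-<severity>-<message id>: <text>
PIX_RE = re.compile(r"%PIX-(?P<severity>\d)-(?P<msg_id>\d+): (?P<text>.*)")
# Loose dotted-quad matcher; it does not validate octet ranges.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def parse_pix(line):
    """Split a PIX message into a small dict, or return None if it does not match."""
    m = PIX_RE.search(line)
    if m is None:
        return None
    text = m.group("text")
    return {
        "severity": int(m.group("severity")),
        "msg_id": m.group("msg_id"),
        "ips": IP_RE.findall(text),
        "text": text,
    }

record = parse_pix(
    "%PIX-2-214001: Terminating manager session from 146.127.55.2 "
    "on interface inside. Reason: incoming encrypted data (18998 bytes) "
    "longer than 12453 bytes"
)
```

Everything after the prefix is still free text, which is where the text mining techniques mentioned on the slide would come in.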
Slide 16: Example: Jumbled Mess of SAP Application Logs |22:01:40|BTC| 7|000|DDIC | |LC2|Systemerror when executing external command DB6_DATA_COLLECTOR on gneisenau () |22:02:32|BTC| 7|000|DDIC | |R49|Communication error, CPIC return code 020, SAP return code 456 |22:02:32|BTC| 7|000|DDIC | |R5A|> Conversation ID: 38910614 |22:02:32|BTC| 7|000|DDIC | |R64|> CPI-C function: CMSEND(SAP) |22:02:32|BTC| 7|000|DDIC | |LC2|Systemerror when
Slide 17: What Do We “Mine” for? • How about for something interesting? • One research paper defines “interesting” thus: – Unexpected to user (aka not “normal”, not routine) – Actionable (we can and/or should do something about it) • Examples: – Compromised/infected system – Successful attack – Insider abuse and data theft – Other data leaks, intentional and not – Covert channel/hidden backdoor communication – Increase in probing – Mysterious system crash
Slide 18: Simple Example • Too many attack types from a single IP address • Right next to known vulnerability scanners • External IP address • Conclusion: potentially dangerous attacker
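The first observation, too many attack types from a single IP address, reduces to counting distinct event types per source. A rough sketch follows; the cut-off of three types and the sample alerts are assumptions for illustration only.

```python
from collections import defaultdict

def noisy_sources(events, min_types=3):
    """Collect the set of attack types seen per source IP and keep the
    sources with at least `min_types` distinct types."""
    types_by_src = defaultdict(set)
    for src, attack_type in events:
        types_by_src[src].add(attack_type)
    return {src: t for src, t in types_by_src.items() if len(t) >= min_types}

# Hypothetical (source IP, attack type) pairs, e.g. from normalized IDS alerts.
events = [
    ("203.0.113.7", "sql-injection"),
    ("203.0.113.7", "port-scan"),
    ("203.0.113.7", "ssh-bruteforce"),
    ("203.0.113.7", "port-scan"),
    ("198.51.100.4", "port-scan"),
]
suspects = noisy_sources(events)
```

Repeated alerts of the same type do not raise the score; only variety does, which matches the "too many attack types" wording.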
Slide 19: Deeper into interesting - I Approaches to finding interesting stuff in logs without knowing what we look for specifically: • Rare things – Is compromise rare in your environment? • Different things – Is today “just another day” … or not? • “Out of character” things – It always does it… but not today? • Weird-looking things
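Looking for "rare things" can start with nothing more than a frequency count over event types, reporting those seen no more than a handful of times. In this sketch the event names and the `max_count` cut-off are hypothetical.

```python
from collections import Counter

def rare_events(event_types, max_count=1):
    """Return the event types that occur at most `max_count` times, sorted by name."""
    counts = Counter(event_types)
    return sorted(e for e, n in counts.items() if n <= max_count)

# Made-up day of events: lots of routine logins and two one-off events.
day = (
    ["login_ok"] * 40
    + ["login_fail"] * 6
    + ["kernel_module_loaded"]
    + ["log_cleared"]
)
rare = rare_events(day)
```

Whether a rare event is also interesting still takes a human or a further rule; rarity only narrows down where to look.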
Slide 20: Example 1: Can You Guess What Happened?! [Chart: Destination Port 1D Baseline]
Slide 21: Example 2: Can You Guess What Happened?!
Slide 22: Example 3: Can You Guess What Happened?!
Slide 23: Deeper into interesting - II • Things going in an unusual direction – Your web server is now a web client - to “hack.kz” • Top things and Bottom Things – And them changing places! • Strange combinations of uninteresting things – A nice connection to a web server – A nice configuration change – A nice user creation – SO, is it NICE?
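The "strange combinations" idea can be approximated by looking for an ordered sequence of individually harmless event types on the same host within a time window. The sketch below is a simple greedy matcher, not the author's tool; the event names, the tuple layout, and the 300-second window are all assumptions.

```python
def find_sequence(events, pattern, window):
    """Return hosts on which the event types in `pattern` occur in order
    within `window` seconds.

    `events` is an iterable of (timestamp_seconds, host, event_type)
    tuples sorted by time.
    """
    progress = {}  # host -> (index of next expected step, time of first step)
    hits = []
    for ts, host, etype in events:
        step, start = progress.get(host, (0, None))
        if step > 0 and ts - start > window:
            step, start = 0, None  # sequence took too long, start over
        if etype == pattern[step]:
            if step == 0:
                start = ts
            step += 1
            if step == len(pattern):
                hits.append(host)
                step, start = 0, None
        progress[host] = (step, start)
    return hits

# Hypothetical normalized events from mixed web, system, and audit logs.
events = [
    (0, "web1", "web_connection"),
    (10, "web2", "web_connection"),
    (40, "web1", "config_change"),
    (90, "web1", "user_created"),
    (5000, "web2", "config_change"),
    (5010, "web2", "user_created"),
]
pattern = ["web_connection", "config_change", "user_created"]
hits = find_sequence(events, pattern, window=300)
```

Here only web1 is reported: web2 sees the same three event types, but spread over far more than the window, so they are treated as unrelated.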
Slide 24: Example 4: Can You Guess What Happened?!
Slide 25: Example 5: Can You Guess What Happened?!
Slide 26: Deeper into interesting - III • Counts of an otherwise uninteresting thing – Pings? Connections to port 80? Error 404s? • Ratios of otherwise uninteresting things – Login failures / login successes? – Inbound / outbound connections? • Frequencies of things – Frequent becoming rare – and vice versa! • Time series behaving badly – Traffic overall grows, but traffic vs system X slows
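A ratio such as login failures to login successes is easy to compute once the counts are available per host. A minimal sketch, with made-up counts and an arbitrary alerting threshold of 1.0 (more failures than successes):

```python
def failure_ratio(failures, successes):
    """Failed logins per successful login; infinity when nothing succeeded."""
    if successes == 0:
        return float("inf") if failures else 0.0
    return failures / successes

# Hypothetical per-host daily counts: (failed logins, successful logins).
counts = {"mail": (4, 200), "vpn": (300, 15), "db": (0, 12)}
ratios = {host: failure_ratio(f, s) for host, (f, s) in counts.items()}
flagged = sorted(h for h, r in ratios.items() if r > 1.0)
```

The raw count of failures on the mail host would not stand out, and neither would the count of successes on the VPN host; the ratio is what makes the VPN host visible.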
Slide 27: Example 6: Can You Guess What Happened?!
Slide 28: More Examples • Structure of examples: – What was discovered? – What really happened? – How did we discover “the truth”? • All examples are from the tools prototyped and tested by the author … – Deviations and snapshot comparisons for firewall traffic – Scan detection from firewall data – Event rarity across system logs – “Rich” event sequences in mixed logs – Ratio analysis for logins and status codes – Pattern recognition and rule mining – Local to global trend comparisons in logs
Slide 29: Simple Example Revisited
Slide 30: Example 7: Can You Guess What Happened?!
Slide 31: Example 8: Fun Port Metrics
Slide 32: Example 9: Can You Guess What Happened?!
Slide 33: Real-life Usage • A busy analyst comes in the morning…gets coffee • Remembers that he needs to monitor security in addition to 1,576,903 other tasks • Looks at a combination report showing “What is Interesting Today?” • Investigates some of the items, takes action, etc • Tells the system not to bother him with the rest in the future • Goes for more coffee and drowns in the sea of other tasks
Slide 34: How YOU can do it - I? • First, collect logs and events – Syslog-NG to some SQL – AANVAL, OSSIM – ACID/BASE – Syslog2SQL – Custom log-to-SQL system (not that hard) – Whatever SQL log and event store (commercial, open-source, home-grown)
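A custom log-to-SQL store really can be small. The sketch below uses Python's built-in `sqlite3` module as a stand-in for "whatever SQL log and event store"; the table name, the rows, and the choice of columns (a subset of the common fields from Slide 14) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would be used in practice
conn.execute(
    """CREATE TABLE events (
           ts TEXT, source TEXT, destination TEXT,
           protocol TEXT, port INTEGER, event_type TEXT)"""
)

# Hypothetical, already-normalized firewall events.
rows = [
    ("2007-03-09T10:00:01", "10.0.0.5", "10.0.0.80", "tcp", 80, "accept"),
    ("2007-03-09T10:00:02", "10.0.0.5", "10.0.0.80", "tcp", 22, "deny"),
    ("2007-03-09T10:00:03", "10.0.0.9", "10.0.0.80", "tcp", 80, "accept"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# Once the data is in SQL, summaries are one query away.
per_port = conn.execute(
    "SELECT port, COUNT(*) FROM events GROUP BY port ORDER BY port"
).fetchall()
```

The same schema idea carries over to any other SQL back end; only the connection line would change.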
Slide 35: How YOU can do it - II? • Second, plan what to baseline – Network • Port access, system access, protocols, event types – System • Login/logout success/failure, process starts, configuration changes – Application or database • Data access type, user, data changes, client, etc
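Once a baseline of, say, port access has been recorded, a snapshot comparison shows what is new and what has disappeared. A rough sketch with made-up port counts:

```python
from collections import Counter

def compare_to_baseline(baseline, today):
    """Return (keys never seen in the baseline, keys that vanished today)
    for two count snapshots."""
    new = sorted(set(today) - set(baseline))
    gone = sorted(set(baseline) - set(today))
    return new, gone

# Hypothetical connection counts per destination port.
baseline_ports = Counter({80: 9100, 443: 4200, 25: 310})
today_ports = Counter({80: 8800, 443: 4500, 6667: 42})
new_ports, gone_ports = compare_to_baseline(baseline_ports, today_ports)
```

The same function works for any of the items on this slide (systems accessed, event types, users), since it only compares the keys of two snapshots.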
Slide 36: How YOU can do it - III? • Third, script the analysis techniques you liked – Perl with SQL access modules – Python – a new fave of those who know! – PHP
Slide 37: How YOU can do it - IV? • Fourth, act on the results – Mitigate, block, disable, fire, slice-n-dice • Fifth, automate as needed – More data, more tools, more results…
Slide 38: Conclusion LM and KDL is… • …a cool and new way of looking at log data • …actually works • …can help where common analysis methods fail • …not that hard • …can be done over different kinds of data: database logs, application logs, etc
Slide 39: Take These Home with You!! • Look at your logs! You’ll be happy you started now and not tomorrow (*) • Simple analysis is incredibly useful, but it only goes so far • “Complicated” analysis really isn’t that complicated and can be done “on the cheap”
Slide 40: Thank You for Coming!
Slide 41: Feedback? Q&A? Anton Chuvakin, Ph.D., GCIA, GCIH, GCFA Chief Logging Evangelist LogLogic, Inc anton@chuvakin.org http://www.chuvakin.org See www.info-secure.org for my papers, books, reviews and other security resources; www.securitywarrior.org for “Security Warrior” book (2004)



