Baselining Logs


This is my old presentation on using baselining methods for log analysis.

Published in: Technology, Business

    1. Baselining Logs: How to create baselines and analyze logs effectively?
    2. Outline
       • What is a baseline? What is a log?
       • Why baseline?
       • Requirements for log “baselining”
       • Baseline lifecycle
       • What baselines well?
       • What baselines poorly?
       • Examples and how to do it
    3. Definitions
       • Log = record from a file about computer activities
       • Also called: alert, event, alarm, etc.
       • Baseline = “A starting point or condition against which future changes are measured”
    4. Log Analysis Methods
       • Manual
           • ‘tail’, ‘more’, etc.
       • Filtering
           • Positive and negative (“Artificial ignorance”)
       • Summarization and reports
       • Simple visualization
           • “… worth a thousand words?”
       • Simple automation
           • Filters
       • Correlation
           • Rule-based and other methods
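
A minimal sketch of the negative filtering ("artificial ignorance") idea from the slide above: drop everything already known to be routine and look at what is left. The patterns and sample log lines are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical patterns for messages already known to be routine.
KNOWN_BORING = [
    re.compile(r"session (opened|closed) for user \w+"),
    re.compile(r"CRON\[\d+\]"),
]

def negative_filter(lines, ignore_patterns=KNOWN_BORING):
    """Return only the lines that match none of the ignore patterns."""
    return [ln for ln in lines if not any(p.search(ln) for p in ignore_patterns)]

sample = [
    "sshd[811]: session opened for user alice",
    "CRON[1024]: (root) CMD (run-parts /etc/cron.hourly)",
    "kernel: eth0: promiscuous mode enabled",
]

# Whatever survives the filter is what a human should read.
leftover = negative_filter(sample)
```

A positive filter is the mirror image: keep only the lines that match a list of known-bad patterns.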
    5. Why Baseline?
       • Situational awareness
           • What is going on compared to some baseline
       • New threat discovery
           • Unique perspective unavailable from other methods
       • Getting more value out of the network and security infrastructure
           • Leverage the stuff you have in new ways
       • Extracting what is really actionable automatically
           • Out of baseline, unusual = bad?
       • Measuring security (metrics, trends, etc.)
           • Compliance and regulations
    6. Simple Examples
       • Hits on port 80 over the last week
       • User logins to server X per day
       • Use of su command per hour of day
       • Count of new ports hit on a firewall
       • Number of hosts touching each server per hour
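
Each of the examples above is just a count per time bin. A rough sketch for "use of su command per hour of day", assuming the events are already parsed into (timestamp, command) pairs; the sample data is made up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical pre-parsed events: (ISO timestamp, command).
events = [
    ("2024-03-04T09:15:00", "su"),
    ("2024-03-04T09:47:00", "su"),
    ("2024-03-04T14:02:00", "su"),
    ("2024-03-05T09:30:00", "su"),
    ("2024-03-05T23:58:00", "ls"),
]

def count_per_hour_of_day(events, command="su"):
    """Count uses of `command`, binned by hour of day (0-23) across all days."""
    counts = Counter()
    for ts, cmd in events:
        if cmd == command:
            counts[datetime.fromisoformat(ts).hour] += 1
    return counts

su_by_hour = count_per_hour_of_day(events)
```

The other examples follow the same shape with a different key (port, user, server) and a different bin (day, week).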
    7. What is needed?
       • Data – and lots of it!
       • Normalized format across data sources
       • Expert feedback into what is normal and bad
       • Not needed: “training data”!
    8. Baseline Assumptions
       • There is data available
       • Past was not disastrous!
       • Baseline is a correct model for the situation at hand
           • Won’t work for erratic/random phenomena, or will produce “bad baselines”
    9. Baseline Lifecycle I
       • Create
       • Update
       • Age
       • Compare and act on results
       • Refine
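
One possible way to picture the create / update / age / compare steps in code. The slides do not say how aging is done; this sketch assumes an exponentially weighted average, where the hypothetical `alpha` parameter controls how quickly old observations fade.

```python
class Baseline:
    """Per-key running baseline; `alpha` (an assumed parameter) sets how fast old data ages out."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.values = {}

    def update(self, key, observed):
        # Create on first sight; otherwise blend the new value in,
        # which also shrinks (ages) the weight of older observations.
        if key not in self.values:
            self.values[key] = float(observed)
        else:
            old = self.values[key]
            self.values[key] = (1 - self.alpha) * old + self.alpha * observed

    def compare(self, key, observed):
        # Ratio of observed to baseline; None when there is no usable (nonzero) baseline.
        base = self.values.get(key)
        if base is None or base == 0:
            return None
        return observed / base

b = Baseline(alpha=0.5)
b.update("logins", 100)   # create
b.update("logins", 200)   # update and age: baseline is now 150.0
ratio = b.compare("logins", 300)
```

Refinement would then mean adjusting `alpha`, the keys being tracked, or the thresholds applied to the ratio.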
    10. Baseline Lifecycle II
    11. Baseline Creation
        • Pick parameters to baseline
            • E.g. NIDS alerts per sensor
        • Pick a time period and time bin
            • E.g. compare today to last week
        • Pick comparison method
            • E.g. compare today’s count to average
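
The three choices above (parameter, time period, comparison method) can be sketched as follows, using the slide's own example of NIDS alerts per sensor compared to last week's average. The sensor names and counts are hypothetical.

```python
from statistics import mean

# Hypothetical daily NIDS alert counts per sensor for the previous seven days.
last_week = {
    "sensor-a": [120, 130, 110, 125, 115, 128, 122],
    "sensor-b": [40, 42, 38, 41, 39, 43, 37],
}
today = {"sensor-a": 124, "sensor-b": 95}

def deviation_from_average(history, current):
    """Map each sensor to today's count divided by its historical daily average."""
    return {
        sensor: current.get(sensor, 0) / mean(counts)
        for sensor, counts in history.items()
    }

ratios = deviation_from_average(last_week, today)
```

Here sensor-a stays close to 1.0 while sensor-b is well above 2, which is the kind of difference worth a second look.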
    12. Compare to Baseline
        • NEW: newly appeared
        • OVER: over baseline
        • UNDER: under baseline (a lot vs a little)
        • GONE: disappeared
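
A minimal sketch of the four comparison outcomes, assuming the baseline and the current period are both simple count-per-key dictionaries. The `over` and `under` ratio thresholds and the NORMAL label are assumptions, not something the slides specify.

```python
def classify(baseline, current, over=2.0, under=0.5):
    """Label each key NEW, OVER, UNDER, GONE, or NORMAL.

    `over` and `under` are assumed ratio thresholds, not values from the slides.
    """
    labels = {}
    for key in baseline.keys() | current.keys():
        base = baseline.get(key, 0)
        now = current.get(key, 0)
        if base == 0 and now > 0:
            labels[key] = "NEW"
        elif base > 0 and now == 0:
            labels[key] = "GONE"
        elif base > 0 and now >= over * base:
            labels[key] = "OVER"
        elif base > 0 and now <= under * base:
            labels[key] = "UNDER"
        else:
            labels[key] = "NORMAL"
    return labels

# Hypothetical hit counts per destination port.
baseline = {"tcp/80": 1000, "tcp/22": 50, "tcp/25": 200, "udp/53": 300}
current = {"tcp/80": 2500, "tcp/22": 10, "udp/53": 310, "tcp/4444": 7}
result = classify(baseline, current)
```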
    13. “Interestingness”
        • Something interesting?
        • One research paper defines “interesting” thus:
            • Unexpected to user
            • Actionable (we can and/or should do something about it)
        • Examples:
            • Compromised/infected system
            • Successful attack
            • Insider abuse and IP theft
            • Covert channel/hidden backdoor communication
            • Increase in probing
            • System crash
    14. What Baselines Well?
        • Where different = interesting!
        • New attack type
        • Larger number of bytes
        • Sharp drop in log event flow
        • New usernames
        • More destinations hit
    15. Example 1: Can you Guess What Happened?!
        • The visual for this example is censored. The picture would show a one-dimensional baseline of hits to a specific port.
        [Figure: Destination Port 1D Baseline]
    16. Example 2: Can you Guess What Happened?!
    17. Example 4: Can you Guess What Happened?!
    18. Good Baselines [Operationally Tested]
        • Log message type per sensor per day
        • Log message type per protocol/port
        • Log message types (watch for NEW)
        • Protocols per sensor per day
        • Count(unique(alert)) per source
        • Count(unique(port)) per source
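
As an illustration of the last item above, count(unique(port)) per source, here is a short sketch that assumes firewall events have been reduced to (source IP, destination port) pairs. The addresses and ports are made up.

```python
from collections import defaultdict

# Hypothetical firewall events as (source IP, destination port) pairs.
events = [
    ("10.0.0.5", 80), ("10.0.0.5", 80), ("10.0.0.5", 443),
    ("10.0.0.9", 22), ("10.0.0.9", 23), ("10.0.0.9", 25), ("10.0.0.9", 80),
]

def unique_ports_per_source(events):
    """Return the number of distinct destination ports seen for each source."""
    ports = defaultdict(set)
    for src, port in events:
        ports[src].add(port)
    return {src: len(seen) for src, seen in ports.items()}

port_counts = unique_ports_per_source(events)
```

A source whose distinct-port count jumps well over its own baseline is a typical sign of scanning.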
    19. What Baselines Poorly?
        • Random things
            • Hits on port TCP 3445 anybody?
        • Things that go up and down on their own
            • Accesses to a document on a server
        • Sometimes, only large deviations matter
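
For noisy counts where only large deviations matter, one common approach (an assumption here, not something the slides prescribe) is to flag a value only when it is more than `k` standard deviations away from the historical mean. The history values are hypothetical.

```python
from statistics import mean, stdev

def is_large_deviation(history, observed, k=3.0):
    """True when `observed` is more than `k` sample standard deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all counts as a deviation.
        return observed != mu
    return abs(observed - mu) > k * sigma

# Hypothetical daily access counts that swing a lot on their own.
history = [100, 140, 80, 120, 60, 130, 70]
noisy_but_normal = is_large_deviation(history, 150)
clearly_unusual = is_large_deviation(history, 400)
```

The noisier the history, the wider the band, so ordinary ups and downs stop generating alerts.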
    20. Examples
    21. How YOU can do it?
        • First, collect events
            • AANVAL
            • OSSIM
            • OSSEC (?)
            • ACID/BASE
            • Syslog2SQL
            • Whatever SQL log and event store
    22. How YOU can do it?
        • Second, plan what to baseline
        • Third, run the tools
        • Fourth, act on the results
            • Mitigate, block, disable, slice-n-dice
        • Fifth, automate as needed
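
A rough end-to-end sketch of the steps above against a SQL event store, using an in-memory SQLite database as a stand-in. The `events` table and its columns are hypothetical and do not reflect the schema of any of the tools listed; the rows are made-up sample data.

```python
import sqlite3

# Stand-in for "whatever SQL log and event store" (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, sensor TEXT, msg_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2024-03-04", "fw1", "deny"),
        ("2024-03-04", "fw1", "deny"),
        ("2024-03-04", "fw1", "accept"),
        ("2024-03-05", "fw1", "deny"),
        ("2024-03-05", "fw1", "portscan"),
    ],
)

def counts_for_day(conn, day):
    """Count log messages per (sensor, message type) for one day."""
    rows = conn.execute(
        "SELECT sensor, msg_type, COUNT(*) FROM events "
        "WHERE day = ? GROUP BY sensor, msg_type",
        (day,),
    )
    return {(sensor, msg_type): n for sensor, msg_type, n in rows}

baseline = counts_for_day(conn, "2024-03-04")
today = counts_for_day(conn, "2024-03-05")

# Message types seen today that never appeared in the baseline day.
new_types = sorted(set(today) - set(baseline))
```

Running a script like this from a scheduler covers the "automate as needed" step.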
    23. Summary
        • Easy and effective way to deal with logs from multiple sources
        • Allows you to automate log monitoring
            • To some extent
        • Results may be given to less skilled people for follow-up
    24. Q&A? More information?
        • Anton Chuvakin, Ph.D., GCIA, GCIH, GCFA
        • Security Strategist
        • Author of “Security Warrior” (O’Reilly 2004) –
        • Book on logs is coming soon!
        • See for my papers, books, reviews and other security resources related to logs