Logs: Can’t Hate Them, Won’t Love Them: Brief Log Management Class by Anton Chuvakin

Published in: Technology
  • The easiest log analysis method (Linux/Unix):
    # grep ailure /var/log/messages
    Look for interesting failure messages in the messages log. It makes sense to also look for “ailed.” We drop the first letter so that its capitalization does not matter (“Failure” and “failure” both match). You can also switch grep to a case-insensitive mode by typing “grep -i” (for ignore case) instead.
    # grep anton /var/log/messages
    Look for a particular user's actions. This will definitely miss more than a few of them, so manual review of logs is still needed. For example, some messages will not be marked with that username, such as when a user becomes “root” via the “sudo” command.
    More examples:
    grep "sshd" *.log | process_ssh.sh (filters all log lines with the “sshd” string in them and sends them to another program)
    grep -i user messages (looks for “user”, “USER”, “User”, etc. in the “messages” file)
    grep -i user messages | grep -v ailure (filters for “user”, “USER”, “User” messages that are not failures)
    grep -v sendmail syslog (looks for all log lines without “sendmail” in them)
    This slide reminds Unix people and teaches Windows people about the “grep” command, which can be used to manually filter logs. Using “grep” is an example of the positive filtering mentioned on the previous slide: trying to focus on the bad things that one needs to see, investigate, and then act on (attacks, failures, etc.). The “-v” option showcases negative filtering.
  • So how easy is it to data mine with Splunk? In the above example I told Splunk I was interested in all log entries that contained the word “failed”. This refreshed the screen and showed me 25 entries that matched this keyword. Looking through the list, I noticed that one of the entries was for a failed logon attempt. At that point I clicked the “similar” hyperlink for the log entry, which produced the screen shown above. Note: it shows that we have ten failed logon attempts in the log file (four are not shown as they are off the bottom of the screen). So in less than 60 seconds I was able to identify all of the failed logon attempts for my network.
  • OSSEC rule shown
  • Marcus Ranum’s “nbs” tool can be obtained at http://www.ranum.com/security/computer_security/code/index.html and its description says: “Never Before Seen Anomaly detection driver. This utility creates a fast database of things that have been seen, and includes tools to print and update the database. Includes PDF documentation and walkthroughs.” Use the tool to pick up anomalous messages from your log pool. One can also build the same using grep, awk and other shell tools: ‘grep -v -f’ can be used to look for log entries excluding the ones stored in a file.
  • This slide shows one of the open source visualization tools, afterglow (which can be found at http://afterglow.sourceforge.net/ or at http://www.secviz.org/). The tool has been successfully used to visualize many types of log data.
  • Here we learn how to start using the tools we just discussed to take control of your logs. Start by collecting logs; use syslog-ng or whatever syslog variant is available on your systems. To combine these with Windows logs, use Snare or LASSO, which convert Windows logs to syslog. Store logs in files (compressed or not) or in a database such as the open source MySQL. To start peeking at logs, use search tools such as the free “grep” or “splunk” that we mentioned above. When ready to move to correlation and alerting, get OSSEC or other tools. At this point, you gain a degree of awareness of what is going on in your environment.

    1. 1. Logs: Can’t Hate Them, Won’t Love Them!<br />Dr. Anton Chuvakin<br />Security Warrior Consulting<br />www.securitywarriorconsulting.com<br />April 2010<br />
    2. 2. What Is It?<br />This is a short log analysis and log management class given by Dr. Anton Chuvakin of Security Warrior Consulting at Project Honeynet Annual Event 2010 in Mexico City, Mexico<br />www.chuvakin.org<br />www.SecurityWarriorConsulting.com<br />
    3. 3. Outline<br />Logs, WTH?<br />Logs and Log Analysis<br />Log Analysis Methods<br />Log Analysis -> Log Management<br />Log Management Mistakes<br />Future Ideas<br />Conclusions<br />
    4. 4. Hilarity!!!<br />“Logs Are Data??! <br />Bua-ha-ha-ha-ha-haaa!”<br />Aug 11 09:11:19 xx null pif ? exit! 0 <br />
    5. 5. Log Data Overview<br />From Where?<br />What Logs?<br /><ul><li>Firewalls/intrusion prevention</li><li>Routers/switches</li><li>Intrusion detection</li><li>Servers, desktops, mainframes</li><li>Business applications</li><li>Databases</li><li>Anti-virus</li><li>VPNs</li><li>Audit logs</li><li>Transaction logs</li><li>Intrusion logs</li><li>Connection logs</li><li>System performance records</li><li>User activity logs</li><li>Various alerts and other messages</li></ul>
    19. 19. Log Chaos I - Login?<br /><18> Dec 17 15:45:57 10.14.93.7 ns5xp: NetScreen device_id=ns5xp system-warning-00515: Admin User netscreen has logged on via Telnet from 10.14.98.55:39073 (2002-12-17 15:50:53) <br /><57> Dec 25 00:04:32:%SEC_LOGIN-5-LOGIN_SUCCESS:LoginSuccess [user:yellowdog] [Source:10.4.2.11] [localport:23] at 20:55:40 UTC Fri Feb 28 2006<br /><122> Mar 4 09:23:15 localhost sshd[27577]: Accepted password for kyle from ::ffff:192.168.138.35 port 2895 ssh2<br /><13> Fri Mar 17 14:29:38 2006 680 Security SYSTEM User Failure Audit ENTERPRISE Account Logon Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0    Logon account:  POWERUSER   <br />
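The "Log Chaos I" samples all describe logins, yet each needs its own pattern before the user and source address can be compared. A minimal sketch of that normalization step follows. The regular expressions, the device labels and the function name `normalize_login` are assumptions written against the sample lines on this slide only; they are not vendor-documented formats.

```python
import re

# Hypothetical patterns, derived only from the sample lines on the slide.
LOGIN_PATTERNS = [
    ("netscreen", re.compile(
        r"Admin User (?P<user>\S+) has logged on via \S+ from (?P<src>[\d.]+)")),
    ("cisco", re.compile(
        r"LOGIN_SUCCESS.*\[user:(?P<user>[^\]]+)\] \[Source:(?P<src>[^\]]+)\]")),
    ("sshd", re.compile(
        r"sshd\[\d+\]: Accepted \S+ for (?P<user>\S+) from (?P<src>\S+) port")),
]


def normalize_login(line):
    """Return a common {device, user, source} record, or None if no pattern matches."""
    for device, pattern in LOGIN_PATTERNS:
        match = pattern.search(line)
        if match:
            return {
                "device": device,
                "user": match.group("user"),
                "source": match.group("src"),
            }
    return None
```

Every new log source means one more hand-written pattern, which is the practical cost of having no standard format.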
    20. 20. Log Chaos II - Accept?<br />messages:Dec 16 17:28:49 10.14.93.7 ns5xp: NetScreen device_id=ns5xp system-notification-00257(traffic): start_time="2002-12-16 17:33:36" duration=5 policy_id=0 service=telnet proto=6 src zone=Trust dst zone=Untrust action=Permit sent=1170 rcvd=1500 src=10.14.94.221 dst=10.14.98.107 src_port=1384 dst_port=23 translated ip=10.14.93.7 port=1206<br />Apr 6 06:06:02 Checkpoint NGX SRC=Any,DEST=ANY,Accept=nosubstitute,Do Not Log,Installspyware,lieonyourtaxes,orbetteryet,dontpaythem<br />Mar 6 06:06:02 winonasu-pix %PIX-6-302013: Built outbound TCP connection 315210 596 for outside:172.196.9.206/1214 (172.196.9.206/1214) to inside:199.17.151.103/1438 (199.17.151.103/1438)<br />
    21. 21. SHOCK!!!<br />… and that is BEFORE we even mention application logs!<br />
    22. 22. Log Chaos Everywhere!<br />No standard format<br />No standard schema, no level of details<br />No standard meaning<br />No taxonomy<br />No standard transport<br />No shared knowledge on what to log and how<br />No logging guidance for developers<br />No standard API / libraries for log production<br />
    23. 23. Result?<br />%PIX|ASA-3-713185 Error: Username too long - connection aborted<br />%PIX|ASA-5-501101 User transitioning priv level<br />ERROR: transport error 202: send failed: Success<br />sles10sp1oes oesaudit: type=CWD msg=audit(09/27/07 22:09:45.683:318) :  cwd=/home/user1 <br />
    24. 24. More results?<br />userenv[error] 1030 RCI-CORPwsupx No description available<br />Aug 11 09:11:19 xx null pif ? exit! 0 <br />Apr 23 23:03:08 support last message repeated 3 times<br />Apr 23 23:04:23 support last message repeated 5 times<br />Apr 23 23:05:38 support last message repeated 5 times<br />
    25. 25. It DOES Suck!<br />Well, it does…<br />… but we need to analyze logs every time an incident occurs and in many other cases!<br />
    26. 26. LOG ANALYSIS<br />We will discuss<br /><ul><li>Log analysis methods</li><li>Some log analysis tools</li></ul>
    27. 27. Log Analysis: Why<br />Compliance and regulations<br />Situational awareness and new threat discovery<br />Getting more value out of the network and security infrastructure<br />Measurement of security (metrics, trends)<br />Incident response (last, but not least!)<br />
    28. 28. Log Analysis Basics: Summary<br />Manual<br />Filtering<br />Summarization and reports<br />Simple visualization<br />Log searching<br />Correlation<br />Log Data mining<br />
    29. 29. Log Analysis Basics: Manual<br />Manual log review<br />Just fire up your trusty tail, more, notepad, vi, Event Viewer, etc. and hop to it! <br />Pros:<br />Easy, no tools required (neither build nor buy)<br />Cons:<br />Try it with a 10GB log file one day <br />Boring as Hell! <br />
    30. 30. See!?<br />Log for VMware Server, pid=2364, version=e.x.p, build=build-63231, option=BETA, section=2<br />[2007-12-03 14:57:00.931 'App' 4516 info] Current working directory: C:\Documents and Settings\All Users\Application Data\VMware\VMware Server<br />[2007-12-03 14:57:00.946 'BaseLibs' 4516 info] HOSTINFO: Seeing Intel CPU, numCoresPerCPU 2 numThreadsPerCore 1.<br />[2007-12-03 14:57:00.946 'BaseLibs' 4516 info] HOSTINFO: This machine has 1 physical CPUS, 2 total cores, and 2 logical CPUs.<br />[2007-12-03 14:57:00.946 'App' 4516 info] Trying blklistsvc<br />[2007-12-03 14:57:00.946 'App' 4516 info] Trying cimsvc<br />[2007-12-03 14:57:00.946 'App' 4516 info] Trying directorysvc<br />[2007-12-03 14:57:00.946 'App' 4516 info] Trying hostsvc<br />[2007-12-03 14:57:01.571 'NetworkProvider' 4516 info] Using netmap configuration file C:\Documents and Settings\All Users\Application Data\VMware\VMware Server\netmap.conf<br />[2007-12-03 14:57:01.587 'NetworkProvider' 4516 error] VNL_GetBriggeState call failed with status 1. Refreshing network information failed<br />[2007-12-03 14:57:03.165 'NetworkProvider' 4516 info] Active ftp is 1<br />[2007-12-03 14:57:03.165 'NetworkProvider' 4516 info] Allowanyoui is 0<br />[2007-12-03 14:57:03.165 'NetworkProvider' 4516 info] udptimeout is 30<br />[2007-12-03 14:57:03.337 'HostsvcPlugin' 4516 warning] No advanced options found<br />[2007-12-03 14:57:03.368 'Hostsvc::AutoStartManager' 4516 info] VM autostart configuration: C:\Documents and Settings\All Users\Application Data\VMware\VMware Server\hostd\vmAutoStart.xml<br />[2007-12-03 14:57:04.212 'Locale' 4516 info] Locale subsystem initialized from C:\Program Files\VMware\VMware Server\locale/ with default locale en.<br />[2007-12-03 14:57:04.212 'ResourcePool ha-root-pool' 4516 info] Resource pool instantiated<br />[2007-12-03 14:57:04.212 'ResourcePool ha-root-pool' 4516 info] Refresh interval: 60 seconds<br />[2007-12-03 14:57:04.212 'HostsvcPlugin' 4516 info] Plugin initialized<br />[2007-12-03 14:57:04.212 'App' 4516 info] Trying internalsvc<br />[2007-12-03 14:57:04.259 'App' 4516 info] Trying nfcsvc<br />[2007-12-03 14:57:04.305 'Nfc' 4516 info] Breakpoints disabled<br />[2007-12-03 14:57:04.321 'BaseLibs' 4516 info] Using system libcrypto, version 9070AF<br />[2007-12-03 14:57:06.399 'BaseLibs' 4516 info] [NFC DEBUG] Successfully loaded the diskLib library<br />[2007-12-03 14:57:06.415 'Nfc' 4516 info] Plugin initialized<br />[2007-12-03 14:57:06.415 'App' 4516 info] Trying partitionsvc<br />[2007-12-03 14:57:06.415 'App' 4516 info] Trying proxysvc<br />
    31. 31. Log Analysis Basics: Filtering<br />Log Filtering<br />Just show me the bad stuff; here is the list (positive)<br />Just ignore the good stuff; here is the list (negative or Artificial Ignorance)<br />Pros:<br />Easy result interpretation: see->act<br />Many tools or write your own<br />Cons:<br />Patterns beyond single messages?<br />Neither good nor bad, but interesting?<br />
    32. 32. Example: How to grep Logs?<br />The easiest log analysis method (Linux/Unix):<br /># grep ailure /var/log/messages<br />Filter interesting failure messages in the messages log<br /># grep -v uccess /var/log/messages<br />Filter messages other than success in the messages log<br /># grep -vf LIST /var/log/messages<br />Filter messages other than those matching the patterns listed in the file LIST<br />
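For readers without grep at hand, the same positive and negative filtering can be sketched in a few lines of Python. This is an illustration only: the function names `positive_filter` and `negative_filter` are hypothetical, and the matching is plain substring matching (closer to `grep -F` than to grep's default regular expressions).

```python
def positive_filter(lines, patterns):
    """Keep lines containing any pattern: 'just show me the bad stuff'."""
    return [line for line in lines if any(p in line for p in patterns)]


def negative_filter(lines, patterns):
    """Drop lines containing any pattern: 'just ignore the good stuff'."""
    return [line for line in lines if not any(p in line for p in patterns)]
```

Passing `["ailure"]` to `positive_filter` mirrors `grep ailure`, and passing the contents of a pattern list to `negative_filter` mirrors `grep -vf LIST`.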
    33. 33. Log Analysis Basics: Summary<br />Summarization and reports<br />Top X Users, Connections by IP, etc <br />Pros:<br />Dramatically reduces the size of data <br />Suitable for high-level reporting <br />Cons:<br />Loss of information by summarizing<br />Which report to pick for a task?<br />
    34. 34. Make A Summary<br />SELECT source, destination, proto, user, COUNT(*) FROM log_table WHERE user LIKE 'an%' GROUP BY source, destination, proto, user ORDER BY source DESC<br />P.S. Pray tell me, how did those nasty logs end up in a nice database like that? <br />
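One hedged answer to the P.S.: somebody has to parse the logs into fields and load them first. The sketch below assumes that parsing has already happened and that each record is a `(source, destination, proto, user)` tuple; it loads the records into an in-memory SQLite table named `log_table` and runs the query from the slide. The function name `summarize` and the choice of SQLite are assumptions made for illustration.

```python
import sqlite3


def summarize(records, user_prefix="an"):
    """Load parsed records into a throwaway table and run the slide's summary query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log_table (source TEXT, destination TEXT, proto TEXT, user TEXT)"
    )
    conn.executemany("INSERT INTO log_table VALUES (?, ?, ?, ?)", records)
    rows = conn.execute(
        "SELECT source, destination, proto, user, COUNT(*) "
        "FROM log_table WHERE user LIKE ? "
        "GROUP BY source, destination, proto, user "
        "ORDER BY source DESC",
        (user_prefix + "%",),
    ).fetchall()
    conn.close()
    return rows
```

Note that `source` is stored as text here, so `ORDER BY source DESC` sorts addresses as strings rather than numerically.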
    35. 35. Log Analysis Basics: Search<br />Googling Logs<br />User specifies a time period, a log source or all, and an expression; gets back logs that match (regex vs. Boolean)<br />Pro<br />Easy to understand<br />Quick to do<br />Con<br />What do you search for?<br />A LOT of data back, sometimes<br />
    36. 36. How to Do It: Splunk Search<br />
    37. 37. Log Analysis Basics: Correlation<br />Correlation<br />Rule-based and other 'correlation' and 'Correlation' algorithms<br />Pro<br />Highly automated<br />Con<br />Needs rules written by experts<br />Needs tuning for each site<br />
    38. 38. Example Rule<br /><rule id="40112" level="12" timeframe="240"><br /><if_group>authentication_success</if_group><br /><if_matched_group>authentication_failures</if_matched_group><br /><same_source_ip /> <br /><description>Multiple authentication failures followed by a success.</description> <br /></rule><br />OSSEC rule shown; see OSSEC.net for details<br />
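The logic behind a rule like this can be sketched outside of OSSEC. The code below is not OSSEC's engine: it is a simplified illustration that assumes events arrive in time order as `(timestamp_seconds, source_ip, outcome)` tuples. The `min_failures` threshold is a hypothetical parameter standing in for whatever defines the `authentication_failures` group on a real system; only the 240-second timeframe comes from the rule shown.

```python
from collections import defaultdict, deque


def failures_then_success(events, timeframe=240, min_failures=3):
    """Return (timestamp, source_ip, failure_count) for each suspicious success."""
    recent_failures = defaultdict(deque)  # source_ip -> timestamps of recent failures
    alerts = []
    for timestamp, source_ip, outcome in events:
        window = recent_failures[source_ip]
        # Forget failures that fell out of the correlation timeframe.
        while window and timestamp - window[0] > timeframe:
            window.popleft()
        if outcome == "failure":
            window.append(timestamp)
        elif outcome == "success":
            if len(window) >= min_failures:
                alerts.append((timestamp, source_ip, len(window)))
            window.clear()  # a success resets the count for this source
    return alerts
```

Both knobs illustrate the "needs tuning for each site" point: a threshold that suits a busy VPN gateway will be wrong for a quiet database server.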
    39. 39. Log Analysis Basics: Data Mining<br />Log mining<br />Algorithms that extract meaning from raw data<br />Pro<br />Promises fully-automated analysis <br />Con<br />Still research-grade technology<br />
    40. 40. Example Ranum NBS<br />Ranum’s “nbs” (never before seen) – the simplest log mining tool.<br />No knowledge about “bad” goes in -> insight comes out!<br />Look Ma, NO RULES!<br />Use the tool to pick up anomalous messages from your log pool<br />For more, see: http://www.slideshare.net/anton_chuvakin/log-mining-beyond-log-analysis<br />
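The "never before seen" idea itself fits in a few lines. The sketch below is not Ranum's tool: it keeps the database as an in-memory Python set, and it adds a normalization step (collapsing digit runs) that is purely an assumption, made so that PIDs, ports and timestamps do not make every line look new.

```python
import re


def normalize(line):
    """Collapse digit runs so variable fields do not defeat the comparison."""
    return re.sub(r"\d+", "#", line)


def never_before_seen(lines, seen):
    """Return lines whose normalized form is not in `seen`, and remember them."""
    new_lines = []
    for line in lines:
        key = normalize(line)
        if key not in seen:
            seen.add(key)
            new_lines.append(line)
    return new_lines
```

Feed it yesterday's logs to build up `seen`, then today's: whatever comes back is new to this environment, with no rule describing "bad" ever written.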
    41. 41. Log Analysis Basics: Visualization<br />Visualization, from simple to 4D<br />A pie chart worth a thousand words?<br />Pro<br />You just look at it and know what it means and what to do<br />Con<br />You just look at it, and hmmm…. <br />
    42. 42. How to Do It: afterglow Tool<br />
    43. 43. Log Analysis Basics: When<br />Real time vs. historical analysis<br />Do you always need real-time?<br />What data cannot be analyzed in real-time?<br />A day later vs. never question<br />Historical analysis for deep insight<br />
    44. 44. How To Start Using The Tools?<br />1. Collect logs<br /> Tools: Syslog-ng, standard syslog, etc<br />2. Store logs<br />Tools: MySQL, etc<br />3. Search logs<br /> Tools: grep, splunk, etc<br />4. Correlate and alert<br /> Tools: OSSEC, OSSIM, sec, nbs, logwatch, etc<br />
    45. 45. Key Points to Remember<br />Techniques review<br />Tools review<br />Any other tool suggestions?<br />Start thinking buy vs. build<br />
    46. 46. From Log Analysis to Log Management<br />We will discuss<br /><ul><li>Log management lifecycle</li><li>Log management motivations</li></ul>
    47. 47. Why Log Management?<br />Three main reasons:<br />Security <br />Operations <br />Compliance<br />
    48. 48. Log Analysis to Log Management<br />[Lifecycle diagram; stage labels and callouts as extracted]<br />Stages: Collect, Secure, Store, Search, Report, Alert, Make Conclusions, Act<br />Callouts: Files, syslog, other; Immutable Logs; Search, Report and Analytics; SNMP, E-mail, etc; Humans still needed!<br />
    49. 49. Log Management Challenges<br />Not enough data<br />Too much data<br />Diverse records<br />Time out of sync<br />False records<br />Duplicate data<br />Hard to get data<br />
    50. 50. LOG RETENTION – A TRIVIAL MATTER?<br />We will discuss<br /><ul><li>Log retention</li><li>Issues with various log retention technologies</li></ul>
    51. 51. What is Log Retention?<br />Q: When is log storage considered log retention?<br />A: Log retention = <br /> Log storage + <br /> Accessibility + <br /> Log destruction<br />
    52. 52. What is NOT Retention?<br />A database that stores a few fields from each log <br />A tape closet with log data tapes that were never verified – and lurking rats<br />A syslog server that just spools logs into files<br />
    53. 53. Retention Time Question<br />I have the answer!  No, not really.<br />Regulations?<br />Unambiguous: PCI – keep’em for 1 year<br />Tiered retention strategy<br />Online<br />Near line<br />Offline/tape<br />
    54. 54. Example: Retention Strategy<br />Type + network + storage tier<br />IDS + DMZ + online = 90 days<br />Firewall + DMZ + online = 30 days<br />Servers + internal + online = 90 days<br />ALL + DMZ + archive = 3 years<br />Critical + internal + archive = 5 years<br />OTHER + internal + archive = 1 year<br />
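A retention strategy like this one is easy to encode as a lookup table, which also makes the destruction part of retention checkable. The sketch below copies the six periods from the slide. The fallback order (exact type, then "ALL", then "OTHER"), the 365-day year, and the function names are assumptions made for illustration.

```python
from datetime import date, timedelta

# (log type, network, storage tier) -> retention in days, as listed on the slide.
RETENTION_DAYS = {
    ("IDS", "DMZ", "online"): 90,
    ("Firewall", "DMZ", "online"): 30,
    ("Servers", "internal", "online"): 90,
    ("ALL", "DMZ", "archive"): 3 * 365,
    ("Critical", "internal", "archive"): 5 * 365,
    ("OTHER", "internal", "archive"): 365,
}


def retention_days(log_type, network, tier):
    """Look up the exact type first, then the ALL and OTHER catch-all rows."""
    for key in ((log_type, network, tier), ("ALL", network, tier), ("OTHER", network, tier)):
        if key in RETENTION_DAYS:
            return RETENTION_DAYS[key]
    return None  # no policy defined for this combination


def is_expired(log_date, today, log_type, network, tier):
    """True when a log is older than its retention period and is due for destruction."""
    days = retention_days(log_type, network, tier)
    if days is None:
        return False  # keep the log until someone defines a policy
    return today - log_date > timedelta(days=days)
```

For example, `retention_days("Firewall", "DMZ", "archive")` falls through to the "ALL + DMZ + archive" row and yields three years.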
    55. 55. How to Create A Log Retention Strategy<br />Assess applicable compliance requirements <br />Look at risk posture and other needs<br />Look at various log sources and their log volumes<br />Review available storage options<br />Decide on tiers<br />
    56. 56. Log Storage Tiers: Options<br />RDBMS <br /><ul><li>Oracle, MySQL, etc</li></ul>Flat files<br /><ul><li>Files+: Compressed, indexed, etc</li></ul>Hybrid<br /><ul><li>Combine RDBMS and flat files</li></ul>Proprietary datastore<br /><ul><li>Built from scratch to store logs</li></ul>Tape<br />
    57. 57. Example: How to Deal with A Trillion Log Messages<br />How to manage a trillion (~1,000 billion) log messages?<br />Hundreds of terabytes (1/2 of a petabyte …) of data<br />Which tool to pick?<br />“Sorry, buddy, you are writing your own code here!”<br />
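The size estimate is simple arithmetic once an average message size is chosen. The slide does not state one; the 500 bytes used in the usage note below is an assumption picked because it reproduces the "1/2 of a petabyte" figure, and real averages vary widely by log source.

```python
def storage_terabytes(message_count, avg_message_bytes):
    """Raw, uncompressed storage in decimal terabytes (10**12 bytes)."""
    return message_count * avg_message_bytes / 10**12
```

With the assumed 500-byte average, `storage_terabytes(10**12, 500)` gives 500 TB, or half a petabyte, before indexes, replication or compression are taken into account.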
    58. 58. Key Points to Remember<br />What log retention really is<br />Review which log storage options to use (or to buy in a vendor tool)<br />Learn about storage challenges<br />
    59. 59. LOGGING MISTAKES<br />We will discuss<br /><ul><li>Common log management mistakes</li></ul>Six Mistakes of Log Management<br />1. Not logging at all<br />2. Not looking at the logs<br />3. Storing logs for too short a time<br />4. Prioritizing the log records before collection<br />5. Ignoring the logs from applications<br />6. Only looking at what you know is bad<br />
    60. 60. Mistake 1: Not Logging AT ALL …<br />… and its aggravated version: “… and not knowing that you don’t”<br />No logging? -> well, no logs for incident investigation and response, audits, C&A, control validation, compliance<br />Got logs?<br />If your answer is ‘NO’, don’t listen further: run and enable logging right now!<br />
    61. 61. Example: Oracle<br />Defaults: <br />minimum system logging<br />minimum database server access<br />no data access logging<br />So, where is …<br />data access audit<br />schema and data change audit<br />configuration change audit<br />
    62. 62. Mistake 2: Not looking at logs<br />Collection of logs has value!<br />But review boosts the value 10-fold (numbers are estimates)<br />More in-depth analysis boosts it a lot more!<br />Two choices here …<br />Review after an incident <br />Ongoing review<br />
    63. 63. Example Log Review Priorities<br />DMZ NIDS<br />DMZ firewall<br />DMZ servers with applications<br />Critical internal servers<br />Other servers<br />Select critical applications<br />Other applications<br />
    64. 64. Mistake 3: Storing logs for too short a time<br />You are saying you HAD logs? And how is it useful?<br />Retention question is a hard one. Truly, nobody has the answer!<br />Seven years? A year? 90 days? A week? Until the disk runs out?<br />Common: 90 days online and up to 1-3 years near line or offline<br />
    65. 65. Also A Mistake: Storing Logs for TOO LONG?!<br />Retention = storage + access + destruction<br />Why DESTROY LOGS?<br />Privacy regulations<br />Litigation risk management <br />System resource utilization<br />
    66. 66. Example Retention Strategy<br />Type + network + storage tier<br />IDS + DMZ + online = 90 days<br />Firewall + DMZ + online = 30 days<br />Servers + internal + online = 90 days<br />ALL + DMZ + archive = 3 years<br />Critical + internal + archive = 5 years<br />OTHER + internal + archive = 1 year<br />
    67. 67. Mistake 4: Deciding What’s Relevant Before Collection<br />How would you know what is …<br />… Security-relevant<br />… Compliance-relevant<br />… or will solve the problem you’d have TOMORROW!?<br />The answer? Just grab everything!<br />
    68. 68. Example Common Logging Order <br />Log everything<br /> Retain most everything<br /> Analyze enough<br /> Summarize and report on a subset<br />Monitor some<br />Act on a few records<br />
    69. 69. Mistake 5: Ignoring Logs from Applications <br />Firewall – Yes, Linux – Yes, Windows – Yes, NIDS – Yes<br />but …<br />Oracle - ?<br />SAP - ?<br />Your Application X – No?<br />
    70. 70. Mistake 6: Looking for only the bad stuff<br />Correlation, rules, regex matching: what do they have in common?<br />You have to know what you are looking for!<br />Can we somehow just see what we need to see?<br />Data mining technology can help<br />
    71. 71. Example: Log Mining Techniques in Action<br />Too many attack types from a single IP address<br />Right next to known vulnerability scanners<br />External IP address<br />Conclusion: potentially dangerous attacker<br />
    72. 72. Conclusions – Serious!<br />Logs are a tough beast to tackle<br />Thus, many people ignore them<br />And then bad things happen to them!<br />So, treat logs seriously and analyze them!<br />
    73. 73. However…<br />“The company’s server logs recorded only unsuccessful log-in attempts, not successful ones, frustrating a detailed analysis.”<br />
    74. 74. Questions<br />Dr. Anton Chuvakin<br />Email: anton@chuvakin.org<br />Google Voice: 510-771-7106 <br />Site: http://www.chuvakin.org<br />Blog: http://www.securitywarrior.org<br />LinkedIn: http://www.linkedin.com/in/chuvakin<br />Consulting: www.securitywarriorconsulting.com<br />Twitter: @anton_chuvakin<br />
    75. 75. More on Anton<br />Book author: “Security Warrior”, “PCI Compliance”, “Information Security Management Handbook”, “Know Your Enemy II”, “Hacker’s Challenge 3”, etc<br />Conference speaker: SANS, FIRST, GFIRST, ISSA, CSI, Interop, many, many others worldwide<br />Standard developer: CEE, CVSS, OVAL, etc<br />Community role: SANS, Honeynet Project, WASC, CSI, ISSA, OSSTMM, InfraGard, others<br />Past roles: Researcher, Security Analyst, Strategist, Evangelist, Product Manager, Consultant<br />
    76. 76. Security Warrior Consulting Services<br />Logging and log management policy<br />Develop logging policies and processes, log review procedures, workflows and periodic tasks, and help architect those to solve organizational problems <br />Plan and implement log management architecture to support your business cases; develop specific components such as log data collection, filtering, aggregation, retention, log source configuration as well as reporting, review and validation<br />Customize industry “best practices” related to logging and log review to fit your environment, help link these practices to business services and regulations<br />Help integrate logging tools and processes into IT and business operations<br />Content development<br />Develop correlation rules, reports and other content to make your SIEM and log management product more useful to you and more applicable to your risk profile and compliance needs<br />Create and refine policies, procedures and operational practices for logging and log management to satisfy requirements of PCI DSS, HIPAA, NERC, FISMA and other regulations<br />More at www.SecurityWarriorConsulting.com<br />
