Logs: Can’t Hate Them, Won’t Love Them: Brief Log Management Class by Anton Chuvakin
1. Logs: Can’t Hate Them, Won’t Love Them! Dr. Anton Chuvakin Security Warrior Consulting www.securitywarriorconsulting.com April 2010
2. What Is It? This is a short log analysis and log management class given by Dr. Anton Chuvakin of Security Warrior Consulting at the Project Honeynet Annual Event 2010 in Mexico City, Mexico www.chuvakin.org www.SecurityWarriorConsulting.com
20. Log Chaos II - Accept? messages:
Dec 16 17:28:49 10.14.93.7 ns5xp: NetScreen device_id=ns5xp system-notification-00257(traffic): start_time="2002-12-16 17:33:36" duration=5 policy_id=0 service=telnet proto=6 src zone=Trust dst zone=Untrust action=Permit sent=1170 rcvd=1500 src=10.14.94.221 dst=10.14.98.107 src_port=1384 dst_port=23 translated ip=10.14.93.7 port=1206
Apr 6 06:06:02 Checkpoint NGX SRC=Any, DEST=ANY, Accept=nosubstitute, Do Not Log, Install spyware, lie on your taxes, or better yet, don't pay them
Mar 6 06:06:02 winonasu-pix %PIX-6-302013: Built outbound TCP connection 315210 596 for outside:172.196.9.206/1214 (172.196.9.206/1214) to inside:199.17.151.103/1438 (199.17.151.103/1438)
21. SHOCK!!! … and that is BEFORE we even mention application logs!
22. Log Chaos Everywhere!
No standard format
No standard schema, no standard level of detail
No standard meaning
No taxonomy
No standard transport
No shared knowledge on what to log and how
No logging guidance for developers
No standard API / libraries for log production
23. Result?
%PIX|ASA-3-713185 Error: Username too long - connection aborted
%PIX|ASA-5-501101 User transitioning priv level
ERROR: transport error 202: send failed: Success
sles10sp1oes oesaudit: type=CWD msg=audit(09/27/07 22:09:45.683:318) : cwd=/home/user1
24. More results?
userenv[error] 1030 RCI-CORP\supx No description available
Aug 11 09:11:19 xx null pif ? exit! 0
Apr 23 23:03:08 support last message repeated 3 times
Apr 23 23:04:23 support last message repeated 5 times
Apr 23 23:05:38 support last message repeated 5 times
25. It DOES Suck! Well, it does… … but we need to analyze logs every time an incident occurs and in many other cases!
28. Log Analysis Basics: Summary
Manual
Filtering
Summarization and reports
Simple visualization
Log searching
Correlation
Log data mining
29. Log Analysis Basics: Manual
Manual log review: just fire up your trusty tail, more, notepad, vi, Event Viewer, etc. and hop to it!
Pros: easy; no tools required (neither build nor buy)
Cons: try it with a 10GB log file one day; boring as Hell!
30. See!?
Log for VMware Server, pid=2364, version=e.x.p, build=build-63231, option=BETA, section=2
[2007-12-03 14:57:00.931 'App' 4516 info] Current working directory: C:\Documents and Settings\All Users\Application Data\VMware\VMware Server
[2007-12-03 14:57:00.946 'BaseLibs' 4516 info] HOSTINFO: Seeing Intel CPU, numCoresPerCPU 2 numThreadsPerCore 1.
[2007-12-03 14:57:00.946 'BaseLibs' 4516 info] HOSTINFO: This machine has 1 physical CPUS, 2 total cores, and 2 logical CPUs.
[2007-12-03 14:57:00.946 'App' 4516 info] Trying blklistsvc
[2007-12-03 14:57:00.946 'App' 4516 info] Trying cimsvc
[2007-12-03 14:57:00.946 'App' 4516 info] Trying directorysvc
[2007-12-03 14:57:00.946 'App' 4516 info] Trying hostsvc
[2007-12-03 14:57:01.571 'NetworkProvider' 4516 info] Using netmap configuration file C:\Documents and Settings\All Users\Application Data\VMware\VMware Server\netmap.conf
[2007-12-03 14:57:01.587 'NetworkProvider' 4516 error] VNL_GetBriggeState call failed with status 1. Refreshing network information failed
[2007-12-03 14:57:03.165 'NetworkProvider' 4516 info] Active ftp is 1
[2007-12-03 14:57:03.165 'NetworkProvider' 4516 info] Allowanyoui is 0
[2007-12-03 14:57:03.165 'NetworkProvider' 4516 info] udptimeout is 30
[2007-12-03 14:57:03.337 'HostsvcPlugin' 4516 warning] No advanced options found
[2007-12-03 14:57:03.368 'Hostsvc::AutoStartManager' 4516 info] VM autostart configuration: C:\Documents and Settings\All Users\Application Data\VMware\VMware Server\hostd\vmAutoStart.xml
[2007-12-03 14:57:04.212 'Locale' 4516 info] Locale subsystem initialized from C:\Program Files\VMware\VMware Server\locale/ with default locale en.
[2007-12-03 14:57:04.212 'ResourcePool ha-root-pool' 4516 info] Resource pool instantiated
[2007-12-03 14:57:04.212 'ResourcePool ha-root-pool' 4516 info] Refresh interval: 60 seconds
[2007-12-03 14:57:04.212 'HostsvcPlugin' 4516 info] Plugin initialized
[2007-12-03 14:57:04.212 'App' 4516 info] Trying internalsvc
[2007-12-03 14:57:04.259 'App' 4516 info] Trying nfcsvc
[2007-12-03 14:57:04.305 'Nfc' 4516 info] Breakpoints disabled
[2007-12-03 14:57:04.321 'BaseLibs' 4516 info] Using system libcrypto, version 9070AF
[2007-12-03 14:57:06.399 'BaseLibs' 4516 info] [NFC DEBUG] Successfully loaded the diskLib library
[2007-12-03 14:57:06.415 'Nfc' 4516 info] Plugin initialized
[2007-12-03 14:57:06.415 'App' 4516 info] Trying partitionsvc
[2007-12-03 14:57:06.415 'App' 4516 info] Trying proxysvc
31. Log Analysis Basics: Filtering
Log filtering:
Just show me the bad stuff; here is the list (positive filtering)
Just ignore the good stuff; here is the list (negative filtering, or Artificial Ignorance)
Pros: easy result interpretation (see -> act); many tools available, or write your own
Cons: what about patterns beyond single messages? What about messages that are neither good nor bad, but interesting?
32. Example: How to grep Logs?
The easiest log analysis method (Linux/Unix):
# grep ailure /var/log/messages
Show interesting failure messages in the messages log (searching for "ailure" matches both "Failure" and "failure")
# grep -v uccess /var/log/messages
Show messages other than success messages in the messages log
# grep -vf LIST /var/log/messages
Show messages other than those matching the patterns listed in the file LIST
33. Log Analysis Basics: Summarization
Summarization and reports: Top X Users, Connections by IP, etc.
Pros: dramatically reduces the size of data; suitable for high-level reporting
Cons: loss of information by summarizing; which report to pick for a task?
34. Make A Summary
SELECT source, destination, proto, user, COUNT(*)
FROM log_table
WHERE user LIKE 'an%'
GROUP BY source, destination, proto, user
ORDER BY source DESC
P.S. Pray tell, how did those nasty logs end up in a nice database like that?
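One hedged answer to the P.S.: parse a few fields out of the raw lines and load them into a database yourself. The sketch below (the `src=`/`dst=`/`proto=`/`user=` field layout and all names are illustrative assumptions, not a standard log format) loads parsed lines into an in-memory SQLite table so the slide's GROUP BY query can run:

```python
# Minimal sketch: regex-parse "nasty" log lines into SQLite, then summarize.
# The log format regex and field names are assumptions for illustration.
import re
import sqlite3

LINE_RE = re.compile(
    r"src=(?P<source>\S+)\s+dst=(?P<destination>\S+)\s+"
    r"proto=(?P<proto>\S+)\s+user=(?P<user>\S+)"
)

def load_logs(lines):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE log_table (source, destination, proto, user)")
    for line in lines:
        m = LINE_RE.search(line)
        if m:  # silently skipping unparsed lines; a real loader should count them
            db.execute(
                "INSERT INTO log_table VALUES (?, ?, ?, ?)",
                (m["source"], m["destination"], m["proto"], m["user"]),
            )
    return db

def top_summary(db):
    # Same query as on the slide
    return db.execute(
        "SELECT source, destination, proto, user, COUNT(*) "
        "FROM log_table WHERE user LIKE 'an%' "
        "GROUP BY source, destination, proto, user "
        "ORDER BY source DESC"
    ).fetchall()
```

In practice the parsing step is the hard part: every log source on the "log chaos" slides needs its own regex.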
35. Log Analysis Basics: Search
Googling logs: the user specifies a time period, a log source (or all), and an expression, and gets back the logs that match (regex vs. Boolean)
Pros: easy to understand; quick to do
Cons: what do you search for? Sometimes a LOT of data comes back
37. Log Analysis Basics: Correlation
Correlation: rule-based and other "correlation" algorithms
Pros: highly automated
Cons: needs rules written by experts; needs tuning for each site
38. Example Rule
<rule id="40112" level="12" timeframe="240">
  <if_group>authentication_success</if_group>
  <if_matched_group>authentication_failures</if_matched_group>
  <same_source_ip />
  <description>Multiple authentication failures followed by a success.</description>
</rule>
OSSEC rule shown; see OSSEC.net for details
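The logic this rule expresses can be sketched outside OSSEC as well: flag an authentication success that follows earlier failures from the same source IP within the rule's 240-second window. The event tuple format and the "at least two failures" threshold below are assumptions for illustration, not OSSEC internals:

```python
# Rough sketch of the correlation idea behind rule 40112 (not OSSEC itself).
from collections import defaultdict

WINDOW = 240  # seconds, matching the rule's timeframe

def correlate(events):
    """events: iterable of (ts, kind, src_ip) sorted by ts,
    kind in {"failure", "success"}. Returns (ts, ip) alerts."""
    failures = defaultdict(list)  # src_ip -> failure timestamps
    alerts = []
    for ts, kind, ip in events:
        if kind == "failure":
            failures[ip].append(ts)
        elif kind == "success":
            # keep only failures still inside the time window
            recent = [t for t in failures[ip] if ts - t <= WINDOW]
            if len(recent) >= 2:  # "multiple" failures: threshold is assumed
                alerts.append((ts, ip))
            failures[ip].clear()
    return alerts
```

This is the "needs rules written by experts" cost in miniature: the window, the threshold, and the grouping key all need per-site tuning.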
39. Log Analysis Basics: Data Mining Log mining Algorithms that extract meaning from raw data Pro Promises fully-automated analysis Con Still research-grade technology
40. Example: Ranum's NBS
Ranum's "nbs" (never before seen) - the simplest log mining tool. No knowledge about "bad" goes in -> insight comes out! Look Ma, NO RULES!
Use the tool to pick up anomalous messages from your log pool
See more at: http://www.slideshare.net/anton_chuvakin/log-mining-beyond-log-analysis
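A toy version of the never-before-seen idea (the real nbs tool keeps a fast on-disk database; the normalization rules below are illustrative assumptions): mask the variable parts of each line into a template and report only lines whose template has not been seen before.

```python
# Toy "never before seen" detector: no rules, no knowledge of "bad".
# The masking regexes are assumptions; real deployments need richer ones.
import re

def template(line):
    line = re.sub(r"\d+\.\d+\.\d+\.\d+", "<ip>", line)  # IPv4 addresses
    line = re.sub(r"\d+", "<n>", line)                   # other numbers
    return line

def never_before_seen(lines, seen=None):
    """Yield lines whose normalized template is new; update `seen` in place."""
    seen = set() if seen is None else seen
    for line in lines:
        t = template(line)
        if t not in seen:
            seen.add(t)
            yield line
```

Persisting `seen` between runs turns this into the long-lived baseline that makes the approach useful.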
41. Log Analysis Basics: Visualization
Visualization, from simple to 4D: a pie chart is worth a thousand words?
Pros: you just look at it and know what it means and what to do
Cons: you just look at it, and hmmm...
43. Log Analysis Basics: When Real time vs. historical analysis Do you always need real-time? What data cannot be analyzed in real-time? A day later vs. never question Historical analysis for deep insight
44. How To Start Using The Tools?
1. Collect logs. Tools: syslog-ng, standard syslog, etc.
2. Store logs. Tools: flat files, MySQL, etc.
3. Search logs. Tools: grep, Splunk, etc.
4. Correlate and alert. Tools: OSSEC, OSSIM, SEC, nbs, logwatch, etc.
45. Key Points to Remember Techniques review Tools review Any other tool suggestions? Start thinking buy vs. build
48. Log Analysis to Log Management (diagram)
Collect (files, syslog, other) -> Secure -> Store (immutable logs) -> Search, Report and Analytics -> Alert (SNMP, e-mail, etc.) -> Make conclusions -> Act
Humans still needed!
49. Log Management Challenges
Not enough data
Too much data
Diverse records
Time out of sync
False records
Duplicate data
Hard to get data
52. What is NOT Retention? A database that stores a few fields from each log A tape closet with log data tapes that were never verified – and lurking rats A syslog server that just spools logs into files
53. Retention Time Question
I have the answer! No, not really.
Regulations? Unambiguous: PCI - keep 'em for 1 year
Tiered retention strategy: online, near-line, offline/tape
54. Example: Retention Strategy
Type + network + storage tier:
IDS + DMZ + online = 90 days
Firewall + DMZ + online = 30 days
Servers + internal + online = 90 days
ALL + DMZ + archive = 3 years
Critical + internal + archive = 5 years
OTHER + internal + archive = 1 year
55. How to Create A Log Retention Strategy Assess applicable compliance requirements Look at risk posture and other needs Look at various log sources and their log volumes Review available storage options Decide on tiers
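Once the tiers are decided, the strategy can end up as something as simple as a lookup table. A minimal sketch using the day counts from the example retention strategy slide (the key format, the ALL/OTHER fallback order, and the default value are assumptions):

```python
# Tiered retention lookup mirroring the example slide.
# (type, network, tier) -> retention in days; fallback order is assumed.
RETENTION_DAYS = {
    ("ids", "dmz", "online"): 90,
    ("firewall", "dmz", "online"): 30,
    ("servers", "internal", "online"): 90,
    ("all", "dmz", "archive"): 3 * 365,
    ("critical", "internal", "archive"): 5 * 365,
    ("other", "internal", "archive"): 365,
}

def retention_days(log_type, network, tier):
    key = (log_type.lower(), network.lower(), tier.lower())
    # try the exact match, then the slide's ALL and OTHER catch-alls
    for k in (key, ("all",) + key[1:], ("other",) + key[1:]):
        if k in RETENTION_DAYS:
            return RETENTION_DAYS[k]
    return 90  # assumed default when nothing matches
```

Encoding the policy as data, not prose, makes it enforceable by the archiving jobs and auditable against the compliance requirements from step 1.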
57. Example: How to Deal with A Trillion Log Messages
How to manage a trillion (~1,000 billion) log messages? Hundreds of terabytes (about 1/2 of a petabyte) of data
Which tool to pick? "Sorry, buddy, you are writing your own code here!"
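The slide's numbers survive a back-of-the-envelope check; the sketch below assumes ~500 bytes per raw message and a ~200 MB/s sequential read rate (both figures are assumptions for illustration, not measurements):

```python
# Back-of-the-envelope arithmetic for a trillion log messages.
messages = 10**12          # ~1,000 billion messages
avg_bytes = 500            # assumed average raw message size
scan_rate = 200 * 10**6    # assumed sequential read rate, bytes/second

total_bytes = messages * avg_bytes
total_tb = total_bytes / 10**12          # terabytes (decimal)
scan_days = total_bytes / scan_rate / 86400

print(f"{total_tb:.0f} TB total; one full scan ~{scan_days:.0f} days on one disk")
```

Half a petabyte that takes weeks to scan sequentially is exactly why off-the-shelf tools give out and "you are writing your own code here."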
58. Key Points to Remember
What really is log retention?
Review log storage options to use (or to buy in a vendor tool)
Learn about storage challenges
60. Mistake 1: Not Logging AT ALL...
... and its aggravated version: "... and not knowing that you don't"
No logging? -> no logs for incident investigation and response, audits, C&A, control validation, compliance
Got logs? If your answer is 'NO', don't listen further: go and enable logging right now!
61. Example: Oracle Defaults
Minimum system logging; minimum database server access; no data access logging
So, where is...
data access audit?
schema and data change audit?
configuration change audit?
62. Mistake 2: Not looking at logs
Collection of logs has value! But review boosts the value 10-fold (numbers are estimates); more in-depth analysis boosts it a lot more!
Two choices here...
Review after an incident
Ongoing review
63. Example Log Review Priorities
DMZ NIDS
DMZ firewall
DMZ servers with applications
Critical internal servers
Other servers
Select critical applications
Other applications
64. Mistake 3: Storing logs for too short a time
You are saying you HAD logs? And how is that useful?
The retention question is a hard one. Truly, nobody has the answer! Seven years? A year? 90 days? A week? Until the disk runs out?
Common: 90 days online and up to 1-3 years near-line or offline
65. Also A Mistake: Storing Logs for TOO LONG?!
Retention = storage + access + destruction
Why DESTROY LOGS?
Privacy regulations
Litigation risk management
System resource utilization
66. Example Retention Strategy
Type + network + storage tier:
IDS + DMZ + online = 90 days
Firewall + DMZ + online = 30 days
Servers + internal + online = 90 days
ALL + DMZ + archive = 3 years
Critical + internal + archive = 5 years
OTHER + internal + archive = 1 year
67. Mistake 4: Deciding What’s Relevant Before Collection How would you know what is … … Security-relevant … Compliance-relevant … or will solve the problem you’d have TOMORROW!? The answer? Just grab everything!
68. Example Common Logging Order
Log everything
Retain most everything
Analyze enough
Summarize and report on a subset
Monitor some
Act on a few records
69. Mistake 5: Ignoring Logs from Applications
Firewall - Yes, Linux - Yes, Windows - Yes, NIDS - Yes, but...
Oracle - ? SAP - ? Your Application X - No?
70. Mistake 6: Looking for only the bad stuff Correlation, rules, regex matching What is in common? You have to know what you are looking for! Can we somehow just see what we need to see? Data mining technology can help
71. Example: Log Mining Techniques in Action Too many attack types from a single IP address Right next to known vulnerability scanners External IP address Conclusion: potentially dangerous attacker
72. Conclusions - Serious!
Logs are a tough beast to tackle
Thus, many people ignore them
And then bad things happen to them!
So, treat logs seriously and analyze them!
73. However… “The company’s server logs recorded only unsuccessful log-in attempts, not successful ones, frustrating a detailed analysis.”
74. Questions
Dr. Anton Chuvakin
Email: anton@chuvakin.org
Google Voice: 510-771-7106
Site: http://www.chuvakin.org
Blog: http://www.securitywarrior.org
LinkedIn: http://www.linkedin.com/in/chuvakin
Consulting: www.securitywarriorconsulting.com
Twitter: @anton_chuvakin
75. More on Anton
Book author: "Security Warrior", "PCI Compliance", "Information Security Management Handbook", "Know Your Enemy II", "Hacker's Challenge 3", etc.
Conference speaker: SANS, FIRST, GFIRST, ISSA, CSI, Interop, many, many others worldwide
Standard developer: CEE, CVSS, OVAL, etc.
Community role: SANS, Honeynet Project, WASC, CSI, ISSA, OSSTMM, InfraGard, others
Past roles: researcher, security analyst, strategist, evangelist, product manager, consultant
76. Security Warrior Consulting Services
Logging and log management policy:
Develop logging policies and processes, log review procedures, workflows and periodic tasks, as well as help architect those to solve organization problems
Plan and implement log management architecture to support your business cases; develop specific components such as log data collection, filtering, aggregation, retention, log source configuration, as well as reporting, review and validation
Customize industry "best practices" related to logging and log review to fit your environment; help link these practices to business services and regulations
Help integrate logging tools and processes into IT and business operations
Content development:
Develop correlation rules, reports and other content to make your SIEM and log management product more useful to you and more applicable to your risk profile and compliance needs
Create and refine policies, procedures and operational practices for logging and log management to satisfy requirements of PCI DSS, HIPAA, NERC, FISMA and other regulations
More at www.SecurityWarriorConsulting.com
Editor's Notes
The easiest log analysis method (Linux/Unix):
# grep ailure /var/log/messages
Look for interesting failure messages in the messages log. It makes sense to also look for "ailed." We drop the first letter to avoid worrying about case sensitivity. You can also switch grep to a case-insensitive mode by typing "grep -i" (for ignore case) instead.
# grep anton /var/log/messages
Look for particular user actions; this will definitely miss more than a few user actions, and so manual review of logs is needed. For example, some messages will not be marked with that user, such as when a user becomes "root" via the "sudo" command.
More examples:
grep sshd *.log (looks for all logs with the "sshd" string in them)
grep -i user messages (looks for "user", "USER", "User", etc. in "messages" files)
grep -v sendmail syslog (looks for all log lines without "sendmail" in them)
===
This slide reminds Unix people and teaches Windows people about the "grep" command that can be used to manually filter logs.
grep sshd *.log | process_ssh.sh
Filters all logs with the "sshd" string in them and sends them to another program
grep -i user messages | grep -v ailure
Filters for "user", "USER", "User" messages which are not failures
Using "grep" is an example of the positive filtering mentioned on the previous slide: trying to focus on the bad things that one needs to see, investigate, and then act on: attacks, failures, etc. The "-v" option showcases negative filtering.
So how easy is it to data mine with Splunk? In the above example I told Splunk I was interested in all log entries that contained the word "failed". This refreshed the screen and showed me 25 entries that matched this keyword. Looking through the list I noticed that one of the entries was for a failed logon attempt. At that point I clicked the "similar" hyperlink for the log entry, which produced the screen shown above. Note: it is showing us that we have ten failed logon attempts in the log file (four are not shown as they are off the bottom of the screen). So in less than 60 seconds I was able to identify all of the failed logon attempts for my network.
OSSEC rule shown
Marcus Ranum's "nbs" tool can be obtained at http://www.ranum.com/security/computer_security/code/index.html The description says: "Never Before Seen Anomaly detection driver. This utility creates a fast database of things that have been seen, and includes tools to print and update the database. Includes PDF documentation and walkthroughs." Use the tool to pick up anomalous messages from your log pool. One can also build the same using grep, awk and other shell tools: 'grep -v -f' can be used to look for log entries excluding ones stored in a file.
This slide shows one of the open source visualization tools, AfterGlow (which can be found at http://afterglow.sourceforge.net/ or at http://www.secviz.org/). The tool has been successfully used to visualize many types of log data.
Here we learn how to start using the tools we just discussed for taking control of your logs. Start by collecting logs; use syslog-ng or whatever syslog variant is available on your systems. To combine these with Windows logs, use Snare or LASSO, which convert Windows logs to syslog. Store logs in files (compressed or not) or in a database such as open source MySQL. To start peeking at logs, use search tools such as the free "grep" or "splunk" that we mentioned above. When ready to move to correlation and alerting, get OSSEC or other tools. At this point, you gain a degree of awareness of what is going on in your environment.