
Rsyslog log normalization


Learn about structured logging with rsyslog and how it can be used to do actual format conversions. Includes config samples for Linux and Windows log sources.

Published in: Technology, Education

  1. Log Message Processing, Formatting and Normalizing with Rsyslog
     Rainer Gerhards
  2. What's in this talk?
     Rainer Gerhards, http://blog.gerhards.net
     • Some Logging Basics
     • A practical Usage Scenario
     • Logging APIs
     • Background information on rsyslog processing
  3. Why Logging?
     • Troubleshooting
     • Security Alerting (e.g. SIEM)
     • Legal Requirements (e.g. banks)
     • Evidence in Court
     • Billing (e.g. Telecom Industry)
  4. Logging is simple, isn't it?
     • Just generate a log record when something interesting happens
     • BUT
       ▫ What is “interesting”?
       ▫ What is required to describe the event?
       ▫ How do we know what the actual data item means?
       ▫ What does a log record look like?
     • So... making sense out of logs, especially in a heterogeneous environment, is far from being simple...
  5. The Logging Dilemma
     • There is no universally accepted format
     • Logs looking very much the same describe different events
     • The same event is described in very different-looking log records
     • Often, pseudo-free-form text is used
     • For consumers, it is very hard to digest even a decent subset of important logging formats
  6. It's a real-world problem!
     One day in my mailbox...
     “I am working with a customer who is deploying a large rsyslog environment for central logging. Basically they want a cluster of boxes to act as the "log of record". They would also like to have the logs fed to a couple security products for analysis. The customer has a limited budget so having each vendor write parsers is cost prohibitive.”
  7. Log Producers & Consumers
     [Diagram] Producers: Linux Boxes, Windows, Other *nix, Firewalls, Apps. In between: "?". Consumers: Security Analyzer I, Log Storage, Security Analyzer n, Capacity Planning, Billing.
  8. Some important log sources
     • Free-form text formats
       ▫ Traditional syslog messages
       ▫ Application text log files
     • Structured formats
       ▫ Windows Event Log
       ▫ Linux Journal (today mostly text messages)
       ▫ Application text log files (XML, CSV, WELF, Apache CLF, whatever)
       ▫ SNMP traps
       ▫ New-style syslog
  9. How to solve that dilemma?
     • Several efforts try very hard to solve this
       ▫ For many years
       ▫ With limited success
     • Resulted in an approach named “Common Event Expression” (CEE)
       ▫ Cross-vendor team (both OSS & commercial)
       ▫ Driven by US MITRE
       ▫ Builds on existing infrastructure
  10. 10. Rainer Gerhards,
  11. CEE's core ideas
      • Keep it simple & extensible
      • Support existing technology
      • As far as the format is concerned
        ▫ name/value pairs
        ▫ Keep the structure as flat as possible, but permit some hierarchy
        ▫ Keep dictionaries of field names, syntax and semantics
        ▫ Profiles specify what needs to be present in specific event types
  12. Project Lumberjack
      • Born at last year's Fedora DevConf, right here!
      • Intends to
        ▫ Build on CEE and drive the ideas further
        ▫ Provide an open source implementation of core functionality
        ▫ Deliver something that actually works
      • Driven by Logging Professionals from Red Hat, Balabit (syslog-ng) and Adiscon (rsyslog), open to everyone else
  13. What did we do the past year?
      • Agreed on the log format
      • Made rsyslog fully lumberjack-aware
      • Made Adiscon's Windows Products fully lumberjack-aware
      • Made syslog-ng fully lumberjack-aware
      • Created a new syslog API --> libumberlog
  14. Back to my mailbox...
      “I am working with a customer who is deploying a large rsyslog environment for central logging. Basically they want a cluster of boxes to act as the "log of record". They would also like to have the logs fed to a couple security products for analysis. The customer has a limited budget so having each vendor write parsers is cost prohibitive. A commonality for each of the additional destinations is the ability to ingest logs in <some common format>. I believe rsyslog has the capability to alter the output...”
  15. Rsyslog as converter
      [Diagram] Producers: Linux Boxes, Windows, Other *nix, Firewalls, Apps. In between: rsyslogd. Consumers: Security Analyzer I, Log Storage, Security Analyzer n, Capacity Planning, Billing.
  16. Some rsyslog basics
      • Ruleset
        ▫ Like a function in a programming language
        ▫ Consists of (conditional) statements and actions
        ▫ Can be called from another ruleset or bound to a listener
      • Variables
        ▫ Message Variables (e.g. $msg, $rawmsg)
        ▫ System Variables (e.g. $$now)
        ▫ Structured Variables: form a tree-like structure, e.g. $!usr!somevar
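The tree-like structured variables mentioned on this slide can be pictured as nested dictionaries. The following is only a minimal Python sketch of that idea, not rsyslog code: `set_var` and `get_var` are hypothetical helpers, and returning an empty string for a missing path is an assumption that mirrors the `$!user != ""` check used later in the talk.

```python
def _keys(path):
    # "$!usr!type" -> ["usr", "type"]; the "$!" marks the root of the tree.
    if path.startswith("$!"):
        path = path[2:]
    return path.split("!")


def set_var(tree, path, value):
    """Store value under an rsyslog-style path such as '$!usr!type'."""
    keys = _keys(path)
    node = tree
    for key in keys[:-1]:
        # Create intermediate containers on demand.
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def get_var(tree, path):
    """Return the value at path, or '' if the path does not exist (assumption)."""
    node = tree
    for key in _keys(path):
        if not isinstance(node, dict) or key not in node:
            return ""
        node = node[key]
    return node


event = {}
set_var(event, "$!usr!type", "logon")
set_var(event, "$!usr!user", "rger")
print(get_var(event, "$!usr"))
```

Reading `$!usr` returns the whole subtree, which is what makes it possible to emit all normalized fields at once later on.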
  17. Let's look at a practical case
      • Goal: Unified log files with logon/logoff report
        ▫ For processing by backend tools (not shown)
        ▫ Concentrate on just four fields: host system, reception time, username, logon/logoff status
      • Inputs
        ▫ Linux: traditional text log messages
        ▫ Windows: different Agents
      • Output
        ▫ Lumberjack JSON style
        ▫ CSV
  18. Have rsyslog gather the data

          module(load="imtcp")
          /* We assume to have all TCP logging (for simplicity)
           * Note that we use different ports to point different sources
           * to the right rule sets for normalization. While there are
           * other methods (e.g. based on tag or source), using multiple
           * ports is both the easiest as well as the fastest.
           */
          input(type="imtcp" port="13514" Ruleset="WindowsRsyslog")
          input(type="imtcp" port="13515" Ruleset="LinuxPlainText")
          input(type="imtcp" port="13516" Ruleset="WindowsSnare")
  19. The Linux Input Data sample
      • Free-text format

          Jan 16 09:28:33 rger-virtual-machine sudo: pam_unix(sudo:session): session opened for user root by rger(uid=1000)
          Jan 16 09:28:33 rger-virtual-machine sudo: pam_unix(sudo:session): session closed for user root
          Jan 24 02:38:49 rger-virtual-machine sshd[2414]: pam_unix(sshd:session): session opened for user rger by (uid=0)
          Jan 24 02:41:22 rger-virtual-machine sshd[2414]: pam_unix(sshd:session): session closed for user rger
  20. Parsing Free-Text Messages: mmnormalize
      • Uses a “sample rule base”
        ▫ One sample for each expected message type
        ▫ Sample contains text (for matching) and property descriptions (like IPv4 Address, char-matches, …)
        ▫ If sample matches, corresponding properties are extracted
        ▫ Special parser for iptables
      • Also implemented as an action
      • Very fast algorithm (much faster than regex)
      • Based on liblognorm (which you can use in your own programs to gain this functionality!)
  21. Needs to be normalized
      • Job for rsyslog's mmnormalize
      • rulebase:

          # SSH and sudo logins
          prefix=%rcvdat:date-rfc3164% %rcvdfrom:word%
          rule=: sshd[%-:number%]: pam_unix(sshd:session): session %type:word% for user %user:word% by (uid=%-:number%)
          rule=: sshd[%-:number%]: pam_unix(sshd:session): session %type:word% for user %user:word%
          rule=: sudo: pam_unix(sudo:session): session %type:word% for user root by %user:char-to:(%(uid=%-:number%)
          rule=: sudo: pam_unix(sudo:session): session %type:word% for user %user:word%
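To make the rulebase above more concrete, the sketch below renders the prefix and the two sshd rules as one Python regular expression. This is not how liblognorm works internally (the talk points out that its algorithm is much faster than regex); the pattern, the `extract` helper and the spacing of the sample line (restored from the flattened transcript) are assumptions made purely for illustration.

```python
import re

# prefix=%rcvdat:date-rfc3164% %rcvdfrom:word%
PREFIX = r"(?P<rcvdat>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) (?P<rcvdfrom>\S+)"

# The two sshd rules, folded into one pattern with an optional "by (uid=N)" tail.
SSHD_RE = re.compile(
    PREFIX
    + r" sshd\[\d+\]: pam_unix\(sshd:session\): session (?P<type>\S+)"
    + r" for user (?P<user>\S+)(?: by \(uid=\d+\))?$"
)


def extract(line):
    """Return the named fields of an sshd session line, or None if no rule matches."""
    match = SSHD_RE.match(line)
    return match.groupdict() if match else None


sample = (
    "Jan 24 02:38:49 rger-virtual-machine sshd[2414]: "
    "pam_unix(sshd:session): session opened for user rger by (uid=0)"
)
print(extract(sample))
```

Fields written as `%-:number%` in the rulebase are matched but not captured, which is why the process ID and uid do not appear in the result.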
  22. Putting it all together:

          /* plain Linux log messages (here: ssh and sudo) need to be
           * parsed - we use mmnormalize for fast and efficient parsing
           * here.
           */
          ruleset(name="LinuxPlainText") {
              action(type="mmnormalize"
                     rulebase="/home/rger/proj/rsyslog/linux.rb" userawmsg="on")
              if $parsesuccess == "OK" and $!user != "" then {
                  if $!type == "opened" then
                      set $!usr!type = "logon";
                  else if $!type == "closed" then
                      set $!usr!type = "logoff";
                  set $!usr!rcvdfrom = $!rcvdfrom;
                  set $!usr!rcvdat = $!rcvdat;
                  set $!usr!user = $!user;
                  call outwriter
              }
          }
  23. Windows Horrors: SNARE
      • Tab-delimited mess:

          <131>Feb 10 15:48:12 Win2008StdR2x64_vm MSWinEventLog#0111#011Security#0114#011Tue Feb 05 16:39:27 2013#0114624#011Microsoft-Windows-Security-Auditing#011WIN2008STDR2X64\Administrator#011N/A#011Success Audit#011Win2008StdR2x64_vm#011Anmelden#011#011Ein Konto wurde erfolgreich angemeldet. Antragsteller: Sicherheits-ID: S-1-5-18 Kontoname: WIN2008STDR2X64$ Kontodomäne: WORKGROUP Anmelde-ID: 0x3e7 Anmeldetyp: 2 Neue Anmeldung: Sicherheits-ID: S-1-5-21-3148105976-3029560809-1855765213-500 Kontoname: Administrator Kontodomäne: WIN2008STDR2X64 Anmelde-ID: 0x1d1feb Anmelde-GUID: {00000000-0000-0000-0000-000000000000} Prozessinformationen: Prozess-ID: 0xc40 Prozessname: C:\Windows\System32\winlogon.exe Netzwerkinformationen: Arbeitsstationsname: WIN2008STDR2X64 Quellnetzwerkadresse: Quellport: 0 Detaillierte Authentifizierungsinformationen: Anmeldeprozess: User32 Authentifizierungspaket: Negotiate Übertragene Dienste: - Paketname (nur NTLM): - Schlüssellänge: 0 Dieses Ereignis wird beim Erstellen einer Anmeldesitzung generiert. Es wird auf dem Computer
  24. Anyhow... digest by position:

          ruleset(name="WindowsSnare") {
              set $!usr!type = field($rawmsg, "#011", 6);
              if $!usr!type == 4634 then {
                  set $!usr!type = "logoff"; set $!doProces = 1;
              } else if $!usr!type == 4624 then {
                  set $!usr!type = "logon"; set $!doProces = 1;
              } else
                  set $!doProces = 0;
              if $!doProces == 1 then {
                  set $!usr!rcvdfrom = field($rawmsg, 32, 4);
                  set $!usr!rcvdat = field($rawmsg, "#011", 5);
                  /* we need to fix up the snare date */
                  set $!usr!rcvdat = field($!usr!rcvdat, 32, 2) & " " &
                                     field($!usr!rcvdat, 32, 3) & " " &
                                     field($!usr!rcvdat, 32, 4);
                  set $!usr!user = field($rawmsg, "#011", 8);
                  call outwriter
              }
          }
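The positional extraction in the ruleset above can be followed step by step with a small Python stand-in for `field()`. This is a sketch under assumptions: the helper only imitates the 1-based, delimiter-based lookup as it is used on the slide (a delimiter given as a number is treated as a character code, so 32 is a space), it returns `None` for a missing field, which is not necessarily what rsyslog does, and `raw` is a shortened copy of the SNARE sample with the spaces and backslash lost in the transcript restored.

```python
def field(text, delim, n):
    """Return the n-th (1-based) field of text split on delim, or None."""
    # A numeric delimiter is interpreted as a character code (32 -> " ").
    sep = chr(delim) if isinstance(delim, int) else delim
    parts = text.split(sep)
    return parts[n - 1] if 1 <= n <= len(parts) else None


raw = (
    "<131>Feb 10 15:48:12 Win2008StdR2x64_vm MSWinEventLog#0111#011Security"
    "#0114#011Tue Feb 05 16:39:27 2013#0114624"
    "#011Microsoft-Windows-Security-Auditing"
    "#011WIN2008STDR2X64\\Administrator#011N/A"
)

event_id = field(raw, "#011", 6)      # Windows event ID
host = field(raw, 32, 4)              # fourth space-separated token
snare_date = field(raw, "#011", 5)    # "Tue Feb 05 16:39:27 2013"
# Drop the weekday and the year, as the date fix-up in the ruleset does.
rcvdat = " ".join(field(snare_date, 32, i) for i in (2, 3, 4))
user = field(raw, "#011", 8)

print(event_id, host, rcvdat, user)
```

The approach is fast but brittle: every value depends on the sender never changing its field order.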
  25. Windows: rsyslog Agent
      • Native Lumberjack format with Windows field names
      • A structured mess ;-)

          <133>Feb 05 11:15:56 EvntSLog: @cee: {"source":"", "nteventlogtype": "Security", "sourceproc": "Microsoft-Windows-Security-Auditing", "id": "4634", "categoryid": "12545", "category": "12545", "keywordid": "0x8020000000000000", "user": "NA", "TargetUserSid": "S-1-5-21-803433813-209592097-1264475144-8733", "TargetUserName": "fr", "TargetDomainName": "ADISCON", "TargetLogonId": "0xb8c7aed", "LogonType": "7", "catname": "Logoff", "keyword": "Audit Success", "level": "Information", "msg": "An account was logged off.\r\n\r\nSubject:\r\n\tSecurity ID:\t\tS-1-5-21-803433813-209592097-1264475144-8733\r\n\tAccount Name:\t\tfr\r\n\tAccount Domain:\t\tADISCON\r\n\tLogon ID:\t\t0xb8c7aed\r\n\r\nLogon Type:\t\t\t7\r\n\r\nThis event is generated when a logon session is destroyed. It may be positively correlated with a logon event using the Logon ID value. Logon IDs are only unique between reboots on the same computer."}
  26. Parsing Lumberjack Data: mmjsonparse
      • Checks if message contains Lumberjack structured data
        ▫ If so: parse out the fields, using field names directly from the message
        ▫ If not: populate the Lumberjack msg field
      • Implemented via action interface
        ▫ Can be called based on rules, thus only for specific events
  27. Reading the Lumberjack Data:

          /* the rsyslog Windows Agent uses native Lumberjack format
           * (better said: is configured to use it)
           */
          ruleset(name="WindowsRsyslog") {
              action(type="mmjsonparse")
              if $parsesuccess == "OK" then {
                  if $!id == 4634 then
                      set $!usr!type = "logoff";
                  else if $!id == 4624 then
                      set $!usr!type = "logon";
                  set $!usr!rcvdfrom = $!source;
                  set $!usr!rcvdat = $timereported;
                  set $!usr!user = $!TargetDomainName & "\\" & $!TargetUserName;
                  call outwriter
              }
          }
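The same steps can be traced outside rsyslog with a short Python sketch: look for the `@cee:` cookie, parse the JSON that follows, then map the Windows field names to the four normalized fields. `parse_cee` and `normalize` are hypothetical names, the sample message is a trimmed version of the agent record shown earlier, and, unlike the ruleset, this sketch simply drops events whose ID is neither 4624 nor 4634.

```python
import json

CEE_COOKIE = "@cee:"
EVENT_TYPES = {"4624": "logon", "4634": "logoff"}


def parse_cee(msg):
    """Return the JSON object following the @cee: cookie, or None."""
    text = msg.lstrip()
    if not text.startswith(CEE_COOKIE):
        return None
    try:
        data = json.loads(text[len(CEE_COOKIE):])
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def normalize(fields, timereported):
    """Map Windows agent fields to the normalized logon/logoff record."""
    event_type = EVENT_TYPES.get(fields.get("id"))
    if event_type is None:
        return None  # assumption: ignore everything except logon/logoff
    return {
        "type": event_type,
        "rcvdfrom": fields.get("source", ""),
        "rcvdat": timereported,
        "user": fields.get("TargetDomainName", "")
        + "\\"
        + fields.get("TargetUserName", ""),
    }


msg = (
    '@cee: {"source":"", "id": "4634", '
    '"TargetUserName": "fr", "TargetDomainName": "ADISCON"}'
)
record = normalize(parse_cee(msg), "Feb 5 11:15:56")
print(record)
```

Because the producer already sends name/value pairs, no positional or pattern-based parsing is needed here; the work is reduced to renaming fields.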
  28. What did we do so far?
      • We accepted input from three different sources
        ▫ Free-form text
        ▫ Tab-delimited semi-structured
        ▫ Native Lumberjack
      • We extracted the same information items from these messages
      • And stored these inside the $!usr branch variables
  29. So we now need to write the normalized output!

          /* this ruleset simulates forwarding to the final destination */
          ruleset(name="outwriter") {
              action(type="omfile"
                     file="/home/rger/proj/rsyslog/logfile.csv" template="csv")
              action(type="omfile"
                     file="/home/rger/proj/rsyslog/logfile.cee" template="cee")
          }
  30. Templates do the actual work

          template(name="csv" type="list") {
              property(name="$!usr!rcvdat" format="csv")
              constant(value=",")
              property(name="$!usr!rcvdfrom" format="csv")
              constant(value=",")
              property(name="$!usr!user" format="csv")
              constant(value=",")
              property(name="$!usr!type" format="csv")
              constant(value="\n")
          }
          template(name="cee" type="string"
                   string="@cee: %$!usr%\n")
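What the two templates produce can be approximated in a few lines of Python. This is a sketch, not rsyslog's template engine: `render_csv` and `render_cee` are hypothetical helpers, the `usr` dictionary stands in for the `$!usr` subtree, and the exact whitespace of rsyslog's JSON serialization may differ from what `json.dumps` emits.

```python
import csv
import io
import json


def render_csv(usr):
    """One fully quoted CSV line in the field order of the csv template."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([usr["rcvdat"], usr["rcvdfrom"], usr["user"], usr["type"]])
    return buf.getvalue()


def render_cee(usr):
    """The whole subtree serialized as JSON behind the @cee: cookie."""
    return "@cee: " + json.dumps(usr) + "\n"


usr = {
    "type": "logon",
    "rcvdfrom": "rger-virtual-machine",
    "rcvdat": "Jan 24 02:38:49",
    "user": "rger",
}
print(render_csv(usr), end="")
print(render_cee(usr), end="")
```

The CSV variant has to name each field and fix their order, while the CEE variant just dumps whatever is in the subtree, so adding a fifth normalized field would only require a change to the former.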
  31. And this is a combined CEE output file:

          @cee: { "type": "logon", "rcvdfrom": "rger-virtual-machine", "rcvdat": "Jan 16 09:28:33", "user": "root" }
          @cee: { "type": "logoff", "rcvdfrom": "rger-virtual-machine", "rcvdat": "Jan 16 09:28:33", "user": "root" }
          @cee: { "type": "logon", "rcvdfrom": "Win2008StdR2x64_vm", "rcvdat": "Feb 05 16:39:27", "user": "WIN2008STDR2X64\\Administrator" }
          @cee: { "type": "logoff", "rcvdfrom": "WIN-VSBQP2NOITT", "rcvdat": "Jan 25 15:44:35", "user": "WIN-VSBQP2NOITT\\te" }
          @cee: { "type": "logoff", "rcvdfrom": "", "rcvdat": "Feb 5 11:15:56", "user": "ADISCON\\fr" }
          @cee: { "type": "logon", "rcvdfrom": "", "rcvdat": "Feb 5 13:41:28", "user": "NT AUTHORITY\\SYSTEM" }
  32. And the same in CSV:

          "Jan 16 09:28:33","rger-virtual-machine","root","logon"
          "Jan 16 09:28:33","rger-virtual-machine","root","logoff"
          "Jan 24 02:38:49","rger-virtual-machine","rger","logon"
          "Feb 05 16:39:27","Win2008StdR2x64_vm","WIN2008STDR2X64\Administrator","logon"
          "Jan 25 15:44:35","WIN-VSBQP2NOITT","WIN-VSBQP2NOITT\te","logoff"
          "Feb 5 11:15:56","","ADISCON\fr","logoff"
          "Feb 5 13:41:28","","NT AUTHORITY\SYSTEM","logon"
  33. Of course, this is just a small example, but
      • It shows how all the pieces can be put together
      • mmnormalize is a very important building block to integrate free-form text logs, no matter what the source is
      • The output format is highly flexible
      • Of course, structured outputs like MongoDB or Elasticsearch are also supported
      • We can emit almost all output formats; new ones require relatively little work in rsyslog's engine
  34. Bottom line
      • Rsyslog can act today as a universal log format translator
      • We hope that consumer tools will make use of the simple-to-process lumberjack format
      • HOWEVER, we can already convert into what today's real-world analysis tools can digest
  35. Once again back to my inbox...
      • “I know this is asking a lot since rsyslog would have to do a bunch of processing. I also understand there may be a delay in log delivery due to the processing.”
      • Well … actually it's far from being as bad as described:
        ▫ Structured logs are ingested very quickly
        ▫ Liblognorm/mmnormalize is extremely fast in converting classical text logs
        ▫ Reformatting is always done in any case, so... ;-)
  36. Long-Term Vision
      • There NEVER will be a single format
        ▫ Political reasons (vendors, projects, history, ...)
        ▫ Need for new features/functionality
      • BUT: use as few as possible
        ▫ Less hassle for producer and consumer devs
        ▫ Forces closed source vendors to support these standards, making it easier for the OSS guys
        ▫ Big win for Enterprise folks who get plug&play
      • We hope that Lumberjack will be dominant
        ▫ Stack already in place
        ▫ Good & simple solution
        ▫ Rsyslog converts everything running on Linux
  37. Questions?
      • Please direct them to the rsyslog mailing list
      • Listinfo: