Mining Your Logs - Gaining Insight Through Visualization

In this two-part presentation we will explore log analysis and log visualization. We will look at the history of log analysis, where log analysis stands today, what tools are available to process logs, what is working today, and, more importantly, what is not working in log analysis. What will the future bring? Do our current approaches hold up under future requirements? We will discuss a number of issues and try to figure out how we can address them.
By looking at various log analysis challenges, we will explore how visualization can help address a number of them, keeping in mind that log visualization is not just a science, but also an art. We will apply a security lens to a number of use-cases in the area of security visualization. From there we will discuss what else is needed in the area of visualization, where the challenges lie, and where we should continue putting our research and development efforts.

  1. Mining Your Logs - Gaining Insight Through Visualization
     Raffael Marty - @zrlram
     Google TechTalk, March 2011
  2. Raffael Marty
     • Founder @
     • Chief Security Strategist and Product Manager @ Splunk
     • Manager Solutions @ ArcSight
     • Intrusion Detection Research @ IBM Research
     • IT Security Consultant @ PricewaterhouseCoopers
     Applied Security Visualization. Publisher: Addison Wesley (August, 2008). ISBN: 0321510100
     Logging as a Service © by Raffael Marty
  3. Agenda
     • Log Analysis
     • History
     • Log Architectures
     • What’s Working and What’s Not?
     • Future Needs
     • Data Visualization
     • Visualization Concepts
     • Security Visualization Use-Cases
  4. Log Analysis
     10.0.20.9 - - [22/Mar/2011:10:00:52 +0000] "GET /admin/customer/customer/612/ HTTP/1.1" 200 2261 "https://logdog.loggly.org/admin/customer/customer/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-us) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27" TYhzVH8AAAEAAGOkBOQAAADA 655268
     2010-12-28T18:12:10.031+00:00 frontend2-raffy syslog-ng[19600]: syslog-ng starting up; version=3.1.1
     2011-01-10T21:27:04.820+00:00 frontend2-raffy kernel: : [ 664.107313] blocked inbound IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:d8:30:62:5f:6a:a3:08:00 SRC=10.0.20.109 DST=10.0.20.255 LEN=180 TOS=0x00 PREC=0x00 TTL=64 ID=126 PROTO=UDP SPT=17500 DPT=17500 LEN=160
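
The firewall entry on slide 4 carries most of its information as KEY=value pairs, which is about the friendliest case for a parser. The following is a minimal Python sketch of extracting those pairs, not anything shown in the talk. The function name `parse_kv` is hypothetical, the sample line restores the whitespace lost in the slide export, and keeping only the first occurrence of a repeated key (`LEN` appears twice) is an assumption made for simplicity.

```python
import re

# Sample line modeled on the firewall entry from slide 4 (whitespace restored).
LINE = (
    "2011-01-10T21:27:04.820+00:00 frontend2-raffy kernel: [ 664.107313] "
    "blocked inbound IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:d8:30:62:5f:6a:a3:08:00 "
    "SRC=10.0.20.109 DST=10.0.20.255 LEN=180 TOS=0x00 PREC=0x00 TTL=64 ID=126 "
    "PROTO=UDP SPT=17500 DPT=17500 LEN=160"
)

# An upper-case key, an equals sign, and a possibly empty value without spaces.
KV_PATTERN = re.compile(r"\b([A-Z]+)=(\S*)")


def parse_kv(line):
    """Return a dict of KEY=value pairs found in a log line.

    Assumption: when a key repeats (LEN appears for both the IP packet and
    the UDP datagram), the first occurrence wins.
    """
    fields = {}
    for key, value in KV_PATTERN.findall(line):
        fields.setdefault(key, value)
    return fields


fields = parse_kv(LINE)
```

Even this easy case needs a policy decision about the duplicate `LEN` key, which hints at why regex-based parsing is listed later under "What’s Not Working".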
  5. History
     • 1980 Eric Allman develops syslogd(8)
     • 1996 Intellitactics
     • 1997 Tivoli Risk Manager developed by IBM Research in Zurich (later Zurich Correlation Engine, ZCE)
     • 1999 - 2010 A number of log management / SIEM players enter the market (software, appliances)
     • 2000 ArcSight - 2010 sold for $1.65bn to HP
     • 2009 Loggly (logging as a service)
  6. History - The Other View
     • Network management (SNMP)
     • IDS false positive reduction
     • Security monitoring (multiple data sources)
     • Unification of NOC and SOC (failed?)
     • Application monitoring (moving up the stack)
       - original tools failed due to architectural constraints
       - new approaches have been presented
  7. Log Management Today: Where are you?
  8. Log Management Today (label on the slide: "less tools")
     • DIY: grep, Perl, SQL
     • MapReduce: Open source
     • Log Management: Open source, Commercial
     • CEP and SIEM: Open source, Commercial
     • Advanced Analytics: Not log specific!
  9. Open Source Tools (this list is likely incomplete!)
     graylog2, logstash, swatch, tenshi, logwatch, OSSEC, snare, lasso, lire, LogSurfer, SEC, LogHound, slct, log2timeline, logzilla, OSSIM, MS Logparser, Sguil, Octopussy, Sagan
  10. Commercial Tools (this list is likely incomplete!)
      pixlcloud | Visualization in the Cloud © PixlCloud LLC 2011
  11. Log Architectures
  12. Log Mgmt Architecture
      • Collection: syslog, OPSEC, SDEE, netflow, database
      • Processing: indexing, context storage, clustering
      • Storage: on board, external storage array, clusters
  13. Log Mgmt Architecture
      • Collection (raw): syslog, OPSEC, SDEE, netflow, database
      • Processing (normalized or raw): indexing, context storage, clustering
      • Data Access: free-text search, field-based search, tagging schemas
  14. Agents and Connectors
      • piece of code to transport logs to a central location
      • features: batch, compress, encrypt, sign, fail-over
      • often additional features: parse, normalize, aggregate, enrichment (context)
      • special protocols: OPSEC, SDEE, Windows
      • file-based collection
      • database collection
  15. SIEM Architecture
      [Diagram labels: raw, normalized, context / tagging (asset context, identity context, ...), RDBMS]
  16. SIEM Architecture
      • RDBMS schema
        - Fixed number and type of fields
        - New data sources with new fields? ‣ overloading
      • RDBMS clusters are expensive and scale poorly
      • Need a parser for every data source
      • Slow historical data queries
      • Hard to configure database efficiently
        - because of different use-cases
  17. SIEM Architecture Benefits
      • Parsed data enables
        - real-time correlation
        - real-time statistics
        - data augmentation (context) close to source
      • Unified data access language
        - over a fixed set of fields
      • Real-time dashboards
  18. Search vs. SIEM
      • Full-text indexing
      • Parsing at search time
      Example search: denied
      • use index to find ALL occurrences of ‘denied’
      Example search: user=rmarty
      • use index to find occurrences of ‘rmarty’
      • apply parser to results
      • remove results where user is not rmarty
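
The two example searches on slide 18 can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not how any particular product works: the events, the tokenizer, the key=value parser, and the function names (`build_index`, `search_term`, `search_field`) are all hypothetical. It shows the order of operations the slide describes: the full-text index narrows the candidates first, and the parser runs only on those candidates at search time.

```python
import re
from collections import defaultdict

# Hypothetical events; the second mentions "rmarty" only inside a path.
EVENTS = [
    "Mar 22 10:00:52 sshd: denied user=rmarty src=10.0.20.9",
    "Mar 22 10:01:10 app: denied user=ram path=/home/rmarty/report",
    "Mar 22 10:02:31 sshd: accepted user=raffy src=10.0.20.109",
]


def tokenize(text):
    """Split an event into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(events):
    """Full-text index: token -> set of event positions."""
    index = defaultdict(set)
    for position, event in enumerate(events):
        for token in tokenize(event):
            index[token].add(position)
    return index


def search_term(index, term):
    """Plain term search: every event containing the token."""
    return sorted(index.get(term.lower(), set()))


def parse(event):
    """Search-time parser: extract key=value pairs from the raw event."""
    return dict(re.findall(r"(\w+)=(\S+)", event))


def search_field(index, events, field, value):
    """Field search: use the index on the value, parse, then filter."""
    candidates = search_term(index, value)
    return [i for i in candidates if parse(events[i]).get(field) == value]


index = build_index(EVENTS)
denied_hits = search_term(index, "denied")
rmarty_hits = search_field(index, EVENTS, "user", "rmarty")
```

The second event is found by the index lookup for `rmarty` but dropped by the filter, because its parsed `user` field is `ram`.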
  19. New SIEM - Hybrid Models
      • Use parsers for known data sources
      • Collect everything else
      • Index all data and use index for search
      • Correlate parsed data
  20. Categorization and Tagging
      • How do you find all failed logins across any data source?
        security:538 OR “sshd authentication failure” OR “sshd failed password” OR ...
      • Does not scale
        - for new data sources
        - for new events of existing sources
      • Define a ‘taxonomy’ for all events
      • Map events into taxonomy: id -> object, action, status
  21. Content Creation
      • Rules, dashboards, reports, searches can use taxonomy:
        object=authentication AND action=login AND status=success
      • All failures related to files:
        object=file AND status=failure
      • Mixing with other fields:
        action=login AND user=rmarty
      • Approach scales well
      • Huge effort to build and maintain mappings
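
The mapping "id -> object, action, status" from slide 20 and the taxonomy queries from slide 21 can be sketched as a lookup table plus a field filter. This is a minimal Python illustration under assumptions: the three failed-login identifiers come from the slide, while the other two table entries, the sample events, and the helper names (`tag`, `query`) are hypothetical.

```python
# Taxonomy table: event id -> (object, action, status).
# The first three ids are the failed-login examples from slide 20;
# the last two entries are hypothetical additions for illustration.
TAXONOMY = {
    "security:538": {"object": "authentication", "action": "login", "status": "failure"},
    "sshd authentication failure": {"object": "authentication", "action": "login", "status": "failure"},
    "sshd failed password": {"object": "authentication", "action": "login", "status": "failure"},
    "sshd accepted password": {"object": "authentication", "action": "login", "status": "success"},
    "app file open error": {"object": "file", "action": "open", "status": "failure"},
}

# Hypothetical events from different sources.
EVENTS = [
    {"id": "security:538", "user": "rmarty"},
    {"id": "sshd failed password", "user": "admin"},
    {"id": "sshd accepted password", "user": "rmarty"},
    {"id": "app file open error", "user": "ram"},
    {"id": "unmapped event", "user": "raffy"},
]


def tag(event):
    """Attach taxonomy fields to an event; unmapped ids stay untagged."""
    return {**event, **TAXONOMY.get(event["id"], {})}


def query(events, **criteria):
    """Return tagged events whose fields equal all given criteria (AND)."""
    tagged = (tag(event) for event in events)
    return [e for e in tagged if all(e.get(k) == v for k, v in criteria.items())]


failed_logins = query(EVENTS, object="authentication", action="login", status="failure")
file_failures = query(EVENTS, object="file", status="failure")
rmarty_logins = query(EVENTS, action="login", user="rmarty")
```

The queries stay the same when a new data source is added; only `TAXONOMY` grows. That is the "approach scales well" side of the slide, and keeping the table complete and current is the "huge effort" side: the unmapped event above silently matches nothing.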
  22. Logging as a Service (LaaS)
      • Economically advantageous - think about TCO
      • Pay as you go
      • Elastic infrastructure scales with your needs
      • No installation needed
      • No setup costs / time for logging solution
      • Open platform with RESTful APIs
  23. Loggly
      [Architecture diagram labels: Data Sources (My syslog, mobile-166); Data collection; Proxies; Distributed indexing and processing; Indexers and Search Machines; Distributed data store; Log Archive; API; Data access; Consumers (Loggly user interface, UI extensions)]
  24. Tool Usage
      • data sources: DIY known, only a few; MR known, only a few; Log Mgmt unknown, many; SIEM known, many; LaaS -
      • analysis use-cases: DIY known, one or a few; MR exploration, large-scale; Log Mgmt unknown, many; SIEM unknown, many; LaaS extend platform
      • dynamic use-cases: DIY no; MR no; Log Mgmt yes; SIEM yes; LaaS yes
      • real-time correlation: DIY no; MR no; Log Mgmt no; SIEM yes; LaaS extend platform
      • cost: DIY engineer, hardware, maintenance; MR engineers, hardware, maintenance; Log Mgmt license, (hardware), maintenance; SIEM license, hardware, maintenance; LaaS subscription
      Should you rather do it yourself (DIY)?
  25. What is Working and What is not?
  26. What’s Working
      • Log collection
      • Log centralization
      • Alerting on a priori known patterns
      • Solving specific, known use-cases for sets of known data sources, e.g.,
        - monitoring privileged access to financial servers
        - generating compliance reports
        - security forensics
  27. What’s Not Working
      • Log formats are all over and not documented
        Mar 16 08:09:58 kernel: [ 0.000000] Normal 1048576 -> 1048576
      • No logging guidelines / developer education
      • Parsing is broken
        - based on regexes
        - numerous mistakes
        - doesn’t scale
  28. What’s Not Working
      • Normalization is broken:
        - IP to hostnames (when to do DNS lookup)
        - usernames (rmarty vs. ram vs. raffy)
      • Categorization / Taxonomy
        - doesn’t scale
        - is always out of date
        - is buggy
        - expensive
      • Prioritization has no working formula
      • Anomaly detection is voodoo!
  29. What Does It Mean?
      • We don’t understand our data
      • Security Operations Center (SOC) monitors all corporate data sources. Analysts
        - don’t know all the applications
        - don’t know all the setups
        - don’t know what log records are ‘normal’ behavior
      --> Need tools to enable log owners to work with their data
  30. Future Needs
  31. We Need Better Tools
      • We will have more and more data and need to deal with larger amounts of data
        - SIEM needs to support new distributed, scalable data management technologies
      • More and more application layer data
        - How are we going to deal with all the parsing / entity extraction?
        - We need logging standards and guidelines
      • How do we help analysts understand the data?
        - What is important and what is not?
        - Mapping problems to business process, business risk!
  32. Data Visualization
  33. Data/Log Visualization
      • Exploration and Discovery
      • Answer Questions
      • Communicate Information
      • Support Decisions
  34. Security Visualization
      • We are nowhere!
      • Visualization is an afterthought
      • Sec Viz dichotomy
      • Tools are lacking fundamental capabilities
      • Users don’t understand data, how can they understand visuals?
  35. Visualization Concepts
  36. The Analysis Approach
      Overview first, Zoom, Details on demand (principle by Ben Shneiderman)
  37. Simultaneous Views
  38. Dynamic Coloring
  39. Linked Views
  40. Legible / Usable Graphs
      Reducing non data ink!
  41. Choosing the Right Chart
  42. Ode to the Pie
  43. Careful With Interpretations
  44. SecViz Examples
  45. (image-only slide)
  46. (image-only slide)
  47. (image-only slide)
  48. Situational Awareness
      • Treemap
      • Protovis.JS
      • Size: Amount
      • Brightness: Variance
      • Color: Sensor
      • Shows: Scans - bright spots
      • Thanks to Chris Horsley
  49. (image-only slide)
  50. Firewall Treemap
  51. Firewall Log
      [Figure labels: Port, Source IP, Destination IP]
  52. IDS Sig Tuning - Treemap
      • Hierarchy: Source, Destination, Signature, Number of Events
      • Color: Priority
      • Size: Number of alerts
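
A treemap like the one on slide 52 needs its input as a nested hierarchy with a size at each leaf. The sketch below shows one way to aggregate flat IDS alerts into Source > Destination > Signature with an alert count per leaf. It is written in Python for illustration only and does no drawing. The alert tuples, the `Leaf` class, and the function names are hypothetical, and treating the lowest priority number as the most severe is an assumption.

```python
from dataclasses import dataclass

# Hypothetical IDS alerts: (source, destination, signature, priority).
ALERTS = [
    ("10.0.0.5", "10.0.1.10", "SCAN nmap", 3),
    ("10.0.0.5", "10.0.1.10", "SCAN nmap", 3),
    ("10.0.0.5", "10.0.1.11", "WEB-MISC probe", 2),
    ("10.0.0.9", "10.0.1.10", "SHELLCODE x86", 1),
]


@dataclass
class Leaf:
    """One treemap cell: size comes from count, color from priority."""
    count: int
    priority: int


def build_hierarchy(alerts):
    """Nest alerts as source -> destination -> signature -> Leaf."""
    tree = {}
    for source, destination, signature, priority in alerts:
        signatures = tree.setdefault(source, {}).setdefault(destination, {})
        leaf = signatures.setdefault(signature, Leaf(count=0, priority=priority))
        leaf.count += 1
        # Assumption: a lower number means a more severe priority.
        leaf.priority = min(leaf.priority, priority)
    return tree


def size(node):
    """Total number of alerts below a node (the cell area in the treemap)."""
    if isinstance(node, Leaf):
        return node.count
    return sum(size(child) for child in node.values())


tree = build_hierarchy(ALERTS)
```

For signature tuning, large cells with a low-severity color are the first candidates to inspect, since they represent many alerts of little importance.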
  53. Vulnerability Data by Host
  54. Visualization Future
      • A solution to entity extraction
      • Dynamic and interactive displays
      • Computer aided intelligence / visualization
        - Computer supported exploration
        - Highly interactive
      • Expert system that captures domain knowledge
        - Collaborative
  55. http://secviz.org
      Share, discuss, challenge, and learn about security visualization.
      • List: secviz.org/mailinglist
      • Twitter: @secviz
  56. about.me/raffy