Data Mining with Splunk

Published in: Technology
  • It's from 2012, right?

  1. Data Mining and Exploration
     David Carasso, Office of CTO, Chief Mind
  2. AGENDA
     • What is data mining?
     • What’s the plan of attack?
     • What type of events do I have?
     • How do I mine fields?
     • How do I detect anomalous events?
     • Why do I need to visualize my data?
  3. What is Data Mining?
  4. Is this data mining?
     This is an orange
  5. What is Data Mining?
     Extracting implicit, previously unknown, and potentially useful information from data.
  6. Better
  7. Data Preparation, Understanding, Data Exploration, Data Mining
  8. What’s the plan of attack?
  9. Preparing the data
     You’ve been thrown data you aren’t familiar with…
       Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)
       Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root
       Mar 7 12:40:02 willLaptop crond(pam_unix)[10696]: session closed for user root
       Mar 7 12:44:47 willLaptop gconfd (root-10750): starting (version 2.10.0), pid 10750 user root
       Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config...
       Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readwrite:/root/.gconf"…
       Mar 7 12:45:01 willLaptop crond(pam_unix)[10754]: session opened for user root by (uid=0)
       Mar 7 12:45:02 willLaptop crond(pam_unix)[10754]: session closed for user root
       ....
     • Eventtypes (closed sessions)
     • Fields (pid)
     • Transactions (open-close)
     • Anomalies (unexpected address)
  10. Is Understanding Linear?
      Event Groups, Events, reports, Anomalies, Fields
      No.
  11. What type of events do I have?
  12. Given Some Unknown Data
        Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)
        Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root
        Mar 7 12:40:02 willLaptop crond(pam_unix)[10696]: session closed for user root
        Mar 7 12:44:47 willLaptop gconfd (root-10750): starting (version 2.10.0), pid 10750 user root
        Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config...
        Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readwrite:/root/.gconf"…
        Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration ...
        Mar 7 12:45:01 willLaptop crond(pam_unix)[10754]: session opened for user root by (uid=0)
        Mar 7 12:45:02 willLaptop crond(pam_unix)[10754]: session closed for user root
        ....
  13. Find Broad Categories of Events
      Group Events by Content, Format, and Time
  14. Group Events by Content
      Cluster events with similar values. Show 3 examples from each cluster, from the most common cluster to the least:
        … | cluster labelonly=t showcount=t | dedup 3 cluster_label sortby -cluster_count, cluster_label, _time
  15. Events By Content
        count  label  _raw
        -----  -----  ----------------------------------------------------------------
        1339   3      Mar 7 11:05:01 willLaptop crond(pam_unix)[6785]: session opened for user root by…
        1339   3      Mar 7 11:10:01 willLaptop crond(pam_unix)[1769]: session opened for user root by …
        1339   3      Mar 7 11:10:01 willLaptop crond(pam_unix)[1766]: session opened for user root by …
        1324   2      Mar 7 11:05:02 willLaptop crond(pam_unix)[6785]: session closed for user root
        1324   2      Mar 7 11:10:01 willLaptop crond(pam_unix)[1766]: session closed for user root
        1324   2      Mar 7 11:10:02 willLaptop crond(pam_unix)[1769]: session closed for user root
        136    13     Mar 7 20:05:08 willLaptop kernel: SELinux: initialized (dev selinuxfs, type selinuxfs)…
        136    13     Mar 7 20:05:09 willLaptop kernel: SELinux: initialized (dev usbfs, type usbfs), uses …
        136    13     Mar 7 20:05:09 willLaptop kernel: SELinux: initialized (dev sysfs, type sysfs), uses …
  16. Group by $%#! Format
      Cluster events by first 7 punctuation chars:
        … | rex field=punct "(?<smallpunct>.{7})" | eventstats count by smallpunct | sort -count, smallpunct | dedup 3 smallpunct
  17. Events by Format
        count  smallpunct  raw
        -----  ----------  -----------------------------------------------------------
        637    __::__(     Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root
        637    __::__(     Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session closed for user root
        637    __::__(     Mar 10 16:50:01 willLaptop crond(pam_unix)[9639]: session opened for user root by …
        367    __::__:     Mar 10 15:30:25 willLaptop dhclient: bound to 10.1.1.194 -- renewal in 5788 seconds.
        367    __::__:     Mar 10 15:30:25 willLaptop dhclient: DHCPACK from 10.1.1.50
        367    __::__:     Mar 10 15:30:25 willLaptop dhclient: DHCPREQUEST on eth0 to 10.1.1.50 port 67
        57     __::__[     Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126, stratum 2
        57     __::__[     Mar 10 16:46:27 willLaptop ntpd[2544]: synchronized to LOCAL(0), stratum 10
        57     __::__[     Mar 10 16:42:09 willLaptop ntpd[2544]: time reset -0.236567 s
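Outside Splunk, the idea of grouping by format can be sketched in a few lines of Python. This is only an approximation of the punct field (the exact rules Splunk uses are not given on the slides): it drops letters and digits, maps whitespace to an underscore, and keeps the first 7 characters. The function name `punct_signature` is made up for this illustration.

```python
from collections import Counter

def punct_signature(line, length=7):
    """Approximate a punctuation signature for a log line (an assumption, not Splunk's exact rule)."""
    chars = []
    for ch in line:
        if ch.isalnum():
            continue  # letters and digits carry content, not format
        chars.append("_" if ch.isspace() else ch)
    return "".join(chars)[:length]

# Sample lines taken from the slides.
events = [
    "Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root",
    "Mar 10 15:30:25 willLaptop dhclient: DHCPACK from 10.1.1.50",
    "Mar 10 16:42:09 willLaptop ntpd[2544]: time reset -0.236567 s",
    "Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session closed for user root",
]

# Count how many events share each signature, most common first.
counts = Counter(punct_signature(e) for e in events)
for signature, count in counts.most_common():
    print(count, signature)
```

Events with the same leading punctuation tend to come from the same program or message template, which is why such a short signature already separates the crond, dhclient, and ntpd lines.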
  18. Group by Time
      Look for bursts of events
      • Turn on computer
      • Load a web page
      • Detects speeding car
      • Print document
      • Scan security badge
  19. Group by Time Bursts
        … | transaction maxpause=2s | search eventcount>1
        Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session opened for user root by (uid=0)
        Mar 10 16:50:01 willLaptop crond(pam_unix)[9639]: session opened for user root by (uid=0)
        Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session closed for user root
        Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root
        Mar 10 15:30:25 willLaptop dhclient: DHCPREQUEST on eth0 to 10.1.1.50 port 67
        Mar 10 15:30:25 willLaptop dhclient: DHCPACK from 10.1.1.50
        Mar 10 15:30:25 willLaptop dhclient: bound to 10.1.1.194 -- renewal in 5788 seconds.
        Mar 10 16:45:01 willLaptop crond(pam_unix)[9553]: session opened for user root by (uid=0)
        Mar 10 16:45:02 willLaptop crond(pam_unix)[9553]: session closed for user root
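As a rough illustration of grouping by time, the Python sketch below splits a stream of events into bursts wherever the gap between neighbours exceeds a maximum pause. It is not how the transaction command is implemented; `group_bursts` and `ts` are hypothetical helpers, and the year is a placeholder because the syslog lines do not carry one.

```python
from datetime import datetime, timedelta

def group_bursts(events, max_pause=timedelta(seconds=2)):
    """Group (timestamp, text) pairs into bursts separated by gaps longer than max_pause."""
    bursts = []
    for event in sorted(events):
        # Extend the current burst while the gap to its last event is within max_pause.
        if bursts and event[0] - bursts[-1][-1][0] <= max_pause:
            bursts[-1].append(event)
        else:
            bursts.append([event])
    return bursts

def ts(hour, minute, second):
    # Placeholder date: only the time of day matters for this sketch.
    return datetime(2012, 3, 10, hour, minute, second)

events = [
    (ts(16, 42, 9), "ntpd[2544]: time reset -0.236567 s"),
    (ts(16, 45, 1), "crond(pam_unix)[9553]: session opened for user root by (uid=0)"),
    (ts(16, 45, 2), "crond(pam_unix)[9553]: session closed for user root"),
    (ts(16, 46, 27), "ntpd[2544]: synchronized to LOCAL(0), stratum 10"),
    (ts(16, 46, 32), "ntpd[2544]: synchronized to 138.23.180.126, stratum 2"),
    (ts(16, 50, 1), "crond(pam_unix)[9638]: session opened for user root by (uid=0)"),
    (ts(16, 50, 1), "crond(pam_unix)[9638]: session closed for user root"),
    (ts(16, 50, 2), "crond(pam_unix)[9639]: session closed for user root"),
]

bursts = group_bursts(events)
multi = [b for b in bursts if len(b) > 1]    # roughly "eventcount>1"
single = [b for b in bursts if len(b) == 1]  # isolated events
print([len(b) for b in bursts])
```

Bursts with several events resemble the transactions on the slide; bursts of exactly one event are the isolated lines that stand apart from everything around them.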
  20. Multiple Sources (not really correct)
  21. Now what?
      1. ✓ group your data
      2. tell Splunk!
  22. Telling Splunk (about your groups of events)
      Add eventtypes and tags
      Huh?
  23. SURPRISE TANGENT!
      What is an eventtype?
  24. Eventtype
      A dynamic “tag” added to events, if they would match the search that defines the eventtype.
  25. Eventtype:
        Name: “closed_root”
        Definition: “session closed” root
      Event: … session closed for user root …
        => eventtype=closed_root
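To make the "dynamic tag" idea concrete, here is a minimal Python sketch in which an eventtype is just a name plus a list of terms that must all appear in the event. It uses plain case-insensitive substring matching, which is a simplification of real Splunk search semantics; `classify` and the `opened_root` eventtype are invented for the example.

```python
def classify(raw, eventtypes):
    """Return the names of all eventtypes whose terms all occur in the raw event."""
    text = raw.lower()
    return [name for name, terms in eventtypes.items()
            if all(term.lower() in text for term in terms)]

eventtypes = {
    "closed_root": ["session closed", "root"],  # the definition from the slide
    "opened_root": ["session opened", "root"],  # hypothetical, not on the slides
}

events = [
    "Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root",
    "Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)",
    "Mar 7 12:44:47 willLaptop gconfd (root-10750): starting (version 2.10.0), pid 10750",
]

# Nothing is stored on the events; the tags are recomputed from the definitions.
tagged = {event: classify(event, eventtypes) for event in events}
unclassified = [event for event, names in tagged.items() if not names]
```

Because the tags are derived at search time, changing a definition reclassifies old events too, and the events that match no eventtype are easy to list.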
  26. Create an Eventtype
  27. Independent searches will return events tagged with previous eventtypes that help classify events.
  28. Create reports on the classifications you’ve made
      Ok, it wasn’t a tangent.
  29. How do I mine fields?
  30. Fields Correlation
      Discover correlations to remove uninteresting fields and narrow in on promising reports.
      haiku
  31. Fields Correlation Haiku
        Discover patterns
        in fields with a correlation:
        co-occurring fields.
      indulgence
  32. Splunkd.log Sample File
        09-05-2012 15:34:11.886 -0700 INFO ExecProcessor - Ran script: python /opt/splunk/etc/apps/...
        09-05-2012 15:34:02.467 -0700 ERROR TcpOutputProc - Can't find or illegal IP address or ...
        09-05-2012 15:32:03.397 -0700 INFO ProcessTracker - Process ran long; type=SplunkOptimize ...
        09-05-2012 15:30:20.016 -0700 WARN DispatchCommand - The system is approaching the maximum ...
      fascinating
  33. Field Correlation
        … | correlate
        RowField   C     CN    Component  Context  L     ...
        ---------  ----  ----  ---------  -------  ----
        C          1.00  1.00  0.00       0.00     1.00
        CN         1.00  1.00  0.00       0.00     1.00
        Component  0.00  0.00  1.00       0.06     0.00
        Context    0.00  0.00  0.06       1.00     0.00
        L          1.00  1.00  0.00       0.00     1.00
        Log_Level  0.00  0.00  1.00       0.06     0.00
        …
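A co-occurrence matrix like the one above can be approximated in Python. The slides do not state the formula behind correlate, so this sketch assumes one plausible definition: for each pair of fields, the share of events carrying either field that carry both. The events are hypothetical and only mimic the field names on the slide.

```python
def cooccurrence(events):
    """For each field pair, the share of events carrying either field that carry both."""
    fields = sorted({name for event in events for name in event})
    matrix = {}
    for a in fields:
        matrix[a] = {}
        for b in fields:
            either = sum(1 for e in events if a in e or b in e)
            both = sum(1 for e in events if a in e and b in e)
            matrix[a][b] = both / either
    return matrix

# Hypothetical events: each dict holds the fields extracted from one event.
events = [
    {"Component": "loader", "Log_Level": "INFO"},
    {"Component": "HotDBManager", "Log_Level": "INFO"},
    {"Component": "DispatchCommand", "Log_Level": "WARN", "Context": "search"},
    {"C": "US", "CN": "example", "L": "SF"},
]

matrix = cooccurrence(events)
for row, cells in matrix.items():
    print(row.ljust(10), " ".join(f"{cells[col]:.2f}" for col in matrix))
```

Fields that always appear together score 1.00 and usually describe the same kind of event, so one of them is often enough for a report.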
  34. Field Associations
      Automatically deduce correlations and implications of field values:
        … | associate Log_Level Component
  35. Field Association Summary
                                                                Uncond   Cond
        Ref_Key    Ref_Value                 Target_Key  Support  Entropy  Entropy  Increase  Top_Conditional_Value
        ---------  ------------------------  ----------  -------  -------  -------  --------  ------------------------
        Component  DatabaseDirectoryManager  Log_Level   34.67%   1.182    0.000    1.182201  WARN (62.25% -> 100.00%)
        Component  HotDBManager              Log_Level   38.25%   1.182    0.000    1.182201  INFO (33.15% -> 100.00%)
        Component  SavedSplunker             Log_Level   394.31%  1.182    0.000    1.182201  WARN (62.25% -> 100.00%)
        Component  databasePartitionPolicy   Log_Level   95.50%   1.182    0.417    0.765017  INFO (33.15% -> 91.57%)
        Component  loader                    Log_Level   79.17%   1.182    0.050    1.131883  INFO (33.15% -> 99.44%)
        Component  timeinvertedIndex         Log_Level   44.28%   1.182    0.000    1.182201  INFO (33.15% -> 100.00%)
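The entropy columns can be read as "how uncertain is Log_Level overall" versus "how uncertain is it once the Component is known". The Python sketch below assumes base-2 Shannon entropy, which the slides do not confirm, and uses made-up (Component, Log_Level) pairs rather than the data behind the table.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of values."""
    total = len(values)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(values).values())

# Hypothetical (Component, Log_Level) pairs, not taken from the slides.
events = [
    ("loader", "INFO"), ("loader", "INFO"), ("loader", "INFO"),
    ("DateParserVerbose", "WARN"), ("DateParserVerbose", "WARN"),
    ("DeploymentClient", "DEBUG"), ("DeploymentClient", "WARN"),
    ("TcpOutputProc", "ERROR"),
]

# Uncertainty about Log_Level over all events.
unconditional = entropy([level for _, level in events])
# Uncertainty about Log_Level once we know Component == "loader".
conditional = entropy([level for component, level in events if component == "loader"])
improvement = unconditional - conditional
print(f"{unconditional:.3f} {conditional:.3f} {improvement:.3f}")
```

A conditional entropy of zero means the reference value fully determines the target field, which matches the rows in the table where the top conditional value reaches 100.00%.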
  36. Top Fields by Fields
      Most common Log_Level by Component:
        ... | top Log_Level by Component
        Component                 Log_Level  count  percent
        ------------------------  ---------  -----  ----------
        AdminManager              WARN       1      100.000000
        DatabaseDirectoryManager  WARN       153    100.000000
        DateParserVerbose         WARN       262    100.000000
        DedupProcessor            ERROR      1      100.000000
        DeploymentClient          DEBUG      60     85.714286
        DeploymentClient          WARN       5      7.142857
  37. How do I detect anomalous events?
  38. Types of Anomalies
      • Anomalies you know about
      • Anomalies you don’t know about
  39. Handling Known Anomalies
      Easy. Define a search for the anomalous condition and make an alert to detect it.
        ip=10.* NOT domain=mycompany.com
        … | stats perc99(spent) => 500ms. Alert on “spent>500”
  40. Finding Unknown Anomalies
      Look for Abnormal
      • Single-Field Values
      • Multi-Field Values
      • Contexts
      • Visual Inspections…
  41. Anomalies by Single Field Values
      Identify anomalous values in a given field either by frequency of occurrence or number of standard deviations from the mean.
        … | anomalousvalue action=summary pthresh=0.02 | search isNum=YES
  42. Anomalies by Single Field Values
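Both criteria for a single field, rarity and distance from the mean, are easy to try on a small sample. The Python sketch below is an illustration of the two ideas only, not a reimplementation of anomalousvalue; the function names, thresholds, and sample values are all invented.

```python
import statistics
from collections import Counter

def rare_values(values, threshold):
    """Values whose relative frequency falls below threshold."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n / len(values) < threshold)

def far_from_mean(values, max_sigmas):
    """Numeric values more than max_sigmas sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > max_sigmas * sigma]

# Hypothetical field values.
statuses = ["200"] * 96 + ["404"] * 3 + ["500"]
durations_ms = [102, 98, 101, 99, 100, 97, 103, 100, 250]

rare = rare_values(statuses, threshold=0.02)         # rarity, for categorical fields
outliers = far_from_mean(durations_ms, max_sigmas=2)  # distance, for numeric fields
print(rare, outliers)
```

Frequency suits categorical fields such as status codes, while the standard-deviation test only makes sense for numeric fields.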
  43. Anomalous by Many Values
      Look for small clusters – by content, format, and time – to find anomalies. For example…
        … | cluster … | sort cluster_count
  44. Smallest Clusters by Content
        count  label  uri
        1      7      /img/skins/default/bolt.png
        1      37     /en-US/search/inspector?sid=1345075042.125&namespace=search
        1      45     /services/admin/summarization?count=10
        1      53     /services/pdfgen/is_available?viewId=index_status_health&...
        1      57     /static/splunkrc_cmds.xml
  45. Small Clusters: Bursts of One
      Find bursts of a single event with a pause of 2 seconds on either side of it.
        … | transaction maxpause=2s | search eventcount = 1
        Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126…
        Mar 10 16:46:27 willLaptop ntpd[2544]: synchronized to LOCAL(0), stratum…
        Mar 10 16:42:09 willLaptop ntpd[2544]: time reset -0.236567…
  46. Burst of One
      Same idea, different data source: Splunk
        [11:58:08] "POST /services/search/jobs/export HTTP/1.1" 200 201630 …
        [11:12:51] "POST /services/search/jobs/export HTTP/1.1" 200 459441 …
        [10:00:58] "GET /servicesNS/nobody/SplunkDeploymentMonitor/backfill/…
  47. Anomalous by Context
      Identify values not expected by the context of other events.
        … | anomalies field=file labelonly=true maxvalues=10
  48. Anomalous by Context
        Unexpectedness  file
        0.00            shelper
        0.16            shelper
        0.00            1345502591.356
        0.00            1345502591.356
        0.00            1345074401.191
        0.00            1345074031.153      time
        0.03            1345074328.186
        0.00            1345502591.356
        0.35            conf-dm_backfill
        0.00            1345074309.185
        0.00            1345502591.356
  49. Surprise Eventtype: Part Deux!
      Classified major categories of your data with eventtypes?
      Just search for things that don’t match those eventtypes.
  50. [no transcript text]
  51. Once you can describe anomalous behavior as a search…
  52. [no transcript text]
  53. Other mining commands
      • kmeans: Performs k-means clustering on selected fields.
      • outlier: Removes outlying numerical values.
      • af (analyze fields): Analyzes numerical fields for their ability to predict another discrete field.
      • fieldsummary: Generates summary information for fields.
      • shape: Produces a symbolic shape attribute describing the shape of a numeric multivalued field.
  54. Why do I need to visualize my data?
  55. Data Mining by Visualization
      Visualization can capture nuances in the data that numerical or linguistic summaries cannot easily capture.
  56. These data points are radically different.
      *Source: Anscombe’s Quartet (Anscombe 1973)
  57. Why visualize?
      Because they all have the exact same
      • average (7.50)
      • standard deviation (2.03)
      • least-squares fit (3 + 0.5x).
      Do not just rely on numerical summarization.
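The claim about identical summaries is easy to check. The Python sketch below computes the mean and sample standard deviation of y, plus an ordinary least-squares line, for the four datasets of Anscombe's quartet as commonly published. The helper `least_squares` is written for this example.

```python
import statistics

def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = intercept + slope * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Anscombe's quartet; datasets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (xs, ys) in quartet.items():
    intercept, slope = least_squares(xs, ys)
    summary[name] = (statistics.mean(ys), statistics.stdev(ys), intercept, slope)
    print(f"{name}: mean={summary[name][0]:.2f} sd={summary[name][1]:.2f} "
          f"fit={intercept:.2f} + {slope:.2f}x")
```

All four rows agree to about two decimal places, yet a scatter plot of each dataset looks nothing like the others: a noisy line, a curve, a line with one outlier, and a vertical stack with one far point.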
  58. But I already have charts!
      You don’t graph enough.
      Data Exploration
      • Don’t decide ahead of time what graphs you want
      • Regularly do out-of-the-box scenarios with graphs
  59. Data Exploration
      Variations:
      • Subsets of Events (paying customers vs lookers)
      • Fields by Fields (including eventtypes and tags)
      • Ignored fields
      • Min/max/avg/count
      • Compare to other time windows
      • Transactions
  60. Visual Arrangement
      Sorting data, changing scales (linear/log), and setting min/max can make a huge difference when looking at the same data.
  61. Visual Considerations
      Pick representations that make obvious the distinctions you need to care about.
  62. Summary
  63. Summary
      • Discovery is an iterative process.
      • Group events by content, format, and time, and define classifications with eventtypes and tags.
      • Focus on promising fields with correlations.
      • Discover unknown anomalies with small clusters.
      • Visualize your data, from a dozen angles.
  64. But wait!
  65. More to come: Predictive Analytics
        … | forecast foo
  66. The End
      Mine the Gap
      [ASCII-art banner]
      Golf clapping at #datamining
