SplunkLive! Data Models 101

8,512 views

Published in: Technology

  1. 1. Copyright © 2013 Splunk Inc. Data Models 101
  2. 2. Legal Notices During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release. Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. ©2013 Splunk Inc. All rights reserved.
  3. 3. Search is hard.
  4. 4. Analytics Big Picture • Pivot: Build complex reports without the search language • Data Model: Provides more meaningful representation of underlying raw machine data • Analytics Store: Acceleration technology delivers up to 1000x faster analytics over Splunk 5
  5. 5. Operational Intelligence Across the Enterprise [10/11/12 18:57:04 000000b0 UTC] Raw Data IT professional Create and share data models Accelerate data models and custom searches with the analytics store Create reports with pivot Analytics Store Developer Leverage data models to abstract data Leverage pivot in custom apps Data Model Pivot Analyst Create reports using pivot based on data models created by IT
  6. 6. Pivot is a query builder.
  7. 7. Data Models 101
  8. 8. Source Data set Source Source
  9. 9. Success Sourcetype Failure Warning
  10. 10. Source Business division Source Data set Source Business division Source
  11. 11. Technology 1 Common model Technology 2 Technology 3
  12. 12. Context
  13. 13. Splunk Search Language search and filter | munge | report | clean-up sourcetype=access_combined source = "/home/ssorkin/banner_access.log.2013.6.gz" | eval unique=(uid + useragent) | stats dc(unique) by os_name | rename dc(unique) as "Unique Visitors" os_name as "Operating System"
  14. 14. Hurdles index=main source=*/banner_access* uri_path=/js/*/*/login/* guid=* useragent!=*KTXN* useragent!=*GomezAgent* clientip!=206.80.3.67 clientip!=198.144.207.62 clientip!=97.65.63.66 clientip!=175.45.37.78 clientip!=209.119.210.194 clientip!=212.36.37.138 clientip!=204.156.84.0/24 clientip!=216.221.226.0/24 clientip!=207.87.200.162 | rex field=uri_path "/js/(?<t>[^/]*)/(?<v>[^/]*)/login/(?<l>[^/]*)" | eval license = case(l LIKE "prod%" AND t="pro", "enterprise", l LIKE "trial%" AND t="pro", "trial", t="free", "free") | rex field=v "^(?<vers>\d\.\d)" | bin span=1d _time as day | stats values(vers) as vers min(day) as min_day min(eval(if(vers=="5.0", _time, null()))) as min_day_50 dc(day) as days values(license) as license by guid | eval type = if(match(vers,"4.*"), "upgrade", "not upgrade") + "/" + if(days > 1, "repeat", "not repeat") | search license=enterprise | eval _time = min_day_50 | timechart count by type | streamstats sum(*) as * • Simple searches easy… Multi-stage munging/reporting is hard! • Need to understand data’s structure to construct search • Non-technical users may not have data source domain knowledge • Splunk admins do not have end-user search context
  15. 15. Data Model Goals • Make it easy to share/reuse domain knowledge • Admins/power users build data models • Non-technical users interact with data via pivot UI
  16. 16. Data Models 101
  17. 17. What is a Data Model? A data model is a search-time mapping of data onto a hierarchical structure • Encapsulates the knowledge needed to build a search • Pivot reports are built on top of data models • Data-independent
  18. 18. A Data Model is a Collection of Objects
  19. 19. Objects Have Constraints and Attributes
  20. 20. Child Objects Inherit Constraints and Attributes
  21. 21. Child Objects Inherit Constraints and Attributes
  22. 22. Building Data Models
  23. 23. Three Root Object Types • Event – Maps to Splunk events – Requires constraints and attributes
  24. 24. Three Root Object Types • Event – Maps to Splunk events – Requires constraints and attributes • Search – Maps to arbitrary Splunk search (may include generating, transforming and reporting search commands) – Requires search string attributes • Transaction – Maps to groups of Splunk events or groups of Splunk search results – Requires objects to group, fields/conditions to group by, and attributes
  25. 25. Three Root Object Types • Event – Maps to Splunk events – Requires constraints and attributes • Search – Maps to arbitrary Splunk search (may include generating, transforming and reporting search commands) – Requires search string attributes • Transaction – Maps to groups of Splunk events or groups of Splunk search results – Requires objects to group, fields/conditions to group by, and attributes
  26. 26. Object Attributes Auto-extracted – default and predefined fields Eval expression – a new field based on an expression that you define Lookup – leverage an existing lookup table Regular expression – extract a new field based on regex Geo IP – add geolocation fields such as latitude, longitude, country, etc.
  27. 27. Object Attributes Set field types Configure various flags Note: Child object configuration can differ from parent
  28. 28. Best Practices • Use event objects as often as possible – Benefit from data model acceleration • Resist the urge to use search objects instead of event objects! – Event-based searches can be optimized better • Minimize object hierarchy depth when possible – Constraint-based filtering is less efficient deeper down the tree • Put the event object with the deepest tree (and most matching results) first – Model-wide acceleration applies only to the first event object and its descendants
  29. 29. Warnings! • Object constraints and attributes cannot contain pipes or subsearches • A transaction object requires at least one event or search object in the data model • Lookups used in attributes must be globally visible (or at least visible to the app using the data model) • No versioning on data models (and objects)!
  30. 30. From Data Models to Reports
  31. 31. Using the UI Count of http_success events, split by useragent events fields
  32. 32. Under the Hood: Object Search String Generation Event Object Syntax: <constraints search> | <my attribute definitions> Example: sourcetype=access_* OR sourcetype=iis* uri=* uri_path=* status=* clientip=* referer=* useragent=*
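The event object syntax above amounts to plain string composition. The following Python is only an illustrative sketch, not Splunk's implementation: the function name `event_object_search` is hypothetical, and the rule that each attribute contributes one `field=*` term is an assumption read off the slide's example.

```python
def event_object_search(constraints, attributes):
    """Compose an event-object search string.

    Assumption: the generated search is the constraint text followed by
    one `field=*` term per attribute, as the slide's example suggests.
    """
    parts = [constraints.strip()]
    parts.extend(f"{name}=*" for name in attributes)
    return " ".join(parts)


# Values taken from the slide's HTTP_Request example.
search = event_object_search(
    "sourcetype=access_* OR sourcetype=iis*",
    ["uri", "uri_path", "status", "clientip", "referer", "useragent"],
)
print(search)
```

With the slide's constraints and attributes as input, the sketch reproduces the example string shown on the slide.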
  33. 33. Under the Hood: Object Search String Generation Search Object Syntax: <base search> | <my attribute definitions> Example: _time=* host=* source=* sourcetype=* uri=* status<600 clientip=* referer=* useragent=* (sourcetype=access_* OR source=*.log) | eval userid=clientip | stats first(_time) as earliest, last(_time) as latest, list(uri_path) as uri_list by userid | search earliest=* latest=* uri_list=*
  34. 34. Under the Hood: Object Search String Generation Transaction Object Syntax: <objects to group search> | transaction <group by fields> <group by params> | <my attribute definitions> Example: sourcetype=access_* uri=* uri_path=* status=* clientip=* referer=* useragent=* | transaction clientip useragent | eval landingpage=mvindex(uri_path,1) | eval exitpage=mvindex(uri_path,-1)
  35. 35. Under the Hood: Object Search String Generation Child Object Syntax: <parent object search> | search <my constraints> | <my attribute definitions> Example: sourcetype=access_* uri=* uri_path=* status=* clientip=* referer=* useragent=* status=2* | <my attribute definitions>
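Inheritance of constraints and attributes can be sketched as a small recursive structure. This Python is a hedged illustration rather than Splunk's actual code: the `ModelObject` class and its fields are hypothetical, and the child search follows the slide's syntax line (`<parent object search> | search <my constraints>`), whereas the slide's own example shows the child constraint appended directly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelObject:
    """Hypothetical stand-in for a data model object."""
    object_id: str
    constraints: str
    attributes: List[str] = field(default_factory=list)
    parent: Optional["ModelObject"] = None

    def all_attributes(self) -> List[str]:
        # A child sees its parent's attributes plus its own.
        inherited = self.parent.all_attributes() if self.parent else []
        return inherited + self.attributes

    def search(self) -> str:
        # Own constraints plus one field=* term per own attribute.
        own_filters = " ".join(f"{name}=*" for name in self.attributes)
        own = " ".join(p for p in (self.constraints, own_filters) if p)
        if self.parent is None:
            return own
        # The child search starts from the full parent search.
        return f"{self.parent.search()} | search {own}"


http_request = ModelObject(
    "HTTP_Request",
    "sourcetype=access_*",
    ["uri", "uri_path", "status", "clientip", "referer", "useragent"],
)
http_success = ModelObject("HTTP_Success", "status=2*", parent=http_request)
print(http_success.search())
```

Because each child delegates to its parent, a deeper hierarchy yields a longer chain of filters, which is consistent with the advice to keep the object tree shallow.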
  36. 36. Using the Splunk Search Language Object Search String | datamodel <modelname> <objectID> search Example: | datamodel WebIntelligence HTTP_Request search Behind the scenes: sourcetype=access_* OR sourcetype=iis* uri=* uri_path=* status=* clientip=* referer=* useragent=*
  37. 37. Under the hood: Pivot Search String Generation Pivot search = object search + filters + reporting + formatting Example: (sourcetype=access_* OR sourcetype=iis*) status=2* uri=* uri_path=* status=* clientip=* referer=* useragent=* | stats count AS "Count of HTTP_Success" by "useragent" | sort limit=0 "useragent" | fields - _span | fields "useragent" "Count of HTTP_Success" | fillnull "Count of HTTP_Success" | fields "useragent" *
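The formula "pivot search = object search + filters + reporting + formatting" can be read as concatenating pipeline stages. The Python below is a minimal sketch under that assumption: `pivot_search` is a hypothetical helper, and the `clientip!=10.0.0.1` filter is an invented value used only for illustration.

```python
def pivot_search(object_search, filters, reporting, formatting):
    """Join the four parts of a pivot search into one search string.

    Assumption: filters are extra terms on the base search, while the
    reporting and formatting parts are pipeline stages joined with pipes.
    """
    base = " ".join([object_search] + list(filters))
    return " | ".join([base] + list(reporting) + list(formatting))


search = pivot_search(
    object_search="(sourcetype=access_* OR sourcetype=iis*) status=2* useragent=*",
    filters=["clientip!=10.0.0.1"],  # hypothetical filter, not from the slide
    reporting=['stats count AS "Count of HTTP_Success" by "useragent"'],
    formatting=[
        'sort limit=0 "useragent"',
        'fields "useragent" "Count of HTTP_Success"',
    ],
)
print(search)
```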
  38. 38. Using the Splunk Search Language Pivot Search String | pivot <modelname> <objectID> [statsfns, rowsplit, colsplit, filters, …] Example: | pivot WebIntelligence HTTP_Request count(HTTP_Request) AS "Count of HTTP_Request" SPLITROW status AS "status" SORT 0 status Behind the scenes: sourcetype=access_* OR sourcetype=iis* uri=* uri_path=* status=* clientip=* referer=* useragent=* | stats count AS "Count of HTTP_Request" by "status" | sort limit=0 "status" | fields - _span | fields "status", "Count of HTTP_Request" | fillnull "Count of HTTP_Request" | fields "status" *
  39. 39. Warnings • | datamodel and | pivot are generating commands – They must be at the beginning of the search string • Use objectIDs NOT user-visible object names
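Both warnings can be captured in a few lines of string handling. This Python sketch is illustrative only: the helper names are hypothetical, and it merely formats and checks search strings; it does not talk to Splunk. The model and object IDs come from the earlier slides.

```python
def datamodel_search(model_name, object_id):
    """Build a `| datamodel ... search` string from an object ID (not a display name)."""
    return f"| datamodel {model_name} {object_id} search"


def pivot_command(model_name, object_id, *elements):
    """Build a `| pivot` string; `elements` are the pivot elements in order."""
    return " ".join([f"| pivot {model_name} {object_id}", *elements])


def starts_with_generating_command(search):
    """Check that the search begins with `| datamodel` or `| pivot`."""
    stripped = search.lstrip()
    return stripped.startswith("| datamodel ") or stripped.startswith("| pivot ")


print(datamodel_search("WebIntelligence", "HTTP_Request"))
print(pivot_command(
    "WebIntelligence",
    "HTTP_Request",
    'count(HTTP_Request) AS "Count of HTTP_Request"',
    'SPLITROW status AS "status"',
    "SORT 0 status",
))
```

A search such as `index=main | pivot WebIntelligence HTTP_Request` fails the check because the generating command is not first.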
  40. 40. Managing Data Models
  41. 41. Data Model on Disk Each data model is a separate JSON file Lives in <myapp>/local/data/models (or <myapp>/default/data/models for pre-installed models) Has associated conf stanzas and metadata
  42. 42. Editing Data Model JSON At your own risk! Models edited via the UI are validated Manually edited data models: NOT SUPPORTED Exception: installing a new model by adding the file to <myapp>/<local OR default>/data/models is probably okay
  43. 43. Deleting a Data Model Use the UI for appropriate cleanup Potential for bad state if manually deleting model on disk
  44. 44. Interacting With a Data Model Use data model builder and pivot UI – safest option! Use REST API – for developers (see docs for details) Use | datamodel and | pivot Splunk search commands
  45. 45. Permissions Data models have permissions just like other Splunk objects Edit permissions through the UI
  46. 46. Data Model Acceleration Admin or power user Backend magic Acceleration Non-technical user Run search using on-disk acceleration Run a pivot report No acceleration Kick off ad-hoc acceleration and run search
  47. 47. Model-Wide Acceleration • Only accelerates the first event-based object and its descendants • Does not accelerate search- and transaction-based objects • Pivot search: | tstats count AS "Count of HTTP_Success" from datamodel="WebIntelligence" where (nodename="HTTP_Request") (nodename="HTTP_Request.HTTP_Success") prestats=true | stats count AS "Count of HTTP_Success"
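The accelerated pivot search above has a regular shape: one `nodename` clause per level of the object's lineage. The Python below sketches that pattern as string formatting only. `accelerated_pivot_search` is a hypothetical helper, and the assumption that every ancestor contributes a dotted `nodename` clause is inferred from the single example on the slide.

```python
def accelerated_pivot_search(model, lineage, label):
    """Format a tstats-based count for an object in an accelerated model.

    `lineage` lists object IDs from the root down to the target object.
    Assumption: each level adds one nodename clause with a dotted path.
    """
    nodenames = [".".join(lineage[: i + 1]) for i in range(len(lineage))]
    where = " ".join(f'(nodename="{name}")' for name in nodenames)
    return (
        f'| tstats count AS "{label}" from datamodel="{model}" '
        f"where {where} prestats=true "
        f'| stats count AS "{label}"'
    )


search = accelerated_pivot_search(
    "WebIntelligence",
    ["HTTP_Request", "HTTP_Success"],
    "Count of HTTP_Success",
)
print(search)
```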
  48. 48. Ad-Hoc Object Acceleration • Kick off acceleration on pivot page (re)load for non-accelerated models and search/transaction objects • Amortize cost of ad-hoc acceleration over repeated pivoting on same object • Pivot search: | tstats count AS "Count of HTTP_Success" from sid=1379116434.663 prestats=true | stats count AS "Count of HTTP_Success"
  49. 49. Acceleration Disclaimers Works with search-head pooling – we collect on indexers Cannot edit accelerated models
  50. 50. Thank You