Sumo Logic's first How-To webinar focused on optimizing our users' search experience. The webinar covers the following:
- Developing good search habits
- Setting the proper expectations around search performance
- The factors related to search speed
- Creating field extraction rules
- Defining a partitioning strategy
- Configuring scheduled views
4. Sumo Logic Confidential
Search Structure
Keywords and operators (separated by pipes) that build on top of each other
Syntax:
metadata tags + keywords | parse | filter | aggregate | sort | limit
Example Search:
Results
where
metadata
keyword
5. Sumo Logic Confidential
Metadata Fields
Each log message is tagged with these metadata fields
Metadata fields are established during Collector and Source configuration
Blog (August 27th): Good Source Category, Bad Source Category
6. Sumo Logic Confidential
Keyword Search
Case Insensitive
Wildcard Support (e.g. ERR*)
Boolean Logic Support
AND
OR
!(A OR B)
Combine these keywords with metadata fields
Bloom filters
Using keywords helps bloom filters locate data very quickly
7. Sumo Logic Confidential
Processing Your Search Request
Initiate
• Queries are rewritten automagically
• The Sumo Logic service calls backend clusters to kickoff the request
Reduce
• Sumo locates indices that contain data for search time-range
• Bloom filters further eliminate indices where keywords are not contained
Data
Retrieval
• Everything through the first pipe is retrieved
• Data is carried forward
Parallelize
• Remaining operations are conducted
• If aggregation is involved, we look for opportunities to parallelize the operation
9. Sumo Logic Confidential
Develop Good Search Habits
Use metadata and keyword combinations to reduce scope
Add line breaks after each operation
Limit result sets before aggregating data à user=a | count by user
Use parse anchor instead of parse regex for structured messages
Avoid the use of expensive parse regex tokens like .* à d{2,10}
Narrow your time-range down as much as possible
10. Sumo Logic Confidential
The Time Range Effect
More recent data can be accessed quickly
We do something special for searching the last 24 hours of events
Over 90% of queries are executed on recent data
Test queries on very recent data first before saving
11. Sumo Logic Confidential
Review Your Data Source Time Zone Settings
Leads to a large gap between message time and receipt time
Causes data fragmentation and can affect search speed
Support of Java 6 Time Zone formats
Pacific Standard Time; PST; GMT-08:00
-0800
NOT US/Pacific
Data integrity will be questioned by users
12. Sumo Logic Confidential
Compute-Intensive Operations
Multiple .* tokens in a single parse regex statement
Parse using public library (apache/access, iis, windows/2008)
Summarize (LogReduce)
Join
Data retun time exponentially increases when extending your time-range
Transaction
Limit the timewindow parameter for finding corresponding events
isNull()
Outlier
15. Sumo Logic Confidential
How Data is Optimized for Search
Data is Ingested
Field Extraction
rules parses
fields
Partitions route
messages to
separate tables
Scheduled
Views pre-
aggregate data
Data is Indexed
Ad-Hoc Queries
Dashboards
Scheduled Searches
Alerts
Optimized data can be
used with other features
17. Sumo Logic Confidential
Benefits of Field Extraction Rules
Extract fields at the time of ingest to be leveraged across the
product
Standardize Searches and Field Names
Simplify searches
Utilize keyword search instead of ‘where’ (status_code=404)
Improves Search Performance
Eliminates the need to write ‘parse’ data during query run time
18. Sumo Logic Confidential
When to Use Field Extraction Rules
The same (or very similar) parse statement is being used
over and over
Filtering data based off of parsed fields
Logs need to be tied together based on a Unique ID
Session ID
User Name
Process ID
Parsing over a large volume of data
19. Sumo Logic Confidential
Create Field Extraction Rule – With Parse Expression
Use Scope to
define what data
this FER applies to
Use Regex to
create your parse
expression
Templates exist for
common sources
20. Sumo Logic Confidential
Field Extraction Rules - Querying
With FERs, parsed
fields are available
to use in your
keyword search
Parsed fields are
available in your
Field Browser for
further analysis
21. Sumo Logic Confidential
Field Extraction Rule Recommendations
Test the rule by running a search over a small time-range
The scope and parse statement should not change
Ensure your Field Extraction Rule covers common
searches
Only extract the minimum fields necessary
Use ‘fields’ operator to limit results
22. Sumo Logic Confidential
Limitations to Be Aware Of
Max of 50 Rules
Max of 200 Total Fields
Supported Operators
Parse Anchor
Parse Regex
Parse nodrop
Double
Fields
Keyvalue
Num
NOTE: Deleted rules and fields defined in
them will still count towards the max
24. Sumo Logic Confidential
Benefits of Partitions
Divides your data into smaller chunks to be searched on
Similar data is grouped together
Improves performance when used in searches
It can eliminate the need for lengthy scope definitions
Takes advantage of your source category taxonomy
25. Sumo Logic Confidential
When to Create Partitions
Sets of data are being searched in isolation
A large amount of data being sent daily (> 5 GB’s)
Navigate to Manage à Account if you don’t know
Different groups are focused on specific areas of
your technology stack
RBAC filtering is required for data provisioning
26. Sumo Logic Confidential
Partitioning Recommendations
Use _sourceCategory or other metadata tags for the routing expression
à _sourceCategory=QA/Web/Apache/Access
à _sourceCategory=*apache*
à _sourceCategory=*web*
à _sourceCategory=qa/*
Avoid using a partition name and routing expression that are subject to change
Group Data that is used together (web logs, OS data, QA environment)
Do not create overlapping partitions (e.g. QA* and QA/Web*)
Ideally, use less than 20 total partitions in your environment
No single partition should exceed 30% of your total daily volume
27. Sumo Logic Confidential
Use Data Volume Index
Helps to determine possible ways to partition data
Recommended partition size à Up to 30% of data volume
Manage à Account à Data Management
Library à Apps à Data Volume
30. Sumo Logic Confidential
Benefits of Scheduled Views
Similar to relational DB materialized views
Pre-aggregate data
Significantly improve performance for high selectivity queries
(_source=A or _source=B) and _sourceName=C and keyword1 and keyword2
Results are updated every minute
Allows for long term trending analysis
Data can be backfilled
31. Sumo Logic Confidential
When to Use Scheduled Views
Specific aggregate operators are used heavily in queries
Count
Sum
Data is being trended over a long period of time (e.g. Last 30 Days)
Failed logins on critical servers
Number of 404 errors
A highly selective query does not perform well
32. Sumo Logic Confidential
Scheduled Views Recommendations
Use 1 minute timeslices
Include aggregation
Use queries that are not likely to change
Take advantage of existing partitions and FER’s
Include only necessary fields
Only backfill data needed for analyses
33. Sumo Logic Confidential
Scheduled Views Limitations
Data added to scheduled views are counted towards your contracted data
volume quota
Parsed fields in views count towards field extraction limitation (200)
Data will only be backfilled through your plan’s retention period
Supported aggregate operators
Difference
Num
Count
Sum
35. Sumo Logic Confidential
Review: Factors in Search Performance
Query structure
Data Selectivity (keywords, metadata fields)
Heavy Operations (join, transaction, summarize)
Search Time Range
Possible Time Zone Misconfiguration at Source Level
Overall Data Volume for Account
Use of Performance Optimization Tools
Service Anomalies
36. Sumo Logic Confidential
Review: Search Optimization Tools
What I want to do is Partition Scheduled View Field Extraction
Run queries against a
certain set of data
Choose if the
amount of data is
between 1-30%
Choose if the
amount of data you’d
like to segregate is
1% or less
Choose if you want to
pre-extract fields that
you are searching
against frequently
Parse the same type of log
message repeatedly
✔
Use data to identify long-
term trends
✔
Segregate data by
Metadata
✔
Pre-computed or
aggregate data ready to
query
✔
Use RBAC to deny or grant
access to the data
✔ ✔