SplunkLive! Munich 2018: Get More From Your Machine Data Splunk & AI (Splunk)
Presented at SplunkLive! Munich 2018:
- Why AI & Machine Learning?
- What is Machine Learning?
- Splunk's Machine Learning Tour
- Use Cases & Customer Stories
SplunkLive! Munich 2018: Predictive, Proactive, and Collaborative ML with IT ... (Splunk)
Presented at SplunkLive! Munich 2018:
- What data do we need?
- We need Machine Learning
- Real Use Case Example
- Let's Dive Into How it Works
- Next Steps
SplunkLive! Frankfurt 2018 - Legacy SIEM to Splunk, How to Conquer Migration ... (Splunk)
Presented at SplunkLive! Frankfurt 2018:
Introduction
SIEM Migration Methodology
Use Cases
Datasources & Data Onboarding
ES Architecture
Third-Party Integrations
You Got This!
SplunkLive! Frankfurt 2018 - Get More From Your Machine Data with Splunk AI (Splunk)
Presented at SplunkLive! Frankfurt 2018:
Why AI & Machine Learning?
What is Machine Learning?
Splunk's Machine Learning Tour
Use Cases & Customer Stories
Wrap Up
SplunkLive! Frankfurt 2018 - Data Onboarding Overview (Splunk)
Presented at SplunkLive! Frankfurt 2018:
Splunk Data Collection Architecture
Apps and Technology Add-ons
Demos / Examples
Best Practices
Resources and Q&A
SplunkLive! Paris 2018: Use Splunk for Incident Response, Orchestration and A... (Splunk)
Presented at SplunkLive! Paris 2018:
- Challenges with Security Operations Today
- Overview of Splunk Adaptive Response Initiative
- Technology behind the Adaptive Response Framework
- Demonstrations
- How to build your own AR Action
- Resources
Splunk Discovery: Warsaw 2018 - Reimagining IT with Service Intelligence (Splunk)
Presented at Splunk Discovery Warsaw 2018:
What's Service Intelligence and Why You Should Care
Introduction to Splunk IT Service Intelligence
IT Service Intelligence Key Concepts
Demo
Splunk Data Onboarding Overview - Splunk Data Collection Architecture (Splunk)
Splunk's Naman Joshi and Jon Harris presented the Splunk Data Onboarding overview at SplunkLive! Sydney. This presentation covers:
1. Splunk Data Collection Architecture
2. Apps and Technology Add-ons
3. Demos / Examples
4. Best Practices
5. Resources and Q&A
SplunkLive! Amsterdam 2015 Breakout - Getting Started with Splunk (Splunk)
What is Splunk? At the end of this session you’ll have a high-level understanding of the pieces that make up the Splunk Platform, how it works, and how it fits in the landscape of big data. You’ll see practical examples that differentiate Splunk while demonstrating how to gain quick time to value.
Machine Data Is EVERYWHERE: Use It for Testing (TechWell)
As more applications are hosted on servers, they produce immense quantities of logging data. Quality engineers should verify that apps are producing log data that is existent, correct, consumable, and complete. Otherwise, apps in production are not easily monitored, have issues that are difficult to detect, and cannot be corrected quickly. Tom Chavez presents the four steps that quality engineers should include in every test plan for apps that produce log output or other machine data. First, test that the data is being created. Second, ensure that the entries are correctly formatted and complete. Third, make sure the data can be consumed by your company’s log analysis tools. And fourth, verify that the app will create all possible log entries from the test data that is supplied. Join Tom as he presents demos including free tools. Learn the steps you need to include in your test plans so your team’s apps not only function but also can be monitored and understood from their machine data when running in production.
Here’s your chance to get hands-on with Splunk for the first time! Bring your modern Mac, Windows, or Linux laptop and we’ll go through a simple install of Splunk. Then, we’ll load some sample data, and see Splunk in action – we’ll cover searching, pivot, reporting, alerting, and dashboard creation. At the end of this session you’ll have a hands-on understanding of the pieces that make up the Splunk Platform, how it works, and how it fits in the landscape of Big Data. You’ll experience practical examples that differentiate Splunk while demonstrating how to gain quick time to value.
Splunk MINT for Mobile Intelligence and Splunk App for Stream for Enhanced Op... (Splunk)
Learn what is new in Splunk App for Stream and how it can help you utilize wire/network data analytics to proactively resolve applications and IT operational issues and to efficiently analyze security threats in real-time, across your cloud and on-premises infrastructures. Additionally, you will learn about Splunk MINT, which allows you to gain operational intelligence on the availability, performance, and usage of your mobile apps. You’ll learn how to instrument your mobile apps for operational insight, and how you can build the dashboards, alerts, and searches you need to gain real-time insight on your mobile apps.
What’s New: Splunk App for Stream and Splunk MINT (Splunk)
Join us to learn what is new in Splunk App for Stream and how it can help you utilize wire/network data analytics to proactively resolve applications and IT operational issues and to efficiently analyze security threats in real-time, across your cloud and on-premises infrastructures. Additionally, you will learn about Splunk MINT, which allows you to gain operational intelligence on the availability, performance, and usage of your mobile apps. You’ll learn how to instrument your mobile apps for operational insight, and how you can build the dashboards, alerts, and searches you need to gain real-time insight on your mobile apps.
Getting Started with Splunk Enterprise
Machine-generated data is one of the fastest growing and complex areas of big data. It's also one of the most valuable, containing a definitive record of all user transactions, customer behavior, machine behavior, security threats, fraudulent activity and more. Join us as we explore the basics of machine data analysis and highlight techniques to help you turn your organization’s machine data into valuable insights. This introductory workshop includes a hands-on(bring your laptop) demonstration of Splunk’s technology and covers use cases both inside and outside IT. Learn why more than 13,000 customers in over 110 countries use Splunk to make business, government, and education more efficient, secure, and profitable.
Getting Started with Splunk Enterprise Hands-On (Splunk)
Here’s your chance to get hands-on with Splunk for the first time! Bring your modern Mac, Windows, or Linux laptop and we’ll go through a simple install of Splunk. Then, we’ll load some sample data, and see Splunk in action – we’ll cover searching, pivot, reporting, alerting, and dashboard creation. At the end of this session, you’ll have a hands-on understanding of the pieces that make up the Splunk Platform, how it works, and how it fits in the landscape of Big Data. You’ll experience practical examples that differentiate Splunk while demonstrating how to gain quick time to value.
5. Basic Architecture Refresh
How Splunk works at a high level: distributed search and auto-load-balanced indexing.
• Forwarder - Collect & Send (agentless options also available)
• Indexer – Data Store/Processing
• Search Head - Splunk’s UI
Example data sources from the diagram: change tickets, web access logs, Windows event logs / Perfmon, Linux logs, VMware logs, configs and metrics, firewall data, app server logs, JMX and JVM metrics, database logs and metrics, product pricing.
6. What can Splunk Ingest?
Agent-less and forwarder approaches for flexibility and optimization:
• syslog hosts and network devices → syslog over TCP/UDP
• Unix, Linux and Windows hosts → Universal Forwarder (Event Logs, Active Directory, OS stats; local file monitoring)
• Windows host aggregation → Universal Forwarder
• Aggregated/API data sources (pre-filtering, API subscriptions) → Heavy Forwarder
• Mainframes, *nix → shell, API and perf scripted inputs
• Wire data → Splunk Stream (Universal Forwarder or HTTP Event Collector)
• DevOps, IoT, containers → HTTP Event Collector (agentless)
7. Collects Data From Remote Sources
• Splunk Universal Forwarders collect data from a local data source and send it to one or more Splunk indexers.
Scalable
• Thousands of universal forwarders can be installed with little impact on network and host performance.
Broad Platform Support
• Available for installation on diverse computing platforms and architectures; small computing/disk/memory footprint.
Splunk Universal Forwarder
The Splunk Universal Forwarder is a separate download.
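As a sketch of how a forwarder is pointed at indexers, a minimal outputs.conf might look like the following (the host names and group name are placeholders, not from the deck):

```ini
# outputs.conf on a Universal Forwarder -- host names are examples
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
# With multiple servers listed, the forwarder auto-load-balances
# events across the indexers.
```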
8. Also Collects Data From Remote Sources...
• ...but is typically used for data aggregation for passage through firewalls, data routing and/or filtering, scripted/modular inputs, or for HEC endpoints (more on this in a bit).
Often run as a “data collection node” for API/scripted data access
• A heavy forwarder is typically run as a “data collection node” for technologies requiring access via API, and not for collection of data from the node itself.
Platform support limited to that of Splunk Enterprise
• Being standalone, Heavy Forwarders are typically run on Linux VMs...
Splunk Heavy Forwarder
Configured via the regular Splunk Enterprise download.
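One common filtering job for a heavy forwarder is dropping unwanted events before they reach the indexers. Here is a minimal sketch using Splunk's nullQueue routing pattern; the sourcetype name and regex are illustrative assumptions:

```ini
# props.conf -- send DEBUG events of an example sourcetype to the null queue
[example:app]
TRANSFORMS-drop_debug = drop_debug

# transforms.conf
[drop_debug]
REGEX = \bDEBUG\b
DEST_KEY = queue
FORMAT = nullQueue
```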
9. Large-Scale Data Collection Directly from Applications
• Provides a simple, load-balancer-friendly, secure way (token-based JSON or raw API) to send data at scale from applications directly to Splunk.
Agentless
• Data at scale can be sent directly to the indexer tier, bypassing the forwarder layer.
Broad Development Platform Support
• Logging drivers available for many platforms (Docker, AWS Lambda, etc.) and a simple HTTP endpoint compatible with all development environments.
Splunk HTTP Event Collector (HEC)
The Newest Way to Collect Data at Scale
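To make the token-based JSON API concrete, here is a small Python sketch that only assembles an HEC event request; the host, port, and token are placeholder values, while the /services/collector/event path and the `Authorization: Splunk <token>` header follow the documented HEC interface:

```python
import json

def build_hec_request(token, event,
                      host="splunk.example.com", port=8088):
    """Assemble URL, headers, and JSON body for one HEC event.

    Nothing is sent here -- a real client would POST these with any
    HTTP library, typically behind a load balancer.
    """
    url = f"https://{host}:{port}/services/collector/event"
    headers = {"Authorization": f"Splunk {token}",
               "Content-Type": "application/json"}
    body = json.dumps({"event": event, "sourcetype": "demo:json"})
    return url, headers, body

url, headers, body = build_hec_request(
    "00000000-0000-0000-0000-000000000000",
    {"action": "login", "user": "alice"})
```

Because the payload is plain JSON, the same body works whether it is sent from an app, a Docker logging driver, or a Lambda function.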
11. App or Add-on?
▶ Your first choice when onboarding new data
• Clean and ready to go out-of-the-box
▶ App is a complete solution
• Typically uses one or more TAs
▶ Add-on
• Abstracts collection methodology (log file, API, scripted input, HEC)
• Typically includes relevant field extractions (schema-on-the-fly)
• Includes relevant config files (props/transforms) and ancillary scripts/binaries
15. What You Will See
▶ Using the Data Previewer
• Upload a File (you did this in the Getting Started Hands-on Session!)
▶ Installing and using Apps and Add-ons
▶ Continuous Local File Monitoring (Universal Forwarder)
• Monitor a directory and multiple files in real time
• Most common architecture for syslog-based sourcetypes
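For the continuous local file monitoring shown here, the Universal Forwarder side boils down to a monitor stanza in inputs.conf. A minimal sketch, where the path, index, and sourcetype are example values:

```ini
# inputs.conf on the Universal Forwarder -- path/index/sourcetype are examples
[monitor:///var/log/myapp/*.log]
index = test
sourcetype = myapp:log
disabled = false
```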
17. Components of a Splunk Success Program
• Architecture & Infrastructure
• Operations & Supporting Tools
• Staffing
• Data Onboarding
• User Onboarding
• Inform
18. Typical Splunk Staffing Roles
▶ Architect
• Design and optimize Splunk architecture for large-scale/distributed deployments
▶ System Administrator
• Implement and maintain Splunk infrastructure and configuration
▶ Search Expert
▶ App Developer
▶ Knowledge Manager
• Perform data interpretation, classification and enrichment
• Work with the System Administrator to properly onboard data
19. Data Onboarding Tasks
▶ Define an onboarding process for new data sources / apps
▶ Repeatable, documented process
▶ Provide a customer interview forum or survey
▶ Integrate with your service workflow

New Data Source Request
• Provide a data sample
• Describe the data’s structure: timestamp | timezone, single-/multi-line, sourcetype, interesting fields
• Describe initial uses for the data: searches | alerts | reports | dashboards
• How will the data be collected? UF | syslog | API
• How long should the data be retained?
• Who should have access?
• Apply the Common Information Model: are there TAs available?
• Validate
20. Ladies and Gentlemen, We’ll be Boarding Soon!
Six Things to Get Right at Index Time:
• Source
• Event Boundary / Line Breaking
• Host
• Index
• Sourcetype
• Date/Timestamp
21. Pre-Board Essentials
▶ Gather info (New Data Source Request):
• Where does this data originate/reside? How will Splunk collect it?
• Which users/groups will need access to this data? Access controls?
• Determine the indexing volume and data retention requirements
• Will this data need to drive existing dashboards (ES, PCI, etc.)?
• Who is the Owner/SME for this data?
▶ Map it out:
• Get a "big enough" sample of the event data
• Identify and map out fields (ensure CIM compliance)
• Assign sourcetype and TA names according to CIM conventions
22. Pre-Board Essentials (cont.)
▶ Identify the specific sourcetype(s) - onboard each separately
• Important – syslog is not a sourcetype! (more on this later)
▶ Check for a pre-existing app/add-on on splunk.com – don't reinvent the wheel!
▶ Start with a “Test” index and verify the index-time settings are correct (previous slide)
• Try the Data Previewer first
• Tweak props/transforms “by hand” only if absolutely necessary
23. Your Friend, the Data Previewer
▶ Find and fix index-time problems BEFORE polluting your index
▶ A try-it-before-you-fry-it interface for figuring out:
• Event breaking
• Timestamp recognition
• Timezone assignment
▶ Provides most necessary props.conf parameter settings
24. If you have to get into the weeds...
Always set these six parameters in props.conf:

[SL17]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
SHOULD_LINEMERGE = False
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}
TRUNCATE = 10000
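To see what an event-boundary regex like this actually does, here is a short Python sketch that applies the same kind of pattern to a two-event sample; the log lines are invented, and splitting at each match start approximates how Splunk breaks events when SHOULD_LINEMERGE is off:

```python
import re

# An event boundary is a run of CR/LF followed by a
# "YYYY-MM-DD HH:MM:SS" timestamp; group 1 is the line break
# that Splunk discards between events.
LINE_BREAKER = r"([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}"

raw = ("2018-03-06 09:15:01 ERROR db timeout\n"
       "  retrying connection\n"
       "2018-03-06 09:15:02 INFO recovered")

# Each match marks the start of a new event; the indented continuation
# line has no leading timestamp, so it stays attached to the first event.
starts = [m.start() for m in re.finditer(LINE_BREAKER, raw)]
events = [raw[i:j].strip() for i, j in zip([0] + starts, starts + [len(raw)])]
```

This is also a handy way to sanity-check a boundary regex against a data sample before touching props.conf by hand.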
25. What Is the CIM and Why Should I Care?
▶ The Common Information Model (CIM) defines relationships in the underlying data, while leaving the raw machine data intact
▶ A naming convention for fields, eventtypes & tags
▶ More advanced reporting and correlation requires that the data be normalized, categorized and parsed
▶ CIM-compliant data sources can drive CIM-based dashboards (ES, PCI, others)
26. A special note on Syslog
▶ Syslog is a protocol – not a sourcetype
▶ Syslog typically carries multiple sourcetypes
▶ Best to pre-filter syslog traffic using syslog-ng or rsyslog
• Do not send syslog data directly to Splunk over a network port (514)
▶ Use a UF or HEC to transport data to Splunk (next slide)
• Ensures proper load balancing and data distribution
• Secure and efficient
• Insulates against Splunk component failures
▶ See https://www.splunk.com/blog/2017/03/30/syslog-ng-and-hec-scalable-aggregated-data-collection-in-splunk.html for more info on this topic
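A sketch of the pre-filtering this slide recommends, using legacy-syntax rsyslog directives to write each sending host's traffic to its own file, which a Universal Forwarder then monitors; the file paths are examples, and syslog-ng can fill the same role:

```
# /etc/rsyslog.d/30-splunk.conf -- legacy-syntax sketch, paths are examples
$ModLoad imudp
$UDPServerRun 514

# One file per sending host; a Universal Forwarder monitors /var/log/remote
$template PerHostFile,"/var/log/remote/%HOSTNAME%/syslog.log"
*.* ?PerHostFile
```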
29. Check Out the New Add-on Builder!
▶ https://splunkbase.splunk.com/app/2962/
▶ For creating REST API, scripted or modular inputs through a GUI
▶ Helps your add-ons get certified
▶ Can also be run on sample data to build out configs
30. Where to Go to Learn More
▶ Videos!
• http://www.splunk.com/view/education-videos/SP-CAAAGB6
▶ Getting Data In – Splunk Docs
• http://docs.splunk.com/Documentation/Splunk/latest/Data/WhatSplunkcanmonitor
▶ Date and time format variables
• http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commontimeformatvariables
▶ Getting Data In – Dev Manual (very thorough!)
• http://dev.splunk.com/view/dev-guide/SP-CAAAE3A
▶ HTTP Event Collector
• http://docs.splunk.com/Documentation/Splunk/latest/Data/UsetheHTTPEventCollector
▶ .conf Sessions
• https://conf.splunk.com/session/2015/conf2015_Aduca_Splunk_Delpoying_OnboardingDataIntoSplunk.pdf
▶ GOOGLE!
31. Orlando, Florida – Walt Disney World Swan and Dolphin Hotels
.conf18: Monday, October 1 – Thursday, October 4
Splunk University: Saturday, September 29 – Monday, October 1
Today’s goal is to talk about Data Onboarding or “Getting Data Into Splunk” from a ”New to Splunk” perspective. More specifically we’ll talk about the following and then do a little bit of demo.
1. Splunk Platform – a refresher
You’ve seen the Splunk Overview, but I want to quickly go through a few overview slides and relate why data onboarding is important to them
2. What can Splunk Eat
Then we’ll identify not only the data sources that Splunk can collect, but the methods of collection as well
3. Apps and Add-ons
Next we’ll discuss how Apps and Add-ons from the ecosystem play a role
4. Data Onboarding Examples/Demos
We’ll get into a few demos
5. Data Onboarding Best Practices and Next Steps
And finally we’ll get into some common best practices and what to do from here!
1. Explain the different components at a high level
2. The forwarder is one of the many ways to collect data in Splunk – we will discuss setting up and using a forwarder in more detail later in the presentation
1. Spend some time talking about each collection method
2. Today we will concentrate on and demo the ones highlighted in blue
Universal Forwarders provide reliable, secure data collection from remote sources and forward that data into Splunk software for indexing and consolidation. They can scale to tens of thousands of remote systems, collecting terabytes of data.
Heavy forwarders allow for the aggregation, filtering and routing of data, as well as serving as a “data collection node” for applications such as DB Connect and other API-driven data sources. They are typically *not* used for local data collection.
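To make the forwarder-to-indexer flow concrete, here is a minimal sketch of an `outputs.conf` on a Universal Forwarder. The hostnames, port, and group name are illustrative assumptions, not from the deck; 9997 is simply the conventional Splunk receiving port.

```ini
# outputs.conf on a Universal Forwarder (hosts and group name are illustrative)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# Forward to the indexer tier; the UF auto-load-balances across this list
server = idx1.example.com:9997, idx2.example.com:9997
# Indexer acknowledgement for more reliable delivery
useACK = true
```

With this in place, anything defined in the forwarder’s `inputs.conf` is shipped to the indexers for indexing and consolidation.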
HTTP Event Collector (HEC, pronounced H-E-C) is a new, robust, token-based JSON/raw API for sending events to Splunk from anywhere without requiring a forwarder. It is designed for performance and scale: with a load balancer in front, it can be deployed to handle millions of events per second. It is highly available and secure. It is easy to configure, easy to use, and best of all it works out of the box. A few other cool tidbits: it supports gzip compression, batching, HTTP keep-alive, and both HTTP and HTTPS.
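As a sketch of what “token-based JSON API” means in practice, the snippet below builds an HEC event payload and the `Authorization: Splunk <token>` header. The endpoint URL, token value, and event fields are hypothetical placeholders; the actual POST (commented out) would go to a real HEC endpoint.

```python
import json

# Hypothetical endpoint and token -- substitute your own HEC settings.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def build_hec_event(event, sourcetype, host=None, time=None):
    """Build the JSON body for one HTTP Event Collector event."""
    payload = {"event": event, "sourcetype": sourcetype}
    if host is not None:
        payload["host"] = host
    if time is not None:
        payload["time"] = time  # epoch seconds
    return json.dumps(payload)

headers = {"Authorization": f"Splunk {HEC_TOKEN}"}
body = build_hec_event({"action": "login", "user": "alice"},
                       sourcetype="myapp:auth", host="web01")

# To actually send it (requires the `requests` package and a live HEC):
# requests.post(HEC_URL, headers=headers, data=body)
print(body)
```

Batching is just concatenating several such JSON objects into one POST body, which is part of why HEC scales so well behind a load balancer.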
Splunk apps and add-ons: what & why?
Splunk apps allow developers to extend the data ingestion and processing capabilities of Splunk Enterprise for your specific needs. Apps help end users complete domain-specific tasks more efficiently.
High-level perspective
A Splunk app is a prebuilt collection of additional capabilities packaged for a specific technology or use case, allowing more effective use of Splunk Enterprise. You can use Splunk apps to gain the specific insights you need from your machine data.
Depending on the type and complexity of those use cases, and also whether the developer wants certain app parts to be configured or distributed separately (potentially by a third party), an app may rely on various add-ons.
An add-on is a technical component that can be re-used across a number of different use cases and packaged with one or more Splunk apps. Add-ons may contain one or more knowledge objects, which encapsulate a specific functionality focused on a single concern and its configuration. Using an add-on should help to reduce the technical risk and cost of building an app.
Additionally we have the community!
The community provides thousands of apps and add-ons that can help you onboard and ingest thousands of different data types, and new content is added every day!
Let’s look at how we would use an Add-on from Splunkbase to get data in.
Use an example that you are comfortable with and showcases using an add-on to get data in and mapped properly.
< If you have another data source or want to improvise a little here feel free – otherwise you can use the following demo flow below >
< Support files can be found here: LINK >
1. Install an instance of Splunk on your laptop.
2. Create an inputs.conf that monitors a directory that will contain the PANW logs files, using the PANW sourcetype from the TA. Leave the directory empty for now.
3. Show the data preview wizard with the apache data. Show how Splunk understands (and assigns an appropriate sourcetype) to the data. Show proper field extractions when ingest is complete.
4. Use the wizard to upload one of the 5 PANW data files. Show how the sourcetype is *not* automatically set, and that there are no relevant choices in the sourcetype picker in the Wizard. Set the sourcetype to some arbitrary value. Show that there are no relevant field extractions after ingest.
5. Now, install the PANW app. Make sure to RESTART.
6. Use the wizard to import the next PANW data file. Now show that there *is* a relevant sourcetype in the picker. Select it.
7. Show how fields are extracted properly in the data. *HOWEVER* -- note that the original sourcetype is automatically changed by the TA, and you will get no results when jumping from the wizard into the search window. Instead, show the 5 or 6 new sourcetypes that get generated as a result of the TA doing its thing.
8. Lastly, deposit the 3rd PANW data file into the monitor directory set up earlier. Show the data in search, correctly sourcetyped.
9. Move the file to a “backup” filename in the monitored directory. Show how Splunk does *not* reingest the data.
10. Add the 4th PANW data sample. Show how the UF handles this.
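Step 2 of the demo flow above can be sketched as an `inputs.conf` monitor stanza. The directory path and index are illustrative assumptions; `pan:log` is the initial sourcetype the PANW TA expects (its transforms then rewrite it into the specific `pan:traffic`, `pan:threat`, etc. sourcetypes seen in step 7).

```ini
# inputs.conf -- monitor an (initially empty) directory for PANW log files
# Path and index are illustrative; adjust to your environment.
[monitor:///var/log/panw]
sourcetype = pan:log
index = main
disabled = false
```

Because this is a monitor input, dropping a file into the directory (step 8) is picked up automatically, and renaming it to a “backup” name (step 9) does not cause re-ingestion, since Splunk tracks files by content checksum rather than by name.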
In this next section we are only looking at the tip of the iceberg. Data onboarding can quickly become an advanced topic, so the point of this section is to introduce you to some of the most important points to get you started. After that, you’ll need to do some research and learn the specifics yourself.
These are the components that make up a successful Splunk program – both large and small. In a very large deployment, individual people (or more) can be dedicated to each of these components.
Appropriate staffing will ensure these components are properly addressed. The person responsible for data onboarding from an architectural perspective is the Knowledge Manager.
It is important to have a defined, documented, and repeatable process for data onboarding.
Explain Index Time
Spend some time saying why these are so important for Splunk. Mention there will be references and resources at the end of the presentation to help dive deeper into these topics.
It is important to not only get the technical details right, but also the data stewardship issues: Who owns the data, who can see it, and how long to keep it?
It is important to “do the homework” prior to onboarding, not only to get the index-time parameters correct (previous slide) but also to ensure the resulting data in Splunk will be of value to the widest variety of people and use cases.
Make sure to show this in the demo, this slide is just a follow up reminder
These are the minimum parameters that should be set when creating a new data source. Again, like I said when I flashed up the Splunk Apps site: find something similar to your source and re-work it, but make sure it includes these parameters.
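A `props.conf` sketch of explicitly setting a minimum parameter set for a new sourcetype is below. The sourcetype name and every value here are hypothetical, and the exact list on the slide isn’t recoverable from the notes; this reflects the commonly recommended minimum of timestamp and line-breaking settings, so treat it as an example shape rather than the slide’s literal content.

```ini
# props.conf -- hypothetical sourcetype with index-time settings set explicitly
[myapp:log]
# Timestamp recognition: where it starts, its format, how far to look
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
MAX_TIMESTAMP_LOOKAHEAD = 25
# Event breaking: single-line events, broken on newlines
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# Guard against runaway events
TRUNCATE = 10000
```

Setting these explicitly, instead of relying on Splunk’s auto-detection, makes ingestion faster and, more importantly, deterministic.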
Normalizes data from different sources – Host and hostname discussion
Syslog represents almost 50% of a typical Splunk installation’s data. And yet syslog itself is simply the protocol over which a number of devices’ log data flows. Be sure to *not* use syslog as the sourcetype, but rather that of the originating data. Use appropriate syslog tools to pre-filter data. They’re good at it, they’re free, they’re well-documented, and they integrate well with Splunk.
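One common pattern for the “use appropriate syslog tools” advice is to have syslog-ng (or rsyslog) write each device class to its own directory, which Splunk then monitors with the originating sourcetype. The sketch below is syslog-ng-style configuration; the source name `s_network`, the host pattern, and the paths are all assumptions for illustration.

```
# syslog-ng sketch: split Cisco ASA traffic into per-host files that a
# Splunk monitor input can pick up with sourcetype cisco:asa (not "syslog")
destination d_cisco_asa {
    file("/var/log/remote/cisco_asa/${HOST}/messages.log");
};
filter f_cisco_asa { host("asa-*"); };
log { source(s_network); filter(f_cisco_asa); destination(d_cisco_asa); };
```

The payoff is that each file path maps cleanly to one device type, so the monitor stanza can assign the correct sourcetype instead of lumping everything under `syslog`.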
HEC is the newest, and most scalable, way to collect syslog-based data.
In addition to SplunkLive!, .conf, Docs, Answers, meetups, etc.
Don’t forget to complete today’s survey at ponypoll.com/______ for your chance to win a .conf18 pass. (Make sure you put the right PonyPoll link!)
A winner will be identified tomorrow through a random drawing from completed surveys and will be notified via email.