This document provides an overview and agenda for a Machine Data 101 presentation. The presentation covers Splunk fundamentals, including Splunk architecture and components; traditional and non-traditional data sources; and data enrichment techniques such as tags, field aliases, calculated fields, event types, and lookups. Labs give attendees hands-on experience indexing sample data, performing data discovery, and enriching data.
3. Agenda (modules)
Splunk Overview
Non-Traditional Data Sources
Data Enrichment (lab)
Level Up on Search & Reporting Commands (lab)
Visualization Commands (lab)
Data Models and Pivot
Advanced Visualizations and the Web Framework
9. Storage Optimization
Driving down data retention costs
Buckets
New Data Storage Controls
• 40-80% reduction in data footprint
• No functionality loss
• Limited performance tradeoff for typical use cases
How does it work? Certain Splunk performance optimization data (TSIDX) is removed, yielding a smaller footprint.
10. High Availability
• As Splunk software collects data, it keeps multiple identical copies
• Data integrity and resilience without a SAN
• If an indexer fails, incoming data continues to get indexed
• Indexed data continues to be searchable
11. Visibility Across Datacenters (multi-site clustering)
(Diagram: sites in New York, Tokyo, London and the cloud)
• Distributed search unifies the view across locations
• Role-based access controls how far a given user's search will span
12. Simple Steps to Deploy Splunk Enterprise
Four steps:
1. Download
2. Install
3. Forward Data
4. Search
(Diagram: data sources such as databases, networks, servers, virtual machines, smartphones and devices, custom applications, security, web servers and sensors)
13. Simple Steps to Deploy Splunk Cloud
Three steps:
1. Sign Up
2. Forward Data
3. Search
(Diagram: data sources such as databases, networks, servers, virtual machines, smartphones and devices, custom applications, security, web servers and sensors)
15. Traditional Data Sources
Captures events from log files in real time
Runs scripts to gather system metrics, connect to APIs and databases
Listens to syslog and gathers Windows events
Universally indexes any data format, so it doesn't need adapters
Windows
• Registry
• Event logs
• File system
• sysinternals
Linux/Unix
• Configurations
• Syslog
• File system
• ps, iostat, top
Virtualization
• Hypervisor
• Guest OS
• Guest Apps
Applications
• Web logs
• Log4J, JMS, JMX
• .NET events
• Code and scripts
Databases
• Configurations
• Audit/query logs
• Tables
• Schemas
Network
• Configurations
• syslog
• SNMP
• netflow
18. zLinux Forwarder
Easily collect and index data on IBM mainframes
Collect application and platform data
Download the new forwarder distribution for s390x Linux
19. The Splunk App for Stream
Wire data enhances the platform for operational intelligence
Efficient, cloud-ready wire data collection
Simple deployment supports fast time to value
20. Stream PCAP For Better Insights
• Payload data including process times, errors, transaction traces, ICA latency, SQL statements, DNS records…
• Analyze traffic volume, speed and packets to identify infrastructure performance issues, capacity constraints and changes; establish baselines
• Measure application response times, gain deeper insights for root-cause diagnostics, trace transaction paths, establish baselines
• Protocol conversations on database performance, DNS lookups, client data, business transaction paths…
• Customer experience: analyze website and application bottlenecks to improve customer experience
• Faster root cause analysis and resolution of customer issues with websites or apps
• Protocol identification, protocol headers, content and payload information, flow records
• Build analytics and context for incident response, threat detection, monitoring and compliance
Use-case areas: Security, Digital Intelligence, IT Operations, Application Management
21. Scripted Inputs
Send data to Splunk via a custom script
Splunk indexes anything written to STDOUT
Splunk handles scheduling
Supports shell, Python scripts, Windows batch, PowerShell, or any other utility that can format and stream data
Streaming Mode: Splunk executes the script and indexes stdout; it checks for any running instances
Write to File Mode: Splunk launches a script which produces an output file, so there is no need for an external scheduler; Splunk monitors the output file
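To make scripted inputs concrete, here is a minimal sketch in streaming mode; the app name, script name and index are placeholder assumptions, while the stanza attributes are standard inputs.conf settings:

[script://$SPLUNK_HOME/etc/apps/myapp/bin/disk_usage.sh]
interval = 60
sourcetype = disk_usage
index = main
disabled = false

#!/bin/sh
# disk_usage.sh: Splunk runs this on the interval above and indexes whatever it writes to STDOUT
df -k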
22. Scripted Inputs Use Cases
Alternative to file-based or network-based inputs
Stream data from command-line tools, such as vmstat and iostat
Poll a web service, API or database and process the results
Reformat complex or binary data for easier parsing into events and fields
Maintain data sources with slow or resource-intensive startup procedures
Provide special or complex handling for transient or unstable inputs
Scripts that manage passwords and credentials
Wrapper scripts for command-line inputs that contain special characters
23. Modular Inputs
More control…
• Instance control: launch a single instance or multiple instances
• Input validation
• Stream data as text or XML
• Support multiple platforms
• Secure access to modular input scripts via REST endpoints
24. Modular Inputs Use Cases
Twitter: stream JSON data from a Twitter source to Splunk using Tweepy
Amazon S3 Online Storage: index data from the Amazon S3 online storage web service
Java Messaging Service (JMS): poll message queues and topics through the JMS Messaging API; talks to multiple providers: MQSeries (WebSphere MQ), ActiveMQ, Tibco EMS, HornetQ, RabbitMQ, Native JMS, WebLogic JMS, Sonic MQ
Splunk Windows Inputs: retrieve Windows event logs, registry keys, perfmon counters
25. Database Inputs (DB Connect)
Create value with structured data
Enrich search results with additional business context
Easily import data for deeper analysis
Integrate multiple DBs concurrently
Simple set-up, non-invasive and secure
DB Connect provides reliable, scalable, real-time integration between Splunk and traditional relational databases
(Diagram: Java Bridge Server providing database lookup, database query and connection pooling against Microsoft SQL Server, Oracle and other databases via JDBC)
26. Configure Database Inputs
DB Connect App
Real-time, scalable integration with relational DBs
Browse and navigate schemas and tables before data import
Reliable scheduled import
Seamless installation and UI configuration
Supports connection pooling and caching
"Tail" tables or import entire tables
Detect and import new/updated rows using timestamps or unique IDs
Supports many RDBMS flavors: IBM DB2, Oracle, Microsoft SQL Server, MySQL, SAP Sybase, PostgreSQL, and more
27. Splunk ODBC Driver
• Interact with, manipulate and visualize machine data in Splunk Enterprise using business software tools
• Leverage analytics from Splunk alongside Microsoft Excel, Tableau Desktop or MicroStrategy Analytics Desktop
• Industry-standard connectivity to Splunk Enterprise
• Empowers business users with direct and secure access to machine data
• Combine machine data with structured data for better operational context
28. HTTP Event Collector (HEC)
Collect data over HTTP or HTTPS directly to Splunk
Application developer focus: a few lines of code in an app to send data
HEC features include:
Token-based, not credential-based
Indexer acknowledgements to guarantee data indexing
Raw and JSON formatted event payloads
SSL, CORS (Cross-Origin Resource Sharing), and network restrictions
29. Configuring HEC
• Enable HTTP Event Collector
• Create/get a token
• Send events to Splunk using the token
– Use HTTP directly: create a POST request, set the Auth header with the token, and POST JSON/raw events in the event format to the collector
– Use logging libraries: support for .NET, Java and JavaScript loggers
LAB
30. Using HEC in a bash script
Steps: Settings > Data Inputs > HTTP Event Collector > Global Settings
JSON objects:
{"event": {"message": "…", "severity": "warn", "category": "web"}}
Batching:
{"event": "event 1"}
{"event": "event 2"}
{"event": "event 3"}
Metadata:
{"source": "foo", "sourcetype": "bar", "host": "192.168.1.1", "time": 1441139779, "event": "Hello World"}
Index selection:
{"index": "collector", "event": "Hello World"}
curl -k http://localhost:8088/services/collector/event -H "Authorization: Splunk FC24BA71-46AF-40C9-A52D-779F7062D514" -d '{"event": "hello world"}'
LAB
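Batched events go to the same endpoint as a single request body of concatenated JSON objects; a minimal sketch reusing the example token above:

curl -k http://localhost:8088/services/collector/event -H "Authorization: Splunk FC24BA71-46AF-40C9-A52D-779F7062D514" -d '{"event": "event 1"}{"event": "event 2"}{"event": "event 3"}'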
31. Extend Operational Intelligence to Mobile Apps
Deliver better performing, more reliable apps
Deliver real-time omni-channel analytics
End-to-end performance and capacity insights
32. Monitor App Usage and Performance
• Improve user retention by quickly identifying crashes and performance issues
• Establish whether issues are caused by an app or the network(s)
• Correlate app, OS and device type to diagnose crash and network performance issues
33. Splunk Analytics for Hadoop
Explore, analyze, visualize, dashboard and share
Works with Hadoop clusters, NoSQL and other data stores via Hadoop client libraries and streaming resource libraries
• Data stays in Hadoop
• Automatically handles MapReduce
• Use SPL to search Hadoop
• Bi-directional integration
34. Connect to NoSQL and Other Data Stores
• Build custom streaming resource libraries
• Search and analyze data from other data stores in Hunk
• In partnership with leading NoSQL vendors
• Use in conjunction with DB Connect for relational database lookups
35. Integrated Hadoop Features
Access, analysis and storage flexibility with a data lake (Hadoop clusters or Amazon EMR on S3)
• Roll historical Splunk data into an existing Hadoop distribution
• Seamlessly search your Hadoop data within Splunk *
• Enrich data in Hadoop with Splunk search results
• Import Hadoop data into Splunk
36. Integrates with Third-Party Business Tools
STEP 1: The business user (analyst) communicates data requirements to the Splunk admin
STEP 2: The Splunk admin authors saved searches in Splunk Enterprise, making the searches available to the ODBC driver
STEP 3: The business user uses their tool to access the saved searches and retrieve data from Splunk Enterprise via the ODBC driver (a SQL-to-SPL translation layer)
52. What is Data Enrichment?
Adds inline meaning/context/specificity to raw data
Used to normalize metadata or raw data
Simplifies correlation of multiple data sources
Created in Splunk or transferred from external sources
LAB
53. Tags
Add meaning/context/specificity to raw data
Labels describing team, category, platform, geography
Applied to a field-value combination
Multiple tags can be applied to each field-value
Case sensitive
LAB
55. Searching with Tags (find the web servers)
Tag the host as webserver; tag the sourcetype as web.
Search events with a tag in any field: tag=webserver
Search events with a tag in a specific field: tag::host=webserver
Search events with a tag using wildcards: tag=web*
LAB
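Tags created in the UI are stored in tags.conf; a minimal sketch of the host tag above, assuming a hypothetical host named www1 (the stanza names the field-value pair, and each attribute is a tag name):

[host=www1]
webserver = enabled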
56. Field Aliases
Normalize field labels to simplify search and correlation
Apply multiple aliases to a single field
Example: Username | cs_username | User -> user
Example: c_ip | client | client_ip -> clientip
Processed after field extractions, before lookups
Can apply to lookups
Aliases appear alongside the original fields
LAB
57. How to Create Field Aliases (re-label a field to an intuitive name)
(Screenshot: numbered UI steps for creating the field alias)
LAB
58. How to Search With a Field Alias
1. Create a field alias of clientip = customer
2. Search events in the last 15 minutes with sourcetype=access_combined and find the customer field
3. The field alias (customer) and the original field (clientip) are both displayed
LAB
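Behind the UI, a field alias is a props.conf setting; a minimal sketch of the clientip-to-customer alias above (the class name after FIELDALIAS- is an arbitrary label):

[access_combined]
FIELDALIAS-customer = clientip AS customer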
59. Calculated Fields
Shortcut for performing repetitive/long/complex transformations using the eval command
Based on extracted or discovered fields only
Do not apply to lookup or generated fields
LAB
60. How to Create a Calculated Field (compute kilobytes from bytes)
(Screenshot: numbered UI steps for creating the calculated field)
LAB
61. Search with Calculated Fields (search using KB instead of bytes)
1. Create kilobytes = bytes/1024
2. Search events in the last 15 minutes for kilobytes and bytes: sourcetype=access_combined
*Use verbose mode if needed!
LAB
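The same calculated field can be declared in props.conf; a minimal sketch of the kilobytes example above:

[access_combined]
EVAL-kilobytes = bytes/1024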
62. Event Types
Classify and group common events
Capture and share knowledge
Based on a search
Use in combination with fields and tags to define event topography
63. How to Create Event Types
Best practice: use the punct field
Default metadata field describing event structure
Built on interesting characters: ",;-#$%&+./:=?@'|*\n\r"(){}<>[]^!
Can use wildcards
Example event: ####<Jun 3, 2014 5:38:22 PM MDT> <Notice> <WebLogicServer> <bea03> <asiAdminServer> <WrapperStartStopAppMain> <>WLS Kernel<> <> <BEA-000360> <Server started in RUNNING mode>
Its punct: ####<_,__::__>_<>_<>_<>_<>_<>_
Example event: 172.26.34.223 - - [01/Jul/2005:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953
Its punct: ..._-_-_[:::_-]_"_?=_/."__
64. Searching with Event Types (classify events as known "bad")
1. Show punct for sourcetype=access_combined
2. Pick a punct, then wildcard it after the timestamp
3. Add NOT status=200
4. Save as a "bad" event type with Color: red and Priority: 1 (shift-reload the browser to show the coloring)
sourcetype="access_combined" punct="..._-_-_[//_:::]*" NOT status=200
Then search with: eventtype=bad
LAB
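Saved event types are stored in eventtypes.conf; a minimal sketch of the "bad" event type above, with color and priority as standard attributes:

[bad]
search = sourcetype="access_combined" punct="..._-_-_[//_:::]*" NOT status=200
color = red
priority = 1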
65. Lookups
Augment raw events with additional fields
Provide context or supporting details
Translate field values into more descriptive data
Example: add text descriptions for error codes and IDs
Example: add contact details to user names or IDs
Example: add descriptions to HTTP status codes
File-based or scripted lookups
LAB
66. Lookups to Enrich Raw Data
Create additional fields from the raw data with a lookup to an external data source
(Diagram: external data sources such as LDAP/AD, watch lists, CRM/ERP and CMDB)
67. The simplest way to create a lookup table:
sourcetype=access_combined | dedup status | table status | outputlookup file.csv
LAB
69. Import HTTP Status Table (step 1)
1. Get the lookup from the Splunk Wiki and save it to a .csv file:
ftp://34.206.145.1/WorkShop_4_13_2017/MachineData101/http_status.csv
2. Lookup table files > Add new
Name: http_status.csv (must have the .csv file extension)
Upload: <path to .csv>
3. Verify the lookup was created successfully:
| inputlookup http_status.csv
LAB
70. Add Lookup Definition (step 2)
1. Lookup definitions > Add new
Name: http_status
Type: File-based
Lookup file: http_status.csv
2. Invoke the lookup manually:
sourcetype=access_combined | lookup http_status status OUTPUT status_description
(status is the field extracted from the source file; status_description is found in the lookup table)
LAB
71. Configure Automatic Lookup (step 3)
1. Automatic lookups > Add new
Name: http_status (cannot have spaces)
Lookup table: http_status
Apply to: sourcetype = access_combined
Lookup input field: status
Lookup output field: status_description
2. Verify the lookup is invoked automatically:
sourcetype=access_combined
LAB
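The automatic lookup built in the UI can also be expressed in configuration; a minimal sketch pairing the transforms.conf lookup definition with the props.conf automatic lookup:

transforms.conf:
[http_status]
filename = http_status.csv

props.conf:
[access_combined]
LOOKUP-http_status = http_status status OUTPUT status_description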
72. Lookup input fields
Provide one or more pairs of input fields. The first field is the field in the lookup table that you want to match. The second field is a field from your events that should match the lookup table field.
Lookup output fields
Provide one or more pairs of output fields. The first field is the corresponding field that you want to output to events. The second field is the name that the output field should have in your events.
73. Other Methods for Creating Lookups
Temporal lookups for time-based lookups
• Example: identify users on your network based on their IP address and the timestamp in DHCP logs
Call an external command or script
• Python scripts only
• Example: DNS lookup for an IP host
Create a lookup table using a relational database
• Review matches against a database column or SQL query
Use search results to populate a lookup table
• sourcetype=access_combined | dedup status | table status | outputlookup file.csv
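An external lookup is declared in transforms.conf; a minimal sketch based on the DNS example above, using the external_lookup.py script that ships with the Splunk search app:

[dnslookup]
external_cmd = external_lookup.py clienthost clientip
fields_list = clienthost, clientip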
74. More Data Enrichment…
Creating and Managing Alerts (Job Inspector)
Macros
Workflow Actions
81. Tips for Tuning Searches
http://docs.splunk.com/Documentation/Splunk/latest/Search/Writebettersearches
Be more specific (as early as possible)
Avoid NOT expressions when possible (prefer inclusive searches)
Restrict searches to a specific index
Use indexed and default fields (host, sourcetype, source)
Disable field discovery to improve search performance
Summarize your data (create summary indexes)
Use the Search Job Inspector
Bookmark the SPL reference in your browser:
http://docs.splunk.com/Documentation/Splunk/6.5.3/SearchReference/Abstract
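To make the first few tips concrete, here is a sketch of the same question asked two ways; index=web is a placeholder index name:

Broad: status=404
More specific: index=web sourcetype=access_combined status=404 earliest=-15m

Restricting by index, sourcetype, default fields and time narrows the set of events Splunk has to scan before any further processing.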
82. Workshop Notes for Presenter
Tip #5: In the next section, after each search, have the participants save the search as a dashboard panel. At the end of the workshop, they will have a living document of the workshop exercises to reference later.
A complete version of this dashboard is packaged as an app. It is uploaded to the Box folder as a leave-behind.
83. top, rare Commands (find the most and least active customers)
Commands have parameters or qualifiers
top and rare have similar syntax
Each search command has its own syntax; show the inline help
... | top clientip limit=20   (IPs with the most visits)
... | rare clientip limit=20   (IPs with the least visits)
UI TIP: Linux/Windows: Ctrl; Mac: Command
LAB: Save as dashboard
84. sort (sort the number of customer requests)
Sort inline, descending or ascending
... | stats count by clientip | sort - count   (number of requests by customer, descending)
... | stats count by clientip | sort + count   (number of requests by customer, ascending)
LAB: Save as dashboard
85. functions + rename (determine total customer payload)
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commandsbycategory
There are functions for eval and where, and functions for stats, chart and timechart
... | stats sum(bytes) by clientip | sort - sum(bytes)   (invoke a function: total payload by customer, descending)
... | stats sum(bytes) as totalbytes by clientip | sort - totalbytes   (rename inline: same result with a readable field name)
LAB: Save as dashboard
86. list + values Functions
List only the distinct values of a field:
... | stats values(action) by clientip   (distinct actions by customer)
List all values of a field:
... | stats list(action) by clientip   (activity by customer)
LAB: Save as dashboard
87. Combine list + values Functions
Show distinct actions and the cardinality of each action:
sourcetype=access_combined
| stats count(action) as value by clientip, action
| eval pair=action + " [" + value + "]"
| stats list(pair) as action by clientip
LAB
88. addcoltotals Command (sum specific columns)
Add columns:
sourcetype=access_combined | stats count by clientip, action   (2 cols: clientip + action)
Sum the total bytes and total events columns:
sourcetype=access_combined
| stats sum(bytes) as total_bytes, avg(bytes) as avg_gbytes, count as total_events by clientip
| addcoltotals total_bytes, total_events
LAB: Save as dashboard
89. addtotals Command (sum across rows)
sourcetype=access_combined
| stats sum(bytes) as totalbytes, sum(other) as totalother by clientip
| addtotals fieldname=totalstuff
For each row, add totalbytes + totalother
A better example: physical memory + virtual memory = total memory
LAB: Save as dashboard
90. transaction Command
Sews events together; creates duration and eventcount fields
... | transaction JSESSIONID | table JSESSIONID, action, product_id   (group by JSESSIONID)
LAB: Save as dashboard
91. cluster Command
Intelligent grouping (creates cluster_count and cluster_label)
... | cluster showcount=1 | table _raw, cluster_count, cluster_label
LAB
92. Advanced Search Commands
transaction: Groups events by a common field value. Hint: convenient, but resource intensive.
cluster: Clusters similar events together. Hint: can be used on _raw or a field.
associate: Identifies correlations between fields. Hint: calculates entropy between field values.
correlate: Calculates the correlation between different fields. Hint: evaluates the relationship of all fields in a result set.
contingency: Builds a contingency table for two fields. Hint: computes co-occurrence, or the percentage of two fields existing in the same events.
anomalies: Computes an unexpectedness score for an event. Hint: computes the similarity of an event (X) to a set of previous events (P).
anomalousvalue: Finds and summarizes irregular, or uncommon, search results. Hint: considers frequency of occurrence or number of standard deviations from the mean.
95. LAB #4: Visualization Commands (30 mins)
Predict over time (5 mins)
Chart overlay with and without streamstats (5 mins)
Maps with iplocation + geostats (5 mins)
Predict with sparklines (5 mins)
Single value (5 mins)
Metered visuals with gauge (5 mins)
96. predict Command
Predict future values using lower and upper bounds
... | timechart count as traffic | predict traffic   (predict website traffic)
LAB: Save as dashboard
97. Sparklines
Sparklines show trending of individual fields, inline in tables:
... | stats sparkline(count) as trendline by clientip
In the context of a larger event set:
... | stats sparkline(count) as trendline sum(bytes) by clientip
LAB: Save as dashboard
98. Chart Overlays
Simple overlay (show browsing vs. buying):
sourcetype=access_combined (action=view OR action=purchase)
| timechart span=10m count(eval(action="view")) as Viewed, count(eval(action="purchase")) as Purchased
LAB: Save as dashboard
99. Geolocation
Combine IP lookup with geo mapping:
... | iplocation clientip | geostats count by clientip
LAB: Save as dashboard
103. LAB #5 (instructor only)
What is a data model?
Build a data model
Pivot interface
Accelerate a data model
104. Download the CIM App
1. Apps > Find More Apps
2. Search: "Common Information Model"
3. Install (free)
4. Show fields for web + the Web data model
Show Splunkbase. Must have tag=web created.
LAB
105. Powerful Analytics Anyone Can Use
Enables non-technical users to build complex reports without the search language
Provides a more meaningful representation of underlying raw machine data
Acceleration technology delivers up to 1000x faster analytics over Splunk 5
(Diagram: Pivot, Data Model, Analytics Store)
106. Define Relationships in Machine Data
Data Model
• Describes how underlying machine data is represented and accessed
• Defines meaningful relationships in the data
• Enables a single authoritative view of underlying raw data
(Screenshot: hierarchical object view of the underlying data; add constraints to filter out events)
107. Transparent Acceleration
High Performance Analytics Store
• Automatically collected: handles timing issues, backfill…
• Automatically maintained: uses an acceleration window
• Stored on the indexers, peer to the buckets
• Fault-tolerant collection
(Screenshot: check to enable acceleration of a data model; set the time window of data that is accelerated)
108. Easy-to-Use Analytics
Pivot
• Drag-and-drop interface enables any user to analyze data
• Create complex queries and reports without learning the search language
• Click to visualize any chart type; reports dynamically update when fields change
(Screenshot: select fields from the data model, set the time window, choose from all chart types in the chart toolbox, and save the report to share)
109. Common Information Model (CIM) App
Defines the least common denominator for a data domain
Standard method to parse, categorize and normalize data
Set of field names and tags by domain
Packaged as data models in a Splunk app
Domains: security, web, inventory, JVM, performance, network sessions, and more
Minimal setup to use the Pivot interface
110. Data Model & Pivot Tutorial
http://docs.splunk.com/Documentation/Splunk/latest/PivotTutorial/WelcometothePivotTutorial
112. Developer Platform
LAB #6 (instructor only)
Web Framework Toolkit (WFT)
REST API and SDKs
Get a Flying Start
113. The Splunk Enterprise Platform
(Diagram of the platform layers)
• Core engine, providing the core functions: collection, indexing, the Search Processing Language
• User and developer interfaces: Web Framework, REST API, SDKs
• Content: inputs, apps, other content
114. Powerful Platform for Enterprise Developers
Developers can customize and extend Splunk via:
• REST API
• Web Framework (Simple XML, JavaScript, Django)
• SDKs (Ruby, C#, PHP, and more)
• Data models
• Search extensibility
• Modular inputs
115. Splunk Software for Developers
Gain application intelligence
Build Splunk apps
Integrate and extend Splunk
116. Transaction Profiling and Splunk
(Diagram: end users connecting through web servers, app servers (Java, .NET, PHP, etc.), messaging, legacy systems, databases and security, running on virtualization, servers, storage, and networking/load balancing)
Transaction profiling value:
• Transaction path discovery through the entire stack
• Pinpoint bottlenecks/sources of transactions
• End-user experience
• Detect slow code execution for Java, .NET, PHP, node.js, etc.
Splunk value:
• Single source for analyzing and correlating logs and app metrics
• Pinpoint problems not related to application execution code
• Application logs provide developer insight into apps
• Network-based insight into transactions via Splunk Stream
• Customer/business context for transactions
117. Example Advanced Visualizations
Interactive, cut-and-paste examples from popular source repositories: D3, GitHub, jQuery
Download and install from Splunkbase:
Splunk 6.x Dashboard Examples App: https://apps.splunk.com/app/1603
Custom SimpleXML Extensions App: https://apps.splunk.com/app/1772
Splunk Web Framework Toolkit App: https://apps.splunk.com/app/1613
119. Add a D3 Bubble Chart
1. Go to Find More Apps and install the Splunk 6.x Dashboard Examples App
2. Enter the app
3. Go to Examples > Custom Visualizations > D3 Bubble Chart
4. Copy autodiscover.js (file) + components/bubblechart (dir)
   from: $SH/etc/apps/simple_xml_examples/appserver/static
   to: $SH/apps/search/appserver/static
5. Copy and paste the simple XML into a new dashboard
LAB
Background: This is a workshop designed to introduce new and experienced users to Splunk reports and dashboards. It was a little surprising, but not uncommon, to learn that some of our favorite Splunkers didn't know Splunk could create interactive, smart visuals like graphs, charts and reports and arrange them quickly on custom dashboards. This 30-45 minute workshop will catapult searchers into a whole new world of visualizations.
Industrial Data and devices present an opportunity for Splunk to introduce our platforms for operational intelligence to new customers and a new partner ecosystem. Industrial environments produce enormous volumes of data and are largely managed by solutions where getting fast and deep insights across data sources is difficult, if not impossible. Search and exploration across data sets will be a groundbreaking concept in these environments as it was when Splunk approached IT environments early on.
There are significant applications and technologies that are already designed and deployed to tackle particular data and process challenges in industrial environments that Splunk can complement. Asset Management, Automation and Control Systems, HMI applications, Manufacturing Execution Systems, Plant Historians, and ERP systems are all likely providers of both structured and unstructured data that can be analyzed and correlated in Splunk to provide new insights.
We remove part of the Splunk metadata that enables optimal Splunk search and reporting performance. We don't reduce or eliminate any of the original data. Splunk technical specialists consult with the customer on a plan, and we use a TCO calculator to estimate the returns.
You can access the data in all of the normal ways, and for many search and reporting activities there is little impact. But for “needle in the haystack” real-time searches, the performance will be less optimal. The goal is to apply this feature to data that is less frequently accessed in real time – data for which you are willing to sacrifice some performance in order to gain a very significant cost savings. Splunk specialists can help you set the right policies for the right data.
The insights from your data are mission-critical.
Splunk indexers can now be grouped together to replicate each other’s data, maintaining multiple copies of all data – preventing data loss and delivering highly available data for Splunk search.
Using index replication, if one or more indexers fail, incoming data continues to get indexed and indexed data continues to be searchable.
By spreading data across multiple indexers, searches can read from many indexers in parallel, improving parallelism of operations and performance. All as you scale on commodity servers and storage. And without a SAN.
Searches can be distributed from a single search head to any number of indexers. These indexers can all be local for massive parallelization for Big Data problems, or spread across a global enterprise to help you keep data wherever makes the most sense for your network, availability, and security requirements.
Splunk Enterprise can be deployed on premise, in the cloud, or a combination of both.
There is also an Amazon Machine Image available, or if you don't want to host or administer Splunk, it can be managed as a service by our experts using "Splunk Cloud".
It only takes minutes to download and install Splunk on the platform of your choice. Once Splunk has been downloaded and installed the next step is to forward data to the Splunk instance. At that point all data is searchable from a single place! Since Splunk stores a copy of the raw data, searches won’t affect the end devices. Having a central place to search your data not only simplifies things, it also decreases risk since a user doesn’t have to log into the end devices.
The software can be installed on a single small instance, such as a laptop, or installed on multiple servers to scale as needed. When installed on multiple servers the functions can be split up to meet any performance, security, or availability requirements.
If you're looking for all the benefits of Splunk Enterprise with all the benefits of software-as-a-service, then look no further. Splunk Cloud is backed by a 100% uptime SLA, scales to over 10TB/day, and offers a highly secure environment. It makes life easy so you can go home early. The steps to deploy are even simpler: all you need to do is sign up, forward your data, and search!
Splunk Cloud delivers all the features of award-winning Splunk Enterprise, as a cloud-based service. The platform provides access to Splunk Enterprise Security and the Splunk App for AWS and enables centralized visibility across cloud, hybrid and on-premises environments.
Instant: Instant trial and instant conversion from POC to production.
Secure: Completed SOC2 Type 2 Attestation*. Dedicated cloud environments for each customer.
Reliable: 100% uptime SLA. All the features of Splunk Enterprise, including apps, APIs, SDKs. 10TB+/day scalability and up to 10x bursting over licensed data volumes**.
Hybrid: Centralized visibility across Splunk Cloud (SaaS) and Splunk Enterprise (software) deployments.
Splunk’s mission statement is to make machine data accessible, useful and valuable to everyone. Splunk can take any machine data and automatically index it for fast searching. Because Splunk doesn’t use a database, there are no additional licenses, and most importantly, no pre-defined schema to limit how you use your information.
Examples include the configuration files, syslog, Windows events and registry settings, as well as WMI. But the most important thing to note is how easy it is to get data into Splunk and make it useful.
The Splunk App for Stream software captures real-time wire data from distributed infrastructures, including private, public and hybrid clouds with on-the-fly deployment and fine-grained filtering capabilities.
Splunk DB Connect delivers reliable, scalable, real-time integration between Splunk Enterprise and traditional relational databases. With Splunk DB Connect, structured data from relational databases can be easily integrated into Splunk Enterprise, driving deeper levels of operational intelligence and richer business analytics across the organization.
Organizations can drive more meaningful insights for IT operations, security and business users. For example, IT operations teams can track performance, outage and usage by department, location and business entities. Security professionals can correlate machine data with critical assets and watch-lists for: incident investigations, real-time correlations and advanced threat detection using the award-winning Splunk Enterprise. Business users can analyze service levels and user experience by customer in real-time to make more informed decisions.
To address the needs of developers, operations and product management, you need operational intelligence for your mobile apps. This is what we call mobile intelligence. Mobile intelligence provides real-time insight on how your mobile apps are performing, and can correlate with and enhance operational intelligence.
Splunk software enables organizations to search, monitor, analyze and visualize machine-generated data from websites, applications, servers, networks, sensors and mobile devices. The Splunk MINT product line helps organizations monitor mobile app usage and performance, gain deep visibility into mobile app transactions and accelerate development
Deliver better performing, more reliable apps
When a user has a problem with a mobile app, the issue could be isolated or spread across all app versions, handsets and OS types. With Splunk MINT, you can see issues with app performance or availability in real time. Bugs can be addressed quickly, and app developers can gain a headstart in creating and delivering valuable app updates.
End-to-End Application Transaction Performance
When mobile apps fail, there are many potential sources of failure. With Splunk MINT Express, you can analyze overall transaction performance. And using Splunk MINT Enterprise, you can correlate this data with information from back-end apps to gain detailed insight on transaction problems. As a result, operations can reduce MTTR and better anticipate future mobile app back-end requirements.
Deliver real-time omnichannel analytics
Mobile apps give enterprises new ways of conducting digital business. With mobile app information in Splunk Enterprise, you can correlate usage and performance information—a form of omni-channel analytics—to better understand how users are engaging all aspects of your organization.
Splunk MINT Express provides a dashboard that offers an at-a-glance view of mobile app health and usage. This includes an overall index called "MobDex", which provides a blended view of application usage, crashes, engagement and abandonment. The insight boxes provide top-level aggregated information, which you can click on to get more specific information and context.
Hunk offers Full-featured Analytics in an Integrated Platform
Explore, analyze and visualize data, create dashboards and share reports from one integrated platform.
Hunk enables everyone in your organization to unlock the business value of data locked in Hadoop
Hunk integrates the processes of data exploration, analysis and visualization into a single, fluid user experience designed to drive rapid insights from your big data in Hadoop. Enable powerful analytics for everyone with Splunk’s Data Models and the Pivot interface, first released in Splunk Enterprise 6.
And Hunk works with what you have today
Hunk works on Apache Hadoop and most major distributions, including those from Cloudera, Hortonworks, IBM, MapR and Pivotal, with support for both first-generation MapReduce and YARN (Yet Another Resource Negotiator, the technical acronym for 2nd generation MapReduce). Preview results and interactively search across one or more Hadoop clusters, including from different distribution vendors. Use the ODBC driver for saved searches with report acceleration to feed data from Hunk to third-party data visualization tools or business intelligence software. Streaming Resource Libraries enables developers to stream data from NoSQL and other data stores, such as Apache Accumulo, Apache Cassandra, Couchbase, MongoDB and Neo4j, for exploration, analysis and visualization in Hunk.
Splunk can easily integrate with your Hadoop data lake, contributing new data and providing a user friendly tool to gain analytical insights and value.
Data Roll: Easily archive your data in Splunk to Hadoop for audit or compliance needs using the Hadoop data roll feature.
Hadoop data roll is an option available to customers who would like to retain their historical Splunk data in their Hadoop data lake. This functionality used to be part of the Hunk product, but it is now integrated within Splunk Enterprise and included with your license. It is compatible with most popular Apache Hadoop distributions as well as Amazon EMR running on S3 storage. The main benefit of Hadoop data roll is TCO reduction, achieved by reducing the storage footprint and using lower-cost storage hardware. Additionally, your Hadoop applications will be able to use data that was originally indexed in Splunk.
The reduction in storage footprint is achieved by reducing the Splunk search optimization data that is primarily used to speed up "needle in the haystack" type searches. The storage footprint reductions can range from 40-80%, depending on the characteristics of the underlying data.
Unified Search:
You can combine that with your existing data in Hadoop to get full value from your data lake. Splunk Analytics for Hadoop add-on license also enables unified queries and dashboards across unstructured data in Splunk Enterprise and Hadoop, providing a single-pane-of-glass into real-time and historical data. Analyze and visualize months or years of data from a single, fluid user interface.
Finally, you can enrich your data in Hadoop with Splunk Search results and vice-versa … import your Hadoop data into Splunk to enrich and provide additional context for your Splunk searches.
These features provide you with the flexibility to store, access and analyze data from your Hadoop data lake the way you need to.
Splunk ODBC Driver lets you interact with, manipulate and visualize machine data stored in Splunk Enterprise using existing business software tools, such as Microsoft Excel or Tableau Desktop. This flexibility gives you the features available in Excel or Tableau Desktop as well as the advanced analytics capabilities of Splunk Enterprise.
Splunk Administrators need to create saved searches once. Business users then use a tool they are already familiar with to access those saved searches. Time savings and increased productivity are benefits everyone experiences.
The data, for example, may have a userid but you want to search on a name. Splunk's lookup capability can enrich the raw data by adding additional fields at search time. Some common use cases include event and error code description fields. Think "Page not Found" instead of "404". Enriching your data can lead to entirely new insight.
In the example shown, Splunk took the userid and looked up the name and role of the user from an HR database. Similarly, it determined the location of the failed log in attempt by correlating the IP address. Even though these fields don’t exist in the raw data, Splunk allows you to search or pivot on them at any time.
You can also mask data. For example, you may want social security numbers to be replaced with all X’s for regular users but not masked for others. Removing data can also be useful, such as filtering PII, before writing it to an index in Splunk.
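As a sketch of the masking case, props.conf supports sed-style rewrites applied before indexing; the sourcetype stanza name here is a placeholder:

[my_sourcetype]
# Replace anything shaped like an SSN with Xs before it is written to the index
SEDCMD-mask_ssn = s/\d{3}-\d{2}-\d{4}/XXX-XX-XXXX/g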
When running a search, the search head fetches the events from disk, which could be from multiple indexers; it then sorts and summarizes the events and formats them into the final results as requested before displaying them to the user. Splunk Enterprise search is incredibly powerful because it….
Data Models are created using the Data Model Builder and are usually designed and implemented by users who understand the format and semantics of their indexed data, and who are familiar with the Splunk Search Processing Language (SPL). They define meaningful relationships in the data.
Unlike data models in the traditional structured world, Splunk Data Models focus on machine data and data mashups between machine data and structured data. Splunk software is founded on the ability to flexibly search and analyze highly diverse machine data employing late-binding or search-time techniques for schematization (“schema-on-the-fly”). And Data Models are no exception. They define relationships in the underlying data, while leaving the raw machine data intact, and map these relationships at search time. They are therefore highly flexible and designed to enable users to rapidly iterate.
Security is also a key consideration and data models are fully permissionable in Splunk 6.
Data Models are accelerated using the High Performance Analytics Store, new in Splunk 6. The High Performance Analytics Store represents a breakthrough innovation from Splunk that dramatically accelerates analytical operations across massive data sets by up to 1000x over Splunk 5.
The Analytics Store contains a separate store of pre-extracted values derived from the underlying Splunk index. This data is organized in columns for rapid retrieval and powers dramatic improvements in the performance of analytical operations. Once created, the Analytics Store is used seamlessly by Data Models and in turn the Pivot interface.
For users more comfortable with the Splunk Search Processing Language (SPL), The Analytics Store can also be used directly in the search language.
The Splunk Analytics Store is different from traditional Columnar databases – it is based on the Splunk lexicon and optimized for data retrieval (versus updates) by the Splunk Data Model or directly from the Splunk Search Processing Language.
With the Analytics Store, Splunk Enterprise now uniquely optimizes data retrieval for both rare term searches and now analytical operations all in the same software platform.
The new Pivot interface, combined with Data Models and the Analytics Store, makes it dramatically easier for non-technical and technical users alike to analyze and visualize data in Splunk, and represents an important step toward Splunk's mission of making machine data accessible, usable and valuable to everyone.
The Pivot interface enables non-technical and technical users alike to quickly generate sophisticated charts, visualizations and dashboards using simple drag and drop and without learning the Search Processing Language (SPL). Users can access different chart types from the Splunk toolbox to easily visualize their data different ways.
Queries using the Pivot interface are powered by underlying “data models” which define the relationships in Machine Data.
What does this platform look like?
The platform consists of two layers: a core engine and an interface layer.
On top of the platform you can run a broad spectrum of content that supports use cases ranging from application management and IT operations, to Enterprise Security and PCI compliance, to web analytics.
The core engine provides the basic services for real-time data input, indexing and search, as well as alerting, large-scale distributed processing and role-based access.
The interface layer consists of the basic UI for search, reporting and visualization; it contains the developer interfaces: the REST API, SDKs and Web Framework.
The SDKs provide convenient access to core engine services in a variety of programming language environments.
The Web Framework enables developers to quickly create Splunk Apps by using the modern web programming paradigm including pre-built components, styles, templates,
and reusable samples as well as supporting the development of custom logic, interactions, components, and UI.
Developers can choose to program their Splunk App using Simple XML, JavaScript or Django (or any combination thereof).
These programmatic interfaces allow you to either:
extend Splunk
integrate Splunk with other applications
build completely new applications from scratch that require OI or analytical services that Splunk provides
BUILD SPLUNK APPS
The Splunk Web Framework makes building a Splunk app look and feel like building any modern web application.
The Simple Dashboard Editor makes it easy to BUILD interactive dashboards and user workflows as well as add custom styling, behavior and visualizations. Simple XML is ideal for fast, lightweight app customization and building. Simple XML development requires minimal coding knowledge and is well-suited for Splunk power users in IT to get fast visualization and analytics from their machine data. Simple XML also lets the developer “escape” to HTML with one click to do more powerful customization and integration with JavaScript.
Developers looking for more advanced functionality and capabilities can build Splunk apps from the ground up using popular, standards-based web technologies: JavaScript and Django. The Splunk Web Framework lets developers quickly create Splunk apps by using prebuilt components, styles, templates, and reusable samples as well as supporting the development of custom logic, interactions, components, and UI. Developers can choose to program their Splunk app using Simple XML, JavaScript or Django (or any combination thereof).
EXTEND AND INTEGRATE SPLUNK
Splunk Enterprise is a robust, fully-integrated platform that enables developers to INTEGRATE data and functionality from Splunk software into applications across the organization using Software Development Kits (SDKs) for Java, JavaScript, C#, Python, PHP and Ruby. These SDKs make it easier to code to the open REST API that sits on top of the Splunk Engine. With almost 200 endpoints, the REST API lets developers do programmatically what any end user can do in the UI and more. The Splunk SDKs include documentation, code samples, resources and tools to make it faster and more efficient to program against the Splunk REST API using constructs and syntax familiar to developers experienced with Java, Python, JavaScript, PHP, Ruby and C#. Developers can easily manage HTTP access, authentication and namespaces in just a few lines of code.
Developers can use the Splunk SDKs to:
- Run real-time searches and retrieve Splunk data from line-of-business systems like Customer Service applications
- Integrate data and visualizations (charts, tables) from Splunk into BI tools and reporting dashboards
- Build mobile applications with real-time KPI dashboards and alerts powered by Splunk
- Log directly to Splunk from remote devices and applications via TCP, UDP and HTTP
- Build customer-facing dashboards in your applications powered by user-specific data in Splunk
- Manage a Splunk instance, including adding and removing users as well as creating data inputs from an application outside of Splunk
- Programmatically extract data from Splunk for long-term data warehousing
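As a concrete sketch of the REST API that the SDKs wrap, a search job can be created and its results fetched with two calls; the host, credentials and search string are illustrative:

# Create a search job (the response contains a job SID)
curl -k -u admin:changeme https://localhost:8089/services/search/jobs -d search="search index=_internal | head 10"

# Fetch the results once the job completes
curl -k -u admin:changeme https://localhost:8089/services/search/jobs/<sid>/results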
Developers can EXTEND the power of Splunk software with programmatic control over search commands, data sources and data enrichment.
Splunk Enterprise offers search extensibility through:
- Custom Search Commands: developers can add a custom search script (in Python) to Splunk to create their own search commands. To build a search that runs recursively, developers need to make calls directly to the REST API
- Scripted Lookups: developers can programmatically script lookups via Python
- Scripted Alerts: can trigger a shell script or batch file (we provide guidance for Python and Perl)
- Search Macros: make chunks of a search reusable in multiple places, including saved and ad hoc searches
Splunk also provides developers with other mechanisms to extend the power of the platform.
- Data Models: allow developers to abstract away the search language syntax, making Splunk queries (and thus, functionality) more manageable and portable/shareable.
- Modular Inputs: allow developers to extend Splunk to programmatically manage custom data input functionality via REST.
Splunk Enterprise empowers developers with application intelligence across the entire product development lifecycle, from monitoring code check-ins and build servers, to pinpointing production issues in real-time and gaining valuable insights on application usage and user preferences. Splunk Enterprise is a robust platform that enables developers to integrate data and functionality from Splunk software into applications across the organization using Software Development Kits (SDKs) for Java, JavaScript, C#, Python, PHP and Ruby. Developers can extend the power of Splunk software with programmatic control over search commands, data sources and data enrichment. Developers can use the tools and languages they know to build Splunk apps with custom dashboards, flexible UI and custom data visualizations, using the Splunk Web Framework.
In summary: BCI vendors like Dynatrace provide end-to-end visibility into application transactions and the app servers/environment where the application runs. If an application failure appears to be directly related to the app itself, Dynatrace helps developers and app managers pinpoint the problem. Splunk provides an end-to-end view of the entire IT stack, from networking through web and app servers, to the "gear" (servers, storage, firewalls, identity/access management apps, etc.) that applications run on. Splunk also collects logs directly from the applications, so if developers write their apps to generate logs, Splunk consumes that as well. With Splunk + Dynatrace, application teams and developers get full visibility and the ability to quickly drill down, across the entire environment.
More detail:
Combined, Splunk and Dynatrace provide a complete view of the application, both from a transaction perspective and from a stack perspective. No single vendor offers this breadth of coverage and this ability to analyze large amounts of metrics (via Dynatrace) and log information (via Splunk). By having insight into transaction metrics, infrastructure metrics, and log information, app managers and developers get a better understanding of what's going on, and also why it's happening. This approach covers both vendors' "blind spots". Dynatrace can measure end-to-end transactions and performance. If problems occur with the app itself (code execution), it can detect what part of the app is failing, and the stack trace helps developers fix the problem. However, if the problem is with the infrastructure (networks, servers, web servers, databases, etc.), Dynatrace may observe the bottleneck, but not necessarily know why the bottleneck exists. Splunk complements Dynatrace by collecting machine data (logs and metrics) across all of these elements and analyzing it to identify problems. By drilling down, you can find specific information that says what specifically went wrong.
Splunk provides a single place to bring together, analyze, and present data from across the stack, but unless every point in a transaction provides data back to Splunk (rarely the case when you include end users), you can't see end-to-end transaction performance. Also, when problems occur with application execution on the app server (bad code, a Java server fails, etc.), Splunk doesn't know what failed (unless the developer thought to log every sort of error). Dynatrace is able to follow transactions from the end user through the environment and confirm transaction response times and completion. This provides an effective source of transaction/application discovery. It also has detailed instrumentation in Java, .NET, PHP and other app servers, so it can tell what code is performing slowly or failing, and in what sequence it is being run. When Splunk's analytics point to the app itself as a point of failure, app experts can cross-launch right into Dynatrace to quickly get to the root cause of the application problem.
App managers and developers can dig in to transaction availability and app performance issues.
App managers, developers, operations and security teams have a platform for business insights, information derived from logs, and identifying issues across the entire stack.
Dynatrace (or another BCI vendor) provides a useful map of how the application is laid out and where the transaction bottlenecks are. But roughly 40% of problems are in app code, 40% are in infrastructure (mis)configuration, and 20% are hardware failures; APM tools cover some of that, but not all of it, and with the exception of Dynatrace, APM vendors such as AppDynamics and New Relic are only starting to move into the infrastructure.
“After this workshop, if you want more information, all the product documentation is available online. The documentation is divided into several manuals. For reporting and dashboards you will likely be most interested in the User and Developer Manuals.”
Splunk has an active community:
There is an emerging ecosystem of new companies building apps on top of Splunk. They are taking advantage of open APIs and new platform capabilities to create an entirely new generation of applications.
Splunk Answers is the go-to place for your questions – and answers. Our technical support is consistently rated as industry leading and Splunk Answers has answers to thousands of questions.
You can participate in meet-ups and User Groups, contribute to our forums, or attend local SplunkLive events (like this one) to hear from your peers.
"It is not possible to cover everything you need to know about building reports and dashboards in 30-45 minutes. For more structured training with labs, consider Splunk education courses. These are available as instructor-led web-based courses, or onsite if there are enough participants per class."