Reactive to Proactive: Intelligent Troubleshooting and Monitoring with Splunk
- 1. © 2017 SPLUNK INC.
Reactive to Proactive:
Intelligent Troubleshooting and Monitoring with Splunk
- 2. © 2017 SPLUNK INC.
Session Agenda
• Splunk for IT Operations – Introduction
• IT Ops Hands On
• IT Ops Relevant Splunk Apps
• Introducing Splunk IT Service Intelligence
• Customer Stories
• Wrap Up
- 4. © 2017 SPLUNK INC.
Escalating IT Complexity…
Diagram labels: SaaS/PaaS, IaaS, Virtualization, Storage, Packaged Applications, Custom Applications, HR, Email, Finance, App Svr, DB, Web Svr, Infrastructure, Applications, VPN, IP Phone, Identify, Servers, Networking
- 5. © 2017 SPLUNK INC.
… Plaguing IT Operations
• Complex, silo-based technologies
• Disconnected and outdated point solutions
• Reactive brute-force problem resolution
• Over 80% of time spent on maintaining, not innovating
- 6. © 2017 SPLUNK INC.
Industry-Leading Platform for Machine Data
Custom dashboards, Report and analyze, Monitor and alert, Developer Platform, Ad hoc search
On-Premises, Private Cloud, Public Cloud
Storage, Online Shopping Cart, Telecoms, Desktops, Security, Web Services, Networks, Containers, Web Clickstreams, RFID, Smartphones and Devices, Servers, Messaging, GPS Location, Packaged Applications, Custom Applications, Online Services, Databases, Call Detail Records, Energy Meters, Firewall, Intrusion Prevention
Platform Support (Apps / API / SDKs)
Enterprise Scalability
Universal Indexing
Machine Data: Any Location, Type, Volume
Answer Any Question
- 7. © 2017 SPLUNK INC.
Industry-Leading Platform for Machine Data
Any Amount, Any Location, Any Source
• Schema on-the-fly
• Universal indexing
• No back-end RDBMS
• No need to filter data
- 8. © 2017 SPLUNK INC.
The Focus
Developer Platform (REST API, SDKs)
IT Operations
Application Delivery
Business Analytics
Internet of Things and Industrial Data
Security, Compliance and Fraud
Platform for Operational Intelligence
- 9. © 2017 SPLUNK INC.
Turning Machine Data Into Operational Intelligence
Search and Investigate
Proactive Monitoring and Alerting
Operational Visibility
Real-Time Business Insight
Reactive → Proactive
- 10. © 2017 SPLUNK INC.
Troubleshooting
Find and fix problems faster
Reduce MTTR
Improve End User Experience
Reduce Costs
Greater IT Productivity
- 11. © 2017 SPLUNK INC.
Troubleshooting
Find and fix problems faster
Reduce MTTR
Improve End User Experience
Reduce Costs
Greater IT Productivity
No more grepping through logs
End-to-end correlation
- 12. © 2017 SPLUNK INC.
Monitoring
Find and fix problems before they become problems
Increased Uptime
Trends in Real Time and Historical Data
Powerful Visualizations
Alerting and Notifications
- 14. © 2017 SPLUNK INC.
Index and Analyze Data Across Your Technology Stack
Splunk Add-Ons, Templates and Apps Accelerate Value From Machine Data
No rigid schemas – add in data from any other source.
Server, Storage, Network
Virtualization, Containers
Operating Systems and Databases
Custom Applications
Business Applications
Cloud Services
Web Intelligence
Mobile Applications
Stream
Operations and Service Desks
App Performance Monitoring
DB Connect
API
- 15. © 2017 SPLUNK INC.
Apps Provide Deep Insights By Role
Find and resolve problems fast in individual technology areas
Exchange Admin: Service Health, Performance, Message Tracking
VMware/Win/Linux Admin: Infrastructure Health, Performance, Anomalies/Outliers
Storage Admin: Infrastructure Health, Performance, Anomalies/Outliers
- 16. © 2017 SPLUNK INC.
Fast-Track Your Deployment With Splunk Quick Start
FAST time-to-results, EASY to deploy, LOW PRICE starting at $30K
Splunk Quick Start
Continued Success: Education credits and .conf passes
Deploy in 1 Week: Expert support + customer success manager
Tailored: Splunk Apps & Add-Ons curated for your specific use case
Scalable: Scales from 20GB/day to 100GB/day + Easy path to upgrades
Complete: Everything you need to get started
- 17. © 2017 SPLUNK INC.
Splunk Quick Start
A quick and easy way to deploy Splunk Enterprise at a low price
Splunk Education Credits and .conf Passes: Everything you need to get your team Splunk Certified
Tailored Selection of Splunk Apps and Add-ons: Index and visualize the data sources you need
Personalized Support: Customer Success Manager to help you get up and running in 1 week
Splunk Enterprise License: Discounted by volume
- 20. © 2017 SPLUNK INC.
Troubleshooting With Splunk
LOGIN DETAILS
URLs:
Username:
Password:
- 21. © 2017 SPLUNK INC.
Log in to Splunk
Click on “Search and Reporting” to get started using Splunk!
- 22. © 2017 SPLUNK INC.
SPL Overview
▶ Over 140 search commands
▶ Syntax was originally based on the Unix pipeline and SQL, and is optimized for time-series data
▶ The scope of SPL includes data searching, filtering, modification, manipulation, enrichment, insertion and deletion
▶ Includes machine learning such as anomaly detection
Disk → Intermediate results table → Intermediate results table → Final results table
- 23. © 2017 SPLUNK INC.
Why Create a New Query Language?
Flexibility and effectiveness on small and big data
Late-binding schema
More/better methods of correlation
Not just analyze, but visualize
- 24. © 2017 SPLUNK INC.
SPL Basic Structure
search and filter | munge | report | cleanup
sourcetype=access*
| eval KB=bytes/1024
| stats sum(KB) dc(clientip)
| rename sum(KB) AS "Total KB" dc(clientip) AS "Unique Customers"
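To make the four stages concrete, here is a rough Python sketch of what each stage of the search on this slide does to the data. It is only an illustration of the data flow, not how Splunk executes SPL; the sample events and the `run_pipeline` helper are hypothetical.

```python
# Hypothetical sample events standing in for indexed web access logs.
EVENTS = [
    {"sourcetype": "access_combined", "clientip": "10.0.0.1", "bytes": 2048},
    {"sourcetype": "access_combined", "clientip": "10.0.0.2", "bytes": 1024},
    {"sourcetype": "access_combined", "clientip": "10.0.0.1", "bytes": 512},
    {"sourcetype": "mysqld", "clientip": "10.0.0.9", "bytes": 4096},
]


def run_pipeline(events):
    """Mimic: search and filter | munge | report | cleanup."""
    # search and filter: sourcetype=access*
    matched = [e for e in events if e["sourcetype"].startswith("access")]
    # munge: eval KB=bytes/1024
    munged = [{**e, "KB": e["bytes"] / 1024} for e in matched]
    # report: stats sum(KB) dc(clientip)
    report = {
        "sum(KB)": sum(e["KB"] for e in munged),
        "dc(clientip)": len({e["clientip"] for e in munged}),
    }
    # cleanup: rename the columns to friendlier labels
    return {
        "Total KB": report["sum(KB)"],
        "Unique Customers": report["dc(clientip)"],
    }


result = run_pipeline(EVENTS)
print(result)
```

Each stage consumes the table produced by the previous one, which is the same idea as the intermediate results tables in the SPL overview.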
- 25. © 2017 SPLUNK INC.
Searching With Splunk
Start by typing * in the search bar!
- 26. © 2017 SPLUNK INC.
Search Results
Explore the results!
Host = server
Sourcetype = data format
Look at the other fields
Next, let’s extract new fields!
Search for sourcetype=apache:access, then click “Extract New Fields” at the bottom of the field list.
- 27. © 2017 SPLUNK INC.
Extracting Fields
Choose any event from the list to start.
Note that there’s one field that is not already highlighted.
On the next screen, choose “Regular Expression” (but don’t panic – we won’t be writing regexes).
- 28. © 2017 SPLUNK INC.
Extracting Fields, cont.
Highlight the new field by selecting the text.
In the pop-up, name the field “size” and click “Add Extraction”.
Check the Preview that comes up to see the new field!
- 29. © 2017 SPLUNK INC.
Use the New Field!
Search for sourcetype=apache:access again and you’ll see the new field!
Let’s get the maximum size for the last hour!
Add “| stats max(size)” to the search (without quotes).
- 30. © 2017 SPLUNK INC.
Troubleshooting Infrastructure
We have reports of problems with the database – search sourcetype=mysqld
Which machine do you think we should investigate further?
- 31. © 2017 SPLUNK INC.
Troubleshooting Infrastructure, cont.
Search for sourcetype=df on the affected host.
Click the “PercentUsedSpace” field and then click “Maximum value over time”.
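As a rough picture of what “Maximum value over time” computes, the sketch below groups disk-usage samples into fixed time buckets and keeps the largest value in each. The sample data, the one-hour bucket size and the `max_over_time` helper are assumptions for illustration, not Splunk internals.

```python
from collections import defaultdict

# Hypothetical df samples: (seconds since start, PercentUsedSpace).
SAMPLES = [(0, 71.0), (600, 74.5), (3600, 88.0), (4200, 93.5), (7200, 100.0)]


def max_over_time(samples, span_seconds=3600):
    """Return {bucket_start: max value seen in that bucket}."""
    buckets = defaultdict(float)  # percentages are never negative, so 0.0 is a safe floor
    for ts, pct in samples:
        start = ts - ts % span_seconds  # align the timestamp to its bucket
        buckets[start] = max(buckets[start], pct)
    return dict(sorted(buckets.items()))


series = max_over_time(SAMPLES)
print(series)
```

A series that climbs toward 100 is the pattern to look for on the affected host.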
- 32. © 2017 SPLUNK INC.
Troubleshooting Infrastructure, cont.
Now we can see that this server has a full disk!
- 33. © 2017 SPLUNK INC.
Troubleshooting Applications
Start by searching for “sourcetype=mint:network”.
Splunk MINT enables you to get data from mobile applications.
Narrow down to see just the non-200 status codes.
- 34. © 2017 SPLUNK INC.
Troubleshooting Applications, cont.
There are many potential variables when dealing with mobile applications.
Check to see if the problem is with a single device, carrier, platform, or version (appVersionName).
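The check described here amounts to counting failed requests along each dimension and looking for one value that dominates. A minimal Python sketch, assuming hypothetical events: only `appVersionName` is named on the slide, and the other field names and the `failures_by` helper are illustrative.

```python
from collections import Counter

# Hypothetical mobile network events; field names other than appVersionName are assumed.
EVENTS = [
    {"status": 200, "device": "Pixel", "carrier": "A", "platform": "Android", "appVersionName": "2.0"},
    {"status": 500, "device": "Pixel", "carrier": "A", "platform": "Android", "appVersionName": "2.1"},
    {"status": 500, "device": "Galaxy", "carrier": "B", "platform": "Android", "appVersionName": "2.1"},
    {"status": 200, "device": "iPhone", "carrier": "B", "platform": "iOS", "appVersionName": "2.1"},
    {"status": 503, "device": "Moto", "carrier": "C", "platform": "Android", "appVersionName": "2.1"},
    {"status": 200, "device": "iPhone", "carrier": "C", "platform": "iOS", "appVersionName": "2.0"},
]


def failures_by(events, field):
    """Count non-200 events grouped by the given field."""
    return Counter(e[field] for e in events if e["status"] != 200)


for field in ("device", "carrier", "platform", "appVersionName"):
    print(field, dict(failures_by(EVENTS, field)))
```

In this made-up data the failures are spread across devices and carriers but all share one platform and one app version, which is the kind of signal the exercise is looking for.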
- 35. © 2017 SPLUNK INC.
Creating an Alert
We’ve found the problem – a bad application version that impacted Android devices!
But it would be better to get an alert…
Create a search for all MINT events with status codes other than 200 (hint: we did this earlier).
Once you’ve run the new search, click “Save As” then “Alert”.
- 36. © 2017 SPLUNK INC.
Creating an Alert, cont.
Give the alert a name, and make it “Real-time”.
Make the trigger “Number of Results” and configure the alert to trigger if there are more than five results in five minutes.
Click “Throttle” and set time to 60 seconds.
Configure the email alert.
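The trigger and throttle settings above can be read as a simple rule: fire when a rolling five-minute window holds more than five matching events, then stay quiet for 60 seconds. The sketch below is a simplified model of that rule under those assumptions, not Splunk's real-time alerting implementation; `alert_times` and the sample timestamps are hypothetical.

```python
def alert_times(event_times, threshold=5, window=300, throttle=60):
    """Return the timestamps (in seconds) at which the alert would fire."""
    fired = []
    recent = []
    last_fired = None
    for t in sorted(event_times):
        recent.append(t)
        # Keep only events inside the rolling window that ends at t.
        recent = [x for x in recent if x > t - window]
        if len(recent) > threshold:
            # Throttle: suppress repeat alerts until the quiet period has passed.
            if last_fired is None or t - last_fired >= throttle:
                fired.append(t)
                last_fired = t
    return fired


# Hypothetical arrival times of non-200 events, in seconds.
times = [0, 10, 20, 30, 40, 50, 60, 110, 400]
print(alert_times(times))
```

Without the throttle, every additional event inside a busy window would send another email.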
- 37. © 2017 SPLUNK INC.
Creating a Report
Modify your search to show the count of events by status.
On the “Visualization” tab, choose a “Pie Chart” for the chart.
When you’ve got your pie chart working, click “Save As” and choose “Report”.
- 38. © 2017 SPLUNK INC.
Creating a Dashboard
From your new saved report, click “Add to Dashboard”.
Create a new dashboard and give it a name in the pop-up.
Click “Edit”, “Add Panel”, “Clone from Dashboard”, then choose your new dashboard and clone the panel.
Edit the search of the new panel to show count by device, carrier or platform.
Add more if you have time!
- 39. © 2017 SPLUNK INC.
Using Dashboards
Click on “Dashboards”, then “Mobile App Health”.
The top row of this dashboard shows the server side of our mobile app isn’t having issues. The middle row shows counts by device, carrier, and app version. The bottom row shows some performance metrics.
Use the panel in the lower left to see the application issue we diagnosed earlier.
- 41. © 2017 SPLUNK INC.
What We Hear From Our Customers!
“My CIO is demanding we look at IT from a business service perspective.”
“Splunk is great for break-fix, but I need to show we’re meeting SLAs.”
“I need everyone to be able to see the same thing at the same time.”
“I just want to throw data at Splunk and have it find problems for me.”
“Show me what my data can do for me!”
- 42. © 2017 SPLUNK INC.
Rethinking and Improving How IT Operates
Traditional IT
• Structured data
• Brittle tools and integrations
• Obsession with “faults” and “traps”
• Focus on component parts
• Search oriented
Data Driven IT
• Structured and unstructured data
• Robust data integrations
• Real-time insights from big data
• Focus on the whole service
• Machine learning-driven analytics
- 43. © 2017 SPLUNK INC.
What Is Service Intelligence?
Enabling a business-aware IT
Measuring and reporting on indicators that matter
Unlocking operational efficiencies
Collaborating across silos to improve service operations
Data-based decision making
Solving problems and anticipating pitfalls with sophisticated analytics and powerful insights
- 44. © 2017 SPLUNK INC.
Machine learning-powered analytics for real-time service insights,
simplified operations and root-cause isolation
- 45. © 2017 SPLUNK INC.
Splunk IT Service Intelligence
Prioritize incidents with context: Deliver business & service context to prioritize incident investigation & action
Redefine the role of IT: Support decisions & communicate results with powerful service-level insights
Simplify service operations: Leverage machine learning to detect anomalies & highlight events that matter
Unify siloed monitoring: Combine events & metrics across silos with ease, flexibility & scale in days
- 47. © 2017 SPLUNK INC.
What’s a Service?
Requests → Service → Responses
In Splunk ITSI, a service is a logical group of technology components that a user decides should be monitored together.
It can often be generalized as a “black box” to which we send requests and from which we expect responses.
- 48. © 2017 SPLUNK INC.
What’s a Service?
Services can be technology-centric…
Technical Services
DNS: Requests → Responses
Auth: Requests → Responses
Web: Requests → Responses
- 49. © 2017 SPLUNK INC.
What’s a Service?
… and business-centric
Technical Services
DNS: Requests → Responses
Auth: Requests → Responses
Web: Requests → Responses
Business Services
Customer Transactions: Requests → Responses
Support Desk: Requests → Responses
- 50. © 2017 SPLUNK INC.
What’s a Service?
Packet Network
Hypervisor and Hosts
RDBMSs
Storage Tier
API Services
Web Services
Customer Transactions
Mobile
API/Middleware
Partner Portal
DNS
Services can encompass multiple tiers of the IT domain and may also depend upon other services/microservices.
- 51. © 2017 SPLUNK INC.
What’s a KPI?
DNS: Requests → Responses
KPI: Number of requests
KPI: Error rate
KPI: Average response time
KPI: Server CPU load
KPI: Server network I/F errors
Customer Transactions: Requests → Responses
KPI: Number of transactions
KPI: Error rate
KPI: Average response time
KPI: Count of Incident Tickets
KPI: Synthetic Transx Health
KPIs and health scores constitute the means by which Services are monitored.
- 52. © 2017 SPLUNK INC.
Key Performance Indicators (KPIs)
KPI: A Splunk saved search defined in Splunk ITSI that helps monitor a specific
field like CPU, Memory and so on. KPIs are contained within services.
- 53. © 2017 SPLUNK INC.
Service Health Scores
A health score is a score from 0-100 that helps determine the health of a service.
It is calculated once every minute from the importance and current status of all KPIs in the service.
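The slide does not give the formula, so the following is only one plausible reading: an importance-weighted average of per-KPI status scores. The status-to-score mapping, the importance values and the `health_score` helper are all assumptions for illustration and should not be taken as the ITSI algorithm.

```python
# Assumed mapping from KPI status to a 0-100 score (not taken from the slides).
STATUS_SCORE = {"normal": 100, "medium": 50, "critical": 0}


def health_score(kpis):
    """Importance-weighted average of KPI status scores, on a 0-100 scale."""
    total_weight = sum(k["importance"] for k in kpis)
    if total_weight == 0:
        return 100.0  # assumption: a service with no weighted KPIs counts as healthy
    weighted = sum(k["importance"] * STATUS_SCORE[k["status"]] for k in kpis)
    return weighted / total_weight


# Hypothetical KPIs for a DNS service.
KPIS = [
    {"name": "Error rate", "importance": 8, "status": "critical"},
    {"name": "Average response time", "importance": 5, "status": "medium"},
    {"name": "Number of requests", "importance": 3, "status": "normal"},
]

score = health_score(KPIS)
print(score)
```

Under this reading, a critical KPI with high importance pulls the score down much further than a low-importance one in the same state.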
- 55. © 2017 SPLUNK INC.
Service Analyzer, Glass Tables, Deep Dives
Service Analyzer: Auto-generated, filterable, tiled view of service health scores and KPIs
Glass Tables: Customizable free-form drawing dashboards to view health scores and KPIs of choice, with visual tools to create context
Deep Dives: Swim-lane analysis dashboard that shows those indicators over time for investigations
- 56. © 2017 SPLUNK INC.
Multi KPI Alerts, Notable Events
Multi KPI Alerts: Correlation searches on service degradation
Notable Events: Event framework for Multi KPI Alerts
- 58. © 2017 SPLUNK INC.
What Makes Splunk ITSI Different!
Search-Based KPIs
• Easy to write, manage and change both services and KPIs
• Reflects business and technology priorities
• Benefit: Rapidly generate and change KPIs to align service health with business
• Fiserv – 1000s in just weeks
Full Fidelity Service Health
• Adaptable and flexible definitions of service health
• One solution to go seamlessly from service reports to root cause, including raw data
• Remains adaptable and yet still maintains complete historical context
Universal Data Platform
• Data driven: All IT data including events, metrics and logs
• Schema on-the-Fly
• Ask any question of the data
• Fast time to value
• Data fidelity
- 59. © 2017 SPLUNK INC.
Splunk IT Service Intelligence
Machine Learning
• Adaptive threshold automation to minimize false alerts
• Behavior anomaly alerts to proactively address issues
• Correlating data into knowledge, mitigating SME dependency
• Accelerators minimize SPL coding
• Trend aggregation to enable rapid visualization
• Multi KPI Alerts for proactive irregularity identification
Search-Based KPIs
• Time Series Index
• Schema on Read
• Data Models
Platform for Operational Intelligence
• Visualize entire tech stack – bare metal through business layer
• View the entire ecosystem with customized views for execs
• Use 3 clicks to get the answer vs. 10
Dynamic Service Model
Splunk ITSI Capabilities
- 61. © 2017 SPLUNK INC.
Why Enterprises Use Splunk for IT Operations
Increased Uptime to 99.9% Availability
Reduced MTTR from 2-3 days to a few minutes
Improved Margins by protecting millions in ad-revenue
Consolidated Tools by retiring 27 monitoring solutions
Optimized Capacity by saving $500K in SW, HW & licenses
Drives Innovation with usage analytics on product features
- 62. © 2017 SPLUNK INC.
Splunk IT Service Intelligence at
Unified insights: data integrations from other tools
Reduced incident tickets: 11,000 to 100s
Alerting on service KPIs instead of server performance
Usage baselines to identify anomalies
- 63. © 2017 SPLUNK INC.
Splunk IT Service Intelligence at
Server-based to Services-based monitoring
Top-down and deep-dive service insights
200+ services and 1500+ KPIs monitored
Flexible creation and modification of services and KPIs
Alerting on service KPIs instead of server performance
Real-time, holistic and proactive “client” view
- 64. © 2017 SPLUNK INC.
Splunk IT Service Intelligence at
▶ Real-time service insights to LOBs
▶ Reduced time to resolution
▶ Replaced home-grown tools
- 66. © 2017 SPLUNK INC.
Quick Start for Infrastructure Monitoring
Fast time-to-results and success for a low entry price
Expert Guidance and Customer Success Manager
Tailored Selection of Apps and Add-Ons
Education Credits and .conf Passes
Add-On Builder
- 67. © 2017 SPLUNK INC.
Quick Start for Application Management
Fast time-to-results and success for a low entry price
Expert Guidance and Customer Success Manager
Tailored Selection of Apps and Add-Ons
Education Credits and .conf Passes
Stream
Add-On Builder
MINT
Machine Learning
- 68. © 2017 SPLUNK INC.
Splunk Quick Start for Service Intelligence
Enterprise License, Splunk ITSI License, Education, Professional Services, .conf Passes
Value Assurance Edition, Services Edition, Platform Edition
* Splunk ITSI 6-month license
- 69. © 2017 SPLUNK INC.
Splunk is the Backbone of Modern IT
Platform for Machine Data
Troubleshooting
Continuous Deployment
Application Management
Service Monitoring
- 70. © 2017 SPLUNK INC.
AVAILABLE NOW!
Try it: SPLUNK.COM/ITSI
Free. In Splunk Cloud.