Your IT department supports critical business functions, processes and products. You're most effective when your technology initiatives are closely aligned with, and measured against, specific business objectives. This session covers best practices and techniques for designing and building an effective service model, using the domain knowledge of your experts and capturing and reporting on key metrics that everyone can understand. We will design a sample service model and map it to performance indicators that track operational and business objectives. We will also show you how to make Splunk service-aware with Splunk IT Service Intelligence (ITSI).
How to Design, Build and Map IT and Business Services in Splunk
1. How To Design, Build And Map IT
And Business Services In Splunk
Dan Byrd
ITSI Specialist
2. Agenda
Introductions
Splundamentals – IT Troubleshooting with Splunk
What is IT Service Intelligence?
Service Intelligence Design Practices
What are our ITSI customers doing? (Sample Glass Tables)
What's Next? (The Glass Table Exercise)
Happy Hour!
3. Safe Harbor Statement
During the course of this presentation, we may make forward looking statements regarding future events
or the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could
differ materially. For important factors that may cause actual results to differ from those contained in our
forward-looking statements, please review our filings with the SEC. The forward-looking statements
made in this presentation are being made as of the time and date of its live presentation. If reviewed
after its live presentation, this presentation may not contain current or accurate information. We do not
assume any obligation to update any forward looking statements we may make. In addition, any
information about our roadmap outlines our general product direction and is subject to change at any
time without notice. It is for informational purposes only and shall not be incorporated into any contract
or other commitment. Splunk undertakes no obligation either to develop the features or functionality
described or to include any such feature or functionality in a future release.
4. What is Service Intelligence?
Enabling a business-aware IT
Measuring and reporting on indicators that matter
Unlocking operational efficiencies
Collaborating across silos to improve service operations
Using data-driven decision making
Solving problems and anticipating pitfalls with sophisticated
analytics and powerful insights from machine data
5. Key Takeaways
1 Build on what you are already doing with Splunk
2 Service Intelligence design and configuration practices
3 What is possible with Splunk IT Service Intelligence
7. Splunk Approach to Machine Data
Structured
RDBMS
SQL
Schema on Write
Traditional
ETL
Search
Schema on Read
Splunk
Universal Indexing
Volume Velocity Variety
Unstructured
• Define Static schema
• ETL into Schema
• Enrich at write
• New data = new columns
• New questions = new columns
• “Data at rest” (delayed info)
• Labor Intensive & time consuming
Ideal for Reporting
• “Schema-on-the-Fly”
• Data in native format
• Enrich at read
• New data = no changes needed
• New questions = no changes needed
• “Data in motion” (Real time)
• Fast time to value
Ideal for Investigation
8. Turning Machine Data Into Business Value
Index Untapped Data: Any Source, Type, Volume & any Use Case
Ask Any Question
Application Delivery
Security, Compliance
and Fraud
IT Operations
Business Analytics
Industrial Data and
the Internet of Things
Servers
RFID
Networks
GPS
Location
Packaged
Applications
Messaging
Desktops
Online
Shopping
Cart
Storage
Smartphones
and Devices
Energy
Meters
Web
Clickstreams
Telecom
Databases
Call Detail
Records
Web
Services
Online
Services
On-
Premises
Private
Cloud
Public
Cloud
Security
Custom
Applications
9. Turning Machine Data Into Business Value
Ask Any Question
Application Delivery
Security, Compliance
and Fraud
IT Operations
Business Analytics
Industrial Data and
the Internet of Things
Servers
RFID
Networks
GPS
Location
Packaged
Applications
Messaging
Desktops
Online
Shopping
Cart
Storage
Smartphones
and Devices
Energy
Meters
Web
Clickstreams
Telecom
Databases
Call Detail
Records
Web
Services
Online
Services
On-
Premises
Private
Cloud
Public
Cloud
Security
Custom
Applications
Different people,
Asking different questions,
All using the SAME Data
Splunk>
becomes the
Data Fabric Platform
for multiple use cases
Multiple Use Cases
from a Single Data Platform =
Increased value
Lower cost
Improved collaboration
10. Turning Machine Data Into Business Value
Ask Any Question
Application Delivery
Security, Compliance
and Fraud
IT Operations
Business Analytics
Industrial Data and
the Internet of Things
On-
Premises
Private
Cloud
Public
Cloud
Different people,
Asking different questions,
All using the SAME Data
Splunk>
becomes the
Data Fabric Platform
for multiple use cases
11. Why Traditional Approaches Fail
Network
Infrastructure Layer
Storage
Server
74%
-36%
Application Layer
Synthetic APM
Byte Code Instrumentation
Adaptive Thresholding
HP Run-Time Service Model
CA Service Operations Insight
IBM NetCool/Omnibus
Zenoss Service Dynamics
------------------------------------------
Service Model definition
& Correlation Engine
Business Layer
Aggregation/Correlation/Visualization
Service Layer
Challenges
• Too many disparate components
• Difficult to define Service Model
• Labor intensive
• Most implementations fail
• Very important source is
missing! (machine data) or the
‘Why’ for troubleshooting
12. Data Approach With Splunk>
Network
Infrastructure Layer
Packet, Payload, Traffic,
Utilization, Perf
Storage
Utilization, Capacity,
Performance
Server
Performance, Usage,
Dependency
74%
-36%
Application Layer
Synthetic APM
Availability, Capacity,
User Experience
Byte Code Instrumentation
Usage, Experience,
Performance, Quality
Adaptive Thresholding
Apps, Services, Systems
Splunk> is the missing link
• Data Fidelity
• Single Repository for ALL data
• Easier to Manage Services
• Reduced Integrations
• Reduced Point Solutions
• Collaborative Approach
• Quick time to value
MACHINE DATA
Data Fabric Platform
Service Intelligence
14. Time Series Index
Schema on Read
Data Model
IT Service Intelligence Value Stack
Machine Learning
Adaptive threshold automation to minimize false alerts
Behavior anomaly alerts to proactively address issues
Correlates data into knowledge mitigating SME dependency
Visualizes entire tech stack – bare metal through business layer
View the entire ecosystem with customized views for executives
3 clicks to get the answer versus 10
Dynamic Service Model
Accelerators minimizing SPL coding
Trend aggregation to enable rapid visualization
Multi KPI Alerts for Proactive irregularity identification
Search Based KPIs
Core Splunk
18. What is a Service?
Service
Requests
Responses
In ITSI, a Service is a logical group of technology components that a user
decides should be monitored together.
It can often be generalized as a “black box” to which we send requests and
from which we expect responses.
19. What is a Service?
DNS
Requests
Responses
Technical Services
Auth
Requests
Responses
Web
Requests
Responses
Services can be lower level (technical) …
20. What is a Service?
DNS
Requests
Responses
Technical Services
Order Entry
Volume
Revenue
Business Services
Auth
Requests
Responses
Web
Requests
Responses
Customer
Care
Requests
SLA Compliance
Services can also be higher level (business) …
21. What is a Service?
Packet Network
Hypervisor and Hosts
RDBMSs
Storage Tier
API Services
Web Services
Customer Transactions
Mobile
API/Middleware
Business Function
DNS
Services can encompass multiple tiers of the IT domain.
Services may also depend upon other services
22. What is a KPI?
DNS
KPI: Request volume
KPI: Error rate
KPI: Average response time
KPI: Server CPU load
KPI: Configuration changes
Customer
Transactions
KPI: Transaction volume
KPI: Error rate
KPI: Average response time
KPI: Max response time
KPI: Count of Change records
KPIs and Health scores constitute the means by which
Services are monitored.
Business
Function
KPI: Business volume
KPI: Error rate
KPI: Revenue rate
KPI: Conversion rate
KPI: Count of Incident tickets
23. Key Performance Indicators (KPIs)
A Key Performance Indicator (KPI) is powered by a Splunk search in ITSI that
monitors a specific attribute like CPU utilization, Response Time, Number of
Errors and so on. KPIs are contained within Services to measure their health.
24. Service Health Scores
A Health score is a score from 0 to 100 (0 being critical and 100 being normal)
that measures the health of a Service. It is calculated once every minute from
the importance and status (e.g. green, orange, red) of each of the Service's KPIs.
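To make the idea of an importance-weighted health score concrete, here is a minimal Python sketch. It is a simplified illustration and not ITSI's actual algorithm: the status-to-score mapping, the `Kpi` class, the `health_score` function and the importance values are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical mapping from KPI status to a 0-100 score (assumption,
# not taken from ITSI).
STATUS_SCORE = {"green": 100, "orange": 50, "red": 0}


@dataclass
class Kpi:
    name: str
    importance: int  # hypothetical weight; higher means more influence
    status: str      # "green", "orange" or "red"


def health_score(kpis):
    """Return the importance-weighted average of KPI status scores (0-100)."""
    total_weight = sum(k.importance for k in kpis)
    if total_weight == 0:
        raise ValueError("at least one KPI needs a non-zero importance")
    weighted = sum(k.importance * STATUS_SCORE[k.status] for k in kpis)
    return weighted / total_weight


# KPI names come from the DNS example on the KPI slide; the importance
# and status values are invented for illustration.
dns_kpis = [
    Kpi("Request volume", importance=5, status="green"),
    Kpi("Error rate", importance=8, status="orange"),
    Kpi("Average response time", importance=7, status="red"),
]

print(health_score(dns_kpis))  # (5*100 + 8*50 + 7*0) / 20 = 45.0
```

In this sketch a KPI with higher importance pulls the service score toward its own status more strongly, which is the intuition behind weighting KPIs when a service is designed.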
27. Start With A Problem Worth Solving
Review your organization’s critical services
Identify a service that has impactful and measurable
challenges
28. Buttercup Games – How Can We Help?
Manufacturer of toys and games
Desire to improve supply chain efficiency and customer satisfaction
New online store has issues that impact customer experience and revenue
29. The Business Problem for Buttercup Games
Supply
Chain
Limited
Visibility
Frequent
Bottlenecks
ERP
Systems
Business
Impact
$48,000/wk
in revenue
loss
War rooms
32 hrs/wk
??
?
Failed
Interactions
Online
Store
Poor Customer
Satisfaction
30. Bring Subject Experts Together
Identify stakeholders and support personnel for the
selected service
Create awareness and invite their collaboration to solve
the business challenge
31. Design Before Configuring
Identify pains, performance indicators
and measurement goals for the service
Identify components and data
needed to drive service insights
Consolidate the mappings into
an enterprise process/IT services map
32. Service Intelligence Goals for Buttercup Games
Supply
Chain
Limited
Visibility
Frequent
Bottlenecks
ERP
Systems
Business
Impact
$48,000/wk
in revenue
loss
War rooms
32 hrs/wk
??
?
Failed
Interactions
Online
Store
Poor Customer
Satisfaction
GOAL 1
Continuous improvement
through visibility to key
indicators of supply chain
performance
GOAL 2
Increase customer satisfaction and reduce
cost through fewer failures and restoration
activities
33. Service Decomposition – Buttercup Games
Service Layer
Manufacturing Shipping Fulfillment
Supply Chain
Infrastructure Layer
Application Layer
Online Store EDI
Web Tier Middleware
Business Layer
Order Entry
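A service decomposition like this one can be written down as a dependency map before anything is configured. The Python sketch below is only an illustration: the service names come from the slide, but the dependency edges between them and the function names are assumptions, since the slide text does not state which service depends on which.

```python
# Hypothetical dependency map. Service names are from the slide; the edges
# are assumed for illustration only.
DEPENDS_ON = {
    "Supply Chain": ["Order Entry", "Manufacturing", "Shipping", "Fulfillment"],
    "Order Entry": ["Online Store", "EDI"],
    "Online Store": ["Web Tier", "Middleware"],
    "EDI": ["Middleware"],
    "Manufacturing": [],
    "Shipping": [],
    "Fulfillment": [],
    "Web Tier": [],
    "Middleware": [],
}


def all_dependencies(service, graph):
    """Return every service that `service` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(service, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


def impacted_by(failed, graph):
    """Return the services whose dependency chain includes `failed`."""
    return {s for s in graph if failed in all_dependencies(s, graph)}


print(sorted(impacted_by("Middleware", DEPENDS_ON)))
# ['EDI', 'Online Store', 'Order Entry', 'Supply Chain']
```

Walking the map upward from a failing component shows which higher-level services are at risk, which is the kind of question the design workshop is meant to answer on paper first.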
34. Putting It All Together
Infrastructure Layer
Application Layer
Business Layer
Service Layer
Order Entry Manufacturing Shipping Fulfillment
Supply Chain
Online Store EDI
Web Tier Middleware
• Total Orders
• Total Revenue
• Unit Count
• Unit Failures
• Service Level • Delivery Time
• Online Orders
• Online Revenue
• Response Time
• Service Health
• Incidents/Changes
• Customer Satisfaction
• HTTP Hits
• Error Rate
• CPU Load
• Memory Used
• Disk Used
• IO Latency
• CPU Load
• Memory Used
• Disk Used
• IO Latency
• Response Time
• Error Rate
• Response Time
• Storage Free
• Response Time
• Availability
35. Typical Data Sources
Infrastructure Layer
Application Layer
Business Layer
Service Layer
Order Entry Manufacturing Shipping Fulfillment
Supply Chain
Online Store EDI
Web Tier Middleware
• Application Logs
• Corporate Databases
• Service Management
• Application Logs
• Webserver Logs
• DB Perf Counters
• Wire data
• Perf Counters
• Access Logs
• Network Logs
38. Call Center Service
Service Health Transactions
ACD Analysis – Core Splunk
Call Wait History
Inbound Analysis
Social Media
Online Msg
Social Media
Mail Support VOIP Service
Inbound Calls
39. Online Transactions
Internal Transfer Service
External Wire Service
Money Exchange Service
Money Transfer Services
Service Health Corporate
Reconciliation Service
Fed Exchange Service
Core Splunk Searches
Transaction History
System Investigation
Heat Map Analysis
40. CIO Scorecard
Enterprise Service Status Major Incidents
Service Health
Continuous Operational Visibility
Volume Revenue Incidents Changes
Major Changes
Service Health Volume Revenue Incidents Changes
Service Health Volume Ontime Delivery Incidents Changes Service Health Volume Revenue Incidents Changes
Service Health Volume Revenue Incidents Changes Container Util Service Health Throughput Incidents Changes
41. The Vision - Business Operations Center
• Splunk ITSI has the fundamentals to deliver on the promise of real time business visualizations
• Modeled after your Security, Network, and IT Operations Centers
• Monitoring and diagnosis of important ecommerce and brick and mortar operations
• Enhanced with process insight from end-to-end, alerts, machine learning and real-time response
NOC
SOC
BOC
42. Sign Up Now – We’re here to help!
Harness the creativity and domain knowledge of your organization
to unlock the value of data and solve an important Business
Service problem through a joint service intelligence workshop
with key stakeholders
Define methods for:
› Proactive service monitoring
› Reduced risk and failures
› Faster issue resolution
› Increased business performance
What is it?
› 1 Day Onsite Workshop
› Tightly linked with value
› Collaborative approach
› Build your own Glass
Table
Here is a 3.5-hr schedule:
00:00 (slides 1-6) Introductions and Setup
00:07 (slides 7-11) Splundamentals -- Core Splunk in IT Ops
00:15 IT Troubleshooting Demo
00:25 (slides 12-24) Splunk IT Service Intelligence (ITSI)
00:45 ITSI Tour
00:55 BREAK
01:00 (slides 25-39) Service Intelligence Design Practices
01:20 (slides 40-62 / demo) Let's Play – Setting up Service Intelligence
01:55 BREAK
02:00 (slides 62-67 / demo) Let’s Play – Troubleshooting exercise
02:15 (demo) Advanced Exercise #1 - Create a SmartPhone Service
02:15 (demo) explore the 'sourcetype=mint:network' data; what KPIs are possible?
02:20 (demo) Create new service ("SmartPhone"), create 3 KPIs
02:35 BREAK
02:40 Advanced exercises continued
02:40 (demo) Modify 'Online Store' GT to add the new service
02:45 (demo) Advanced Exercise #2 - Show Adaptive Thresholding
02:47 (demo) Advanced Exercise #3 - Show Anomaly Detection
02:50 (slide 70) Summary
02:55 BREAK
03:00 (slides 71-78) What our ITSI customers are doing
03:20 (slides 79-80 + sign up discussion) What is GTE? Ideas for ITSI use cases?
03:25 (slides 81-82) Wrap up
03:30 DONE!
Splunk safe harbor statement.
So what is service intelligence?
Service intelligence is the ability to bring business-aware alignment and enablement to IT, which means the ability to monitor business and service activity using metrics and performance indicators that are aligned with strategic goals and objectives
Service intelligence is also the ability to unlock operational efficiencies by answering unanticipated questions by merging, exploring and analyzing data across any data source and breaking down silos
And continually using data and analytics to make fact-based rapid business decisions
1. Just like the work on this slide, Traditional methods are becoming outdated, irrelevant, and used less and less over time.
2. Data analytics and using the power of information effectively, without dilution, with speed, with precision and
This is the “IT Troubleshooting” demo, about 15-20 min
Cover Splunk 101 basics
The rise of big data has forced IT organizations to transition from a focus on structured, relational data, to accommodate unstructured data, driven by the volume, velocity and variety of today’s applications and systems. As the data has changed from structured data to unstructured data, the technology approach needs to change as well.
When you don’t know what data types you’ll need to analyze tomorrow or what questions you need to ask in a week, flexibility becomes a key component of your technology decisions. The ability to index any data type, search across silos and avoid being locked into a rigid schema opens a new world of analytics and business insights to your organization.
Schema on Read – Enables you to ask any question of the data
Search – Enables rapid, iterative exploration of the data along with advanced analytics
Universal Indexing – Enables you to ingest any type of machine data
Horizontal scaling over commodity hardware enables big data analytics
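The schema-on-read idea can be illustrated with a few lines of Python. This is a conceptual sketch only and does not show how Splunk is implemented: the sample events, the key=value format and the `extract_fields` helper are all made up for the example.

```python
import re

# Raw events kept in their native format. These sample lines are invented.
RAW_EVENTS = [
    "2016-05-01T10:00:01 host=web01 status=200 response_ms=120",
    "2016-05-01T10:00:02 host=web02 status=500 response_ms=950",
    "2016-05-01T10:00:03 host=web01 status=200 response_ms=80",
]

KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw_line):
    """Pull key=value pairs out of a raw event at read time."""
    return dict(KEY_VALUE.findall(raw_line))


# Question 1: how many server errors were there?
errors = [e for e in map(extract_fields, RAW_EVENTS) if e.get("status") == "500"]

# Question 2, asked later: which hosts served slow responses?
# No re-ingestion or schema change is needed; we only read different fields.
slow_hosts = sorted(
    {e["host"] for e in map(extract_fields, RAW_EVENTS) if int(e["response_ms"]) > 100}
)

print(len(errors), slow_hosts)  # 1 ['web01', 'web02']
```

Because fields are extracted when the question is asked, a new question only changes the read-side logic, while the stored events stay untouched.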
Splunk products are being used for data volumes ranging from gigabytes to hundreds of terabytes per day. Splunk software and cloud services reliably collect and index machine data, from a single source to tens of thousands of sources, all in real time. Once data is in Splunk Enterprise, you can search, analyze, report on and share insights from your data. The Splunk Enterprise platform is optimized for real-time, low-latency interactivity, making it easy to explore, analyze and visualize your data. This is described as Operational Intelligence.
The insights gained from machine data support a number of use cases and can drive value across your organization.
[In North America]
Splunk Cloud is available in North America and offers Splunk Enterprise as a cloud-based service – essentially empowering you with Operational Intelligence without any operational effort.
Challenges:
Too many components – Most organizations have 3-5 tools for each of these areas. Do the integration math, n(n-1)/2 (Ex: 15 tools in total = 15 × 14 / 2 = 105 integration points)
Difficult to define the service – You must define the service manually and change the definition whenever point solutions shift or information drifts
You must have expertise in each of the tool areas to understand how metrics and events are generated and how to federate, reconcile and analyze them
What about the machine data? What if important information is lost in translation? This is what we refer to as Data Fidelity.
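The n(n-1)/2 integration math mentioned in the notes above is easy to check. A minimal Python sketch follows; the function name and the tool counts passed to it are illustrative.

```python
def integration_points(tool_count):
    """Number of pairwise integrations among `tool_count` tools: n(n-1)/2."""
    if tool_count < 0:
        raise ValueError("tool_count must be non-negative")
    return tool_count * (tool_count - 1) // 2


# Pairwise integrations grow roughly with the square of the tool count.
for n in (15, 18, 30):
    print(n, "tools ->", integration_points(n), "integration points")
# 15 tools -> 105 integration points
# 18 tools -> 153 integration points
# 30 tools -> 435 integration points
```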
That brings us to Splunk IT Service Intelligence – a packaged solution that enables real-time visibility into services driven by machine data.
Splunk ITSI speeds and simplifies service monitoring and analytics and enables IT to make better, smarter and informed business decisions.
This solution allows you to gain a deep understanding of your services. With Splunk ITSI, you have real-time views into the health of your services, and can use advanced analytics to find patterns, detect anomalies and trends to proactively monitor and address issues.
As a result you have improved service visibility, reduced resolution times, and a transformative approach to monitoring and analytics driven by machine-data.
FOR THE PRESENTER: This entire Tour section should last no more than 10 min. Describe how GTs can show KPIs & health scores to any audience/group/team:
Show GTs:
Buttercup Games Business Process (executives, business service owners)
On Line Transaction Service (NOC, Tier2); “can use visio diagrams…”
Buttercup Games Online Store (service flow, sub-services)
Show saved Deep Dive “DB Deep Dive”; BRIEFLY describe DD functionality (you will be able to go into more depth later)
Show Notable Event Review, BRIEFLY describe (you will be able to go into more depth later)
Show Service Analyzer, briefly describe
Ask if the students have questions.
Unlike the traditional approaches, starting with the business services eliminates irrelevant and distracting data by surgically focusing on critical elements important to the business. The benefits are immediate since we are:
1) Focusing on the service in the organization that is impactful and requires oversight. Think about a service in your organization that supports the business; it may maintain any number of metrics and can be measured to understand past, current, and future performance.
2) Bringing together subject matter experts who own and support the service. This exposes the tribal knowledge important for increased IT efficiency and bakes the organization's institutional knowledge into monitoring the service, so that the next person you onboard or assign the monitoring task to doesn’t have to wake you in the middle of the night to determine what is going on.
3) Designing the service model and simplifying the metrics that are important in calculating the overall health of a service and its supporting entities, by consuming only the metrics needed to support the service (whether from a database, middleware, servers, networking, storage or even social media) instead of discovering and reconciling all of the information in a repeated and laborious exercise.
Is it impactful, valuable and measurable?
Drive decision making with quantifiable measurements
How do you drive decisions to meet business needs?
What are the top business services in your enterprise?
How do you measure the customer experience with these services?
Are customers happy with their experience?
Everyone at Splunk loves Buttercup, our mascot. Buttercup Games manufactures stuffed toys and games. Let’s do a role play that uncovers the services important to the company and where there are problems worth solving.
As a manufacturing company, the supply chain is extremely important. It’s a system that allows us to track the flow of goods. So, making sure that
How often do customers experience issues with the service?
When issues arise, who gets involved in resolving them?
How do teams work together to resolve issues?
Evaluate the performance of a process or a service – the measurements can be based upon the effectiveness (business value derived) or efficiency (how quickly the service is delivered)
Identify pains, performance indicators and measurement goals for the service
Develop an end-to-end map of the services
What components do we need to include in the service; db, middleware
What data is needed to drive the metrics
Meet with business leaders, and their teams, to review the consolidated mapping and modify as necessary
For the Presenter: After describing what a Glass Table Exercise is, and how they could benefit from one, ask for use cases that they think could benefit from ITSI. Try to get a room discussion going.