2. Setup Before You Can Play
Download this presentation slide deck: https://splunk.box.com/ITSI-HandsOn
Follow the instructions on your paper hand-out to log in to your VM.
Please log in as either
• user1@buttercupgms.com OR
• user2@buttercupgms.com
• Password is “Changeme1” or “Changeme2”
After logging in, select IT Service Intelligence from the list of apps at the left
4. What is a Service?
Service
Requests
Responses
In ITSI, a Service is a logical group of technology components that a user
decides should be monitored together.
It can often be generalized as a “black box” to which we send requests and
from which we expect responses.
5. What is a Service?
DNS
Requests
Responses
Technical Services
Auth
Requests
Responses
Web
Requests
Responses
Services can be lower level (technical) …
6. What is a Service?
DNS
Requests
Responses
Technical Services
Customer
Transactions
Requests
Responses
Business Services
Auth
Requests
Responses
Web
Requests
Responses
Support Desk
Requests
Responses
Services can also be higher level (business) …
7. What is a Service?
Packet Network
Hypervisor and Hosts
RDBMSs
Storage Tier
API Services
Web Services
Customer Transactions
Mobile
API/Middleware
Partner Portal
DNS
Services can encompass multiple tiers of the IT domain.
Services may also depend upon other services
8. What is a KPI?
DNS
Requests
Responses
KPI: Number of requests
KPI: Error rate
KPI: Average response time
KPI: Server CPU load
KPI: Server network I/F errors
Customer
Transactions
Requests
Responses
KPI: Number of transactions
KPI: Error rate
KPI: Average response time
KPI: Count of Incident Tickets
KPI: Synthetic Transaction Health
KPIs and Health scores constitute the means by which
Services are monitored.
9. Key Performance Indicators (KPIs)
A Key Performance Indicator (KPI) is a Splunk saved search created within the
ITSI UI that helps monitor a specific field like CPU, Memory, Number of Errors
and so on. KPIs are contained within Services.
10. Service Health Scores
A Health Score is a score from 0 to 100 (0 being critical and 100 being normal)
that helps determine the health of a Service. It is calculated once every minute,
based on each KPI's importance and status (e.g. green, orange, red).
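As a rough illustration of how importance and status could combine into one score, the sketch below takes an importance-weighted average of per-KPI status scores. The status-to-score mapping, the importance values, and the function name are assumptions made for illustration; this deck does not give ITSI's actual formula.

```python
# Illustrative only: NOT ITSI's actual health-score algorithm.
# Assumed mapping from a KPI's status color to a 0-100 score.
STATUS_SCORE = {"green": 100, "orange": 50, "red": 0}


def health_score(kpis):
    """Return the importance-weighted average of KPI status scores.

    kpis: list of (status, importance) tuples, where importance is a
    non-negative weight (hypothetical values in the example below).
    """
    total_weight = sum(importance for _, importance in kpis)
    if total_weight == 0:
        raise ValueError("at least one KPI needs a non-zero importance")
    weighted = sum(STATUS_SCORE[status] * importance for status, importance in kpis)
    return weighted / total_weight


# Hypothetical service with three KPIs of differing importance.
example_kpis = [("green", 5), ("orange", 3), ("red", 2)]
print(health_score(example_kpis))
```

In this toy model, a red KPI with high importance pulls the score down much further than a red KPI with low importance, which is the general idea behind weighting by importance.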
16. Service Decomposition (Refresher)
1 - What is a high-value business service? (Online Store)
2- Process flow, and underlying sub-services?
(Web -> Middleware -> DB -> Middleware -> Web)
17. Service Decomposition (Refresher)
1 - What is a high-value business service? (Online Store)
2- Process flow, and underlying sub-services? (Web -> Middleware …)
3- For each (sub)service: KPIs to show health & status?
(Database: errors, SQL hits, response time, …)
18. Service Decomposition (Refresher)
1 - What is a high-value business service? (Online Store)
2- Process flow & underlying sub-services? (Web -> Middleware …)
3- For each (sub)service: KPIs? (Database: errors, SQL hits, …)
4- For each KPI: Need a Splunk search
(index=DB (warn* OR error*) | stats count)
19. Service Decomposition (Refresher)
1 - What is a high-value business service? (Online Store)
2- Process flow & underlying sub-services? (Web -> Middleware …)
3- For each (sub)service: KPIs? (Database: errors, SQL hits, …)
4- For each KPI: Need a Splunk search (index=DB (warn* OR error*) | stats count)
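To make the example search concrete, here is a small Python sketch of what `(warn* OR error*) | stats count` computes: the number of events that contain a term beginning with "warn" or "error". The log lines are hypothetical, and the regular expression only approximates Splunk's term matching.

```python
import re

# Rough stand-in for the search terms (warn* OR error*): any word that
# starts with "warn" or "error", matched case-insensitively.
PATTERN = re.compile(r"\b(?:warn|error)\w*", re.IGNORECASE)


def count_matching_events(events):
    """Count events with at least one matching term (the `| stats count` part)."""
    return sum(1 for event in events if PATTERN.search(event))


# Hypothetical database log lines, for illustration only.
sample_events = [
    "2016-03-01 12:00:01 INFO connection opened",
    "2016-03-01 12:00:02 WARNING slow query (2.3s)",
    "2016-03-01 12:00:03 ERROR deadlock detected",
    "2016-03-01 12:00:04 INFO connection closed",
]
print(count_matching_events(sample_events))
```

The single number this produces each time the search runs is the KPI value that is then compared against thresholds.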
22. New Requirements!
● Create a new KPI for the DB Service:
● Network Utilization
● Modify the Executive Glass Table
in order to show off the services
you slave over
“WE only have about 15min
TO DO WHAT ???!!???”
Think about how long this
would take you today?
24. Let’s Talk Entities
● Select DB Service
● Entities are the relevant things which support
this service (usually hosts)
● Select the right entries with filters, ANDs, ORs
● Original Entity list can come from CMDB,
spreadsheet, Splunk search, others
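The idea of selecting entities with filters, ANDs, and ORs can be sketched in a few lines of Python. The entity records, field names, and rule structure below are hypothetical; they illustrate the logic only and are not ITSI's internal representation.

```python
# Hypothetical entity records, e.g. as imported from a CMDB or spreadsheet.
entities = [
    {"host": "db-01", "role": "database", "site": "east"},
    {"host": "db-02", "role": "database", "site": "west"},
    {"host": "web-01", "role": "web", "site": "east"},
    {"host": "mw-01", "role": "middleware", "site": "east"},
]


def matches(entity, rule_groups):
    """Return True if the entity satisfies any rule group.

    rule_groups is a list of dicts. Groups are ORed together, and the
    field/value pairs inside one group are ANDed.
    """
    return any(
        all(entity.get(field) == value for field, value in group.items())
        for group in rule_groups
    )


# (role == database AND site == east) OR (host == db-02)
db_service_rules = [{"role": "database", "site": "east"}, {"host": "db-02"}]
selected = [e["host"] for e in entities if matches(e, db_service_rules)]
print(selected)
```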
25. A KPI in 5 minutes? Absolutely!
Click New – Generic KPI
Select Data Model
● Host Operating System
● Network
● # bytes
● Next
26. KPI Continued….
Splunk Builds Searches for you –
Oh Yeah, that’s happening
● Select Yes for Split by & Filter options
● Select host for Entity Lookup & Alias options
● Click Next
27. Almost There…
Select
● KPI Search Schedule: Every Minute
● Entity Calculation: Average
● Service/Agg Calculation: Average
● Calculation Window: Last Minute
● Next
● Unit: Bps
● Next
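The two "Average" settings describe a two-stage calculation: first one value per entity over the calculation window, then one aggregate value for the service across entities. A minimal sketch, with hypothetical hosts and sample values, and function names invented for illustration:

```python
from statistics import mean

# Hypothetical samples (bytes per second) observed in the last minute,
# keyed by host. Real values would come from the KPI search.
samples = {
    "db-01": [100.0, 200.0, 300.0],
    "db-02": [400.0, 600.0],
}


def entity_values(samples_by_entity):
    """Entity calculation: average each entity's samples over the window."""
    return {host: mean(values) for host, values in samples_by_entity.items()}


def service_value(per_entity):
    """Service/aggregate calculation: average the per-entity values."""
    return mean(list(per_entity.values()))


per_entity = entity_values(samples)
print(per_entity)
print(service_value(per_entity))
```

Note that averaging per entity first gives each host equal weight in the aggregate, regardless of how many samples it reported during the window.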
28. Final Steps …
Set your thresholds
● Aggregate (All)
● Per Entity
● Click “Add Threshold” TWICE
● Make the Neapolitan ice cream colors
Yellow, Green, Yellow
● Drag the sliders around in order to get
the current data graph entirely inside the
Green (normal) band
● Finish
● Other options are also available,
including adaptive thresholds and
anomaly detection
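Adding two thresholds splits the value range into three bands, here colored Yellow, Green, Yellow. The sketch below shows how a KPI value could be mapped to its band. The threshold values are hypothetical, and treating a value that lands exactly on a threshold as belonging to the upper band is an assumption.

```python
from bisect import bisect_right


def classify(value, thresholds, labels):
    """Return the label of the band that contains value.

    thresholds: band boundaries sorted in ascending order.
    labels: one label per band, so len(labels) == len(thresholds) + 1.
    A value equal to a threshold falls into the band above it (assumption).
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return labels[bisect_right(thresholds, value)]


# Hypothetical thresholds for Network Utilization, in bytes per second.
thresholds = [1000.0, 50000.0]
labels = ["yellow", "green", "yellow"]

for value in (500.0, 20000.0, 60000.0):
    print(value, classify(value, thresholds, labels))
```

Dragging the sliders in the UI corresponds to changing the entries of `thresholds` until the current data sits inside the green band.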
32. Anomaly Detection
● Machine Learning
● Works well for data with patterns
● Requires some “training” (trial & error)
to zero in on best sensitivity
● More sophisticated capabilities coming!
(multivariate, more algorithms, etc)
33. Name that KPI!
From the list of KPIs, select your new one (at the bottom)
● Click on the little pencil next to the name
● Call it “Network Utilization”,
with your username up front
● Click on Save at bottom right when finished!
35. Clone the Glass Table
Return to Saved Glass Tables page
(click on Glass Tables in the upper menu bar)
CLICK Edit for “Buttercup Games Business Process”
• Select Clone
• Title: Add your username
to the front
• Permissions: Shared in App
• Clone Page
• Click on your new Glass Table
from the list, to view it
36. Edit & Have Fun!
Click on Edit in the upper right corner of your Glass Table
Use the “Services” panel on the left to select Individual KPIs,
or Aggregate Service Health Scores
• Choose 2 KPIs from Online Store that would be useful in
the “Order Process” section
• Drag the selected widgets onto the canvas, positioning in
the gray oval
• What’s the difference between the [icon] and [icon] tools at the top left?
37. More Fun with the Glass Table Editor…
Use the Configurations panel on the right to edit a
selected widget
• Can change the visualization type, drilldown
behavior, and other settings
• You should hit Save frequently
• I wonder what Auto Layout does?
• (YIKES!) Revert All Changes might be helpful
38. Finishing up …
• Add a ServiceHealthScore widget for Online
Store under Buttercup
• Choose a Viz Type with a sparkline graph, then
resize to make it look pretty
• Modify the Custom Drilldown action to go to
the saved glass table,
Buttercup Games Online Store
• Bonus Points: Make the label bigger, more
readable
• Save
• View when done
39. A Troubleshooting Exercise
Let’s use ITSI to troubleshoot an outage
● Start at your Glass Table, “<UserName> Buttercup Business Process”
● Customer Care reports that unhappy customers are complaining of failures
and long delays when trying to purchase
● The calls began coming in at around 40 minutes after the (previous) hour.
● In the upper right corner of the Glass Table, change the time picker from Now
to XX:40:00.0, where XX is the previous hour. For example, if it is currently
14:05, set the time picker to 13:40:00.0, then Apply
● This is how we can “time travel” back to see conditions at a particular
outage– oh yeah!
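If the "XX:40 of the previous hour" rule is unclear, this short Python sketch computes the value to enter for any current time. The function name is made up for illustration; the rule itself comes from the instructions above.

```python
from datetime import datetime, timedelta


def outage_time(now):
    """Return XX:40:00.0, where XX is the hour before the given time."""
    previous_hour = now - timedelta(hours=1)
    return previous_hour.replace(minute=40, second=0, microsecond=0)


# The example from the instructions: it is currently 14:05.
current = datetime(2016, 3, 1, 14, 5)
print(outage_time(current).strftime("%H:%M:%S.%f")[:10])
```

Subtracting a full hour before resetting the minutes also handles the case just after midnight, where the previous hour falls on the previous day.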
40. A Troubleshooting Exercise, cont’d
● The Online Store seems to be degraded, just as Customer Care reported.
Click on the widget under Buttercup to drill down further
41. A Troubleshooting Exercise, cont’d
● The Online Store Glass Table shows a much more detailed view, including the impacted customer-facing KPIs
at the far left (Revenue, etc)
● Based on this view of all the relevant
services, where do you think the root cause
lies?
● Which service should we troubleshoot first?
● Click on Health widget for that service, to
drill down to a Deep Dive
42. Deep Dive
● Deep Dive shows multiple KPIs and Health Scores in parallel “swim
lanes”. The initial time span shown is 15 minutes.
● The Health Score for this DB Service is the top swim lane. Can you
see when it begins to degrade from 100%?
● Mousing over this point in time, can you spot the KPI with the
leading fault indication? I.e., what busted first?
● To improve readability, change the Primary
Time Range (lower left corner) to
Presets > Last 60 minutes
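Spotting "what busted first" by eye in the swim lanes amounts to finding the KPI whose status leaves normal at the earliest time. A minimal sketch of that comparison, using hypothetical KPI names and status series (these are not the data in the workshop environment):

```python
def first_to_degrade(kpi_series, normal="green"):
    """Return (kpi_name, time) for the KPI that leaves `normal` earliest.

    kpi_series: dict mapping a KPI name to a list of (minute, status)
    pairs in time order. Returns None if no KPI ever degrades.
    """
    earliest = None
    for name, series in kpi_series.items():
        for minute, status in series:
            if status != normal:
                if earliest is None or minute < earliest[1]:
                    earliest = (name, minute)
                break  # only the first degradation of each KPI matters
    return earliest


# Hypothetical swim lanes: minute offsets and status colors.
lanes = {
    "errors": [(0, "green"), (1, "green"), (2, "green"), (3, "red")],
    "response time": [(0, "green"), (1, "orange"), (2, "red"), (3, "red")],
    "SQL hits": [(0, "green"), (1, "green"), (2, "orange"), (3, "orange")],
}
print(first_to_degrade(lanes))
```

The earliest KPI to degrade is a hint about where to look, not proof of root cause; it still needs to be confirmed against the underlying events.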
43. Multi-KPI Alerts and Notable Events
● Click on Notable Events Review
● Multiple KPIs and Health Scores can
be combined in sophisticated ways
to create Multi-KPI alerts
● When a Multi-KPI alert fires, one
of the outcomes is the creation of
a Notable Event
● Notable Events allow NOC
personnel and others to triage and
coordinate event management
efforts
44. Service Analyzer
● Click on Service Analyzer > Default Service Analyzer
● Back where we started!
● This view shows a “no-frills” list of
services (top) and hottest KPIs
(bottom)
● Provides a quick jumping off point
into Deep Dives and the Notable
Events Review
● It is useful for NOCs and others
who need a high-level situational
view
45. Review
● High-value services can be decomposed and modeled in ITSI, using machine data
from the relevant systems
● Services and KPIs can be created in minutes, with sophisticated thresholding
techniques to distinguish “normal” from “not normal”
● Glass Tables allow service health and KPI metrics to be displayed in a way that
makes sense to specific groups, such as Executive Leadership, Business Service
Owners, the NOC, DevOps & Others
● Deep Dives allow KPIs to be compared side-by-side across any time range,
accelerating root cause analysis and significantly reducing MTTR
● Multi-KPI Alerts and Notable Events reduce alert noise, producing actionable
events and a means to manage them
● … and it’s fun to build!
46. PLAY TIME IS OVER!
Everyone out of the sandbox!
NOT! You can have your very own 15-day free eval sandbox,
to continue playing:
● http://splunk.com/ITSI Then select:
And a Guidebook to help you explore ITSI’s capabilities:
● https://splunk.box.com/ITSI-Sandbox-Guidebook
47. The 7th Annual Splunk Worldwide Users’ Conference
SEPT 26-29, 2016
WALT DISNEY WORLD, ORLANDO
SWAN AND DOLPHIN RESORTS
• 5000+ IT & Business Professionals
• 3 days of technical content
• 165+ sessions
• 80+ Customer Speakers
• 35+ Apps in Splunk Apps Showcase
• 75+ Technology Partners
• 1:1 networking: Ask The Experts and Security Experts, Birds of a Feather and Chalk Talks
• NEW hands-on labs!
• Expanded show floor, Dashboards Control Room & Clinic, and MORE!
PLUS Splunk University
• Three days: Sept 24-26, 2016
• Get Splunk Certified for FREE!
• Get CPE credits for CISSP, CAP, SSCP
• Save thousands on Splunk education!
FOR THE PRESENTER: With only 60 minutes available, time is critical in this presentation. With multiple users all attempting to access remote ITSI instances simultaneously, delays, problems and questions are likely to arise at almost ANY POINT during this presentation. You must be ready to fill “wait time” at any point; know which topics you can pivot to– even if they’re slightly out of sequence.
This workshop requires the presenter to toggle quickly, and often, between slides and browser. This is even trickier when using “full screen” mode for the browser and slides. For this reason, I recommend using a PDF version of the slides (rather than PowerPoint), since Acrobat is simpler to operate, especially when connected to an external projector. However you choose to display the slides and browser, you should practice toggling quickly between the two.
You should have a timer visible (to you only), counting down from 60 min, to help with pacing.
PREP: Recruit some Splunk/ITSI technical helpers, available to run around the room and assist with problems and student issues.
Laura Snow can assist with spinning up the VMs ahead of time, with the proper ITSI “hands-on” package installed. Two students per VM works comfortably, though more users per VM could be tolerated. RECOMMEND: configure four user accounts on each VM, in case you have more students than expected, and have to “double up”.
Find a way to print out the VM IP addresses and usernames, and hand out to the students as they enter the room. Since the VMs may be spun up “last minute” on the morning of the SplunkLive, you should plan how you’re going to acquire the addresses at that time, create print-outs for the students, and print them out at the hotel/venue.
FOR THE PRESENTER: This slide is a good one to leave up initially, as the students file in to the room. Encourage the students to log in to their assigned VMs early, as they’re filing in to the room. Encourage students to download the presentation deck, so that they can follow along locally, and can “catch up” if necessary.
This exercise will require students to toggle between looking up at the Big Screen, and down at their own screen. You, the presenter, should be as clear as possible as to what “mode” the students should be in. I.e., “Please look up here for a few minutes”, or “you will be working with your own instance for the next section…”, etc.
This deck provides explicit, click-by-click instructions on how to do the various exercises. During those sections, I recommend that you NOT use the slides– instead, toggle to your own live Splunk instance and perform the exercise as the students should be doing, while talking through your actions, pausing on menu pull-downs, etc. Slow down!
FOR THE PRESENTER: Poll your audience to find out how many have seen ITSI, and indeed how many have seen core Splunk at all. The more newbies you have, the more depth you should give the core concepts. This deck does not cover core Splunk concepts, so you may have to cover them on your own.
Although these concepts are important for the students to be able to understand the later exercises, do not spend too much time in this section.
FOR THE PRESENTER: This entire Tour section should last no more than 10 min. Describe how GTs can show KPIs & health scores to any audience/group/team:
Show GTs:
Buttercup Games Business Process (executives, business service owners)
On Line Transaction Service (NOC, Tier2); “can use visio diagrams…”
Buttercup Games Online Store (service flow, sub-services)
Show saved Deep Dive “DB Deep Dive”; BRIEFLY describe DD functionality (you will be able to go into more depth later)
Show Notable Event Review, BRIEFLY describe (you will be able to go into more depth later)
Show Service Analyzer, briefly describe
Ask if the students have questions.
FOR THE PRESENTER: The next three slides set the students up for decomp discussion, later.
GOALS: Get the students to open two specific GTs in separate browser tabs.
These actions set the student up for decomp discussion, later
“While you continue to open those Glass Tables in separate browser tabs, let’s review ‘service decomposition’, discussed in the earlier session…”
The next five slides tie the theoretical service decomp exercise into real-world; how do you do “service decomp” in ITSI?
Our chosen “high-value” service is “Online Store”.
This process should be undertaken by BOTH business service people AND technical IT people, working together. ITSI has the rare ability to bridge the chasm which often exists between “Business Types” and “Technical Types”. It is critical that high-value business services (the ones which affect revenue, customer satisfaction, SLA performance, etc.) be identified by the Business Types, along with “interesting” KPIs such as revenue, and that the relevant technical services (and KPIs) be identified by the Technical Types. Because ITSI provides flexibility in how these services and KPIs are defined, it is possible to satisfy BOTH Business AND Technical Types. A miracle!
Within our chosen high-level service (Online Store), what are the relevant sub-services, and how does the process flow?
For a given sub-service, such as “Database”, what are some useful KPIs which would describe its health, status and performance? These KPI metrics are based on Splunk searches, so they can be almost anything. Be creative!
For a particular KPI, what is the Splunk search to generate the KPI metrics? The example here could be used for the Database KPI, “DB errors”.
In the “real world”, it will probably be necessary to iterate up & down these steps a few times.
For example, what if a KPI requires data which is not being collected by Splunk?
TO STUDENTS: You have this glass table on your own system.
This Glass Table shows the high-level business process for Buttercup Games.
Does anyone notice anything missing? (no info in Order Entry)
We need better visibility into our Online Store, which is part of the Order Entry process.
TO STUDENTS: You have this glass table on your own system.
This Glass Table shows a more detailed process flow for the Online Store service.
Notice the sub-services which make up our Online Store service, and how the process flows.
Based on a recent DB outage which was caused by a saturated network interface, we’ve decided that network utilization would be a handy KPI for our Database Service.
We’re also going to tweak the high-level Business Process Glass Table to provide more visibility into the Online Store service.
And we’re going to do it in 15 minutes!
FOR THE PRESENTER: Remind the students that they can refer to their own locally-downloaded slides for “click-by-click” reference for the process of adding a KPI.
Then switch to your own browser and demonstrate these steps “live”.
Have fun with the concept that a roomful of people can build a new KPI in only a few minutes, and that “the clock is ticking”.
FOR THE PRESENTER: SHORT discussion of entities
FOR THE PRESENTER: Briefly cover “data model” vs “ad hoc search”. Don’t spend a lot of time here.
FOR THE PRESENTER: Briefly cover the concepts on this page, and point out the “Generated Search” window at the bottom and how cool it is that Splunk builds the search for you. Does anyone in the audience have users who could benefit from this?
QUICK TANGENT: In the typical working environment, which often has a chasm between the “Business Types” and the “Tech Types”, how long would it take to map services to actual infrastructure? "Many quarters, and possibly a year-- on the conservative side, right?"
To quantify that, by show of hands, has anyone here been involved in an IT Service Management / Business Management team trying to map every server to a service or business function? Did you sustain any long-term injuries?
And even IF you are successful in this effort, as soon as you finish you have to start over.
ITSI is remarkable because it can allow the Business teams and Technical teams to map out the important services realistically and effectively– in DAYS and WEEKS. We offer a Glass Table Workshop to facilitate such an exercise on YOUR services and YOUR data– in a single day.
Keep moving…
FOR THE PRESENTER: This might take a while for “waiting for data” to produce an actual graph for the students (1-2 minutes, typically).
Tell the students that it will take a couple of minutes for the data to appear, and that they should not click on anything in the meantime. Then skip to the Adaptive Thresholds and Anomaly Detection slides and discussion, while the students wait.
Afterwards, it can be helpful to gauge progress by asking for a show of hands to see how many students are still waiting. If necessary, simply show the students how to set thresholds (on your own browser), then move forward.
FOR THE PRESENTER: Talk through-- NOT HANDS ON
How long did it take to make this KPI?
We’ve already discussed the high-level business process for Buttercup Games. We need better visibility into our Online Store, which is part of the Order Entry process.
FOR THE PRESENTER: As before, switch to your own browser and demonstrate these steps “live”.
Have fun with the concept of saving a copy before editing– so that you don’t muck it up.
FOR THE PRESENTER: Have fun with this GT editor section. The GT editor is a bit twitchy, so exploit the humor and have fun with the students.
GOALS (for the next 3 slides):
Identify 2 “interesting/useful” KPIs from the Online Store service, to position in the gray “Order Entry” oval; let the students choose details and viz types
Put a ServiceHealthScore widget (from Online Store) under the pony, to show overall health of the service. Modify “custom drilldown” to land on the “Buttercup Games Online Store” GT
Encourage the students to use text boxes and other techniques to make the widget more readable, prettier to look at
Remind the students that “the boss’ boss” will be looking at this GT, and we want to make sure that they’ve got good visibility into “our” service (Online Store).
FOR THE PRESENTER: If you use the “Auto Layout” gag (i.e., hinting that the students should click on this, resulting in total destruction of their GT), MAKE SURE that everyone has SAVED before doing so. This gag can be fun, especially pointing out how deceitful/evil the instructor is.
FOR THE PRESENTER: When finished (after everyone has hit ‘Save’ and ‘View’, and is looking at their own beautiful GT):
How long did it take to create a new KPI and make major changes to a Glass Table? Pretty cool!
Ask the students if this (ITSI) could be useful in their own environments
If you have more than 15 min of remaining time, speak through some actual (referenceable) customer ITSI use cases.
FOR THE PRESENTER: This hands-on section can be very powerful for the students. This allows them to “put it all together”, driving ITSI with their own fingers.
As before, switch to your own browser and demonstrate these troubleshooting steps “live”. The corresponding slides are intended as reference for the students.
If pressed for time (i.e., less than about 10 min), talk through and show this process– but don’t have the students attempt to “click along” in real time.
Note that this “drill down” has inherited the same time selection (i.e., an earlier outage)– pretty cool!
FOR THE PRESENTER: The major points here:
During the heat of battle, when troubleshooting an outage, being able to visualize the entire service flow is extremely valuable
By being able to see health status of all the underlying services, we can quickly choose where and how best to proceed.
Potentially huge time savings– customers report major reductions in MTTR
FOR THE PRESENTER: This is a good “variable time” section. You can spend as little or as much time as you choose, depending on how much time you have remaining in the session.
Remind the students that they will have more time to play with DD later (yes, they might be confused by this, since only a few minutes remain in the session)
FOR THE PRESENTER: This is another good “variable time” section. You can spend as little or as much time as you choose, depending on how much time you have remaining in the session.
Remind the students that they will have more time to play with this stuff later (yes, they might be confused by this, since only a few minutes remain in the session)
Look! Students have more time to play in their own sandbox environment, after all.
We’re headed to the East Coast!
2 inspired Keynotes – General Session and Security Keynote + Super Sessions with Splunk Leadership in Cloud, IT Ops, Security and Business Analytics!
165+ Breakout sessions addressing all areas and levels of Operational Intelligence – IT, Business Analytics, Mobile, Cloud, IoT, Security…and MORE!
30+ hours of invaluable networking time with industry thought leaders, technologists, and other Splunk Ninjas and Champions waiting to share their business wins with you!
Join the 50%+ of Fortune 100 companies who attended .conf2015 to get hands on with Splunk. You’ll be surrounded by thousands of other like-minded individuals who are ready to share exciting and cutting edge use cases and best practices. You can also deep dive on all things Splunk products together with your favorite Splunkers.
Head back to your company with both practical and inspired new uses for Splunk, ready to unlock the unimaginable power of your data! Arrive in Orlando a Splunk user, leave Orlando a Splunk Ninja!
REGISTRATION OPENS IN MARCH 2016 – STAY TUNED FOR NEWS ON OUR BEST REGISTRATION RATES – COMING SOON!