Managing IT Operations is a challenging job that’s only getting harder. Humans can no longer effectively process the volumes of event data intended to help identify and remediate IT issues. So what’s an enterprise to do?
This fundamental question leads to another: is your legacy event management system still up to the job? For most enterprises, their legacy tool is based on technology that still relies on RULES.
KeyBank and Moogsoft describe the technical limitations of rules-based solutions, and how AIOps solutions represent the intelligent automation of the future. They also cover:
* How to move your monitoring regime from Reactive to Proactive to Predictive
* How AIOps can support the delivery of a great Customer Experience (Cx)
* The KeyBank story of AIOps adoption.
Webinar Slides - How KeyBank Liberated its IT Ops from Rules-Based Event Management
1. Webinar | June 24, 2019
Adam Frank
Senior Product Manager
Mick Miller
Senior DevOps Architect
How KeyBank Liberated its IT Ops
from Rules-Based Event Management
6. What is Driving
Change Velocity?
• Expansion of digital services
• Emergence of containers
• High availability architectures
• Volume: 100k+ logins per
second, etc.
Increased monitoring breaks down
legacy approaches…
• Increasing staff does not scale with the
rate of data ingestion
• Legacy systems do not learn
7. Keeping
Customers…
…and attracting new ones through
improved customer experience (Cx)
• Near 100% uptime has become
expected
• Restoration of services is measured
in seconds not hours
• Capturing click-level events to
discover how customers are using
your systems
• Continuous delivery
8. The Weight
• Legacy rules-based filtering (if,
then, else, etc.) won’t scale with
exponential growth
• Too many interdependencies
between complex systems and
rules supporting the telemetry
Legacy Monitoring Can’t Scale
9. Obsolescence: Planned
and Unplanned
• Software/Hardware: at the core of
ideas, which change as we advance
information/data/technology
• Languages: Over 25 languages in 60
years (1948–2009)
• Data: Flat files -> ISAM -> Relational -> NoSQL -> Clusters -> etc.
• Software: ad hoc -> Structured programming -> Object -> Functional -> etc.
• IT Operations: ad hoc -> ITIL v1-3 ->
ITIL v4 -> DevOps -> etc.
• And on, and on …
11. New Strategy Required for IT System
Monitoring
Graph based on StackState monitoring maturity model for IT operations
[Figure: visibility and intelligence plotted against maturity level. Level 1: individual component monitoring. Level 2: full-breadth monitoring. Level 3: end-to-end monitoring and correlation. Level 4: AIOps. Stages: reactive monitoring, proactive monitoring, predictive monitoring. Regions labeled: Rules-based, AIOps.]
12. • As the number of systems increases,
so does the volume of data. This
means the number of rules will
increase, causing exponential
complexity.
• A growing set of rules becomes
unpredictable and untestable.
This Parrot Is
No More
Rules-based event correlation is
past its time
13. This Parrot Is No More
• Multiple rules interacting is a factorial problem:
(n! = n × (n−1)!)
o 5 rules: 5! = 120 possible combinations
o 6 rules: 6! = 720 possible combinations
o 10 rules: 10! = 3,628,800 possible combinations
o 100 rules: 100! ≈ 9.33 × 10^157
(a 158-digit number) possible
combinations
• While easy to understand and implement,
rules-based monitoring implodes at the enterprise
scale as complexity increases
Rules-based event correlation is past its time
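The factorial figures on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes, as the slide does, that every ordering of n rules counts as one possible combination; the function name `rule_orderings` is illustrative and not from the webinar.

```python
import math


def rule_orderings(n: int) -> int:
    """Number of distinct orderings of n rules, which is n!."""
    return math.factorial(n)


for n in (5, 6, 10):
    print(f"{n} rules -> {rule_orderings(n):,} possible orderings")

# 100! is too long to read comfortably, so report its length instead.
digits = len(str(rule_orderings(100)))
print(f"100 rules -> a {digits}-digit number of possible orderings")
```

Python integers have arbitrary precision, so the value for 100 rules is computed exactly rather than approximated.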
14. n n!
0 1
1 1
2 2
3 6
4 24
5 120
6 720
7 5,040
8 40,320
9 362,880
10 3,628,800
11 39,916,800
12 479,001,600
13 6,227,020,800
14 87,178,291,200
15 1,307,674,368,000
16 20,922,789,888,000
17 355,687,428,096,000
18 6,402,373,705,728,000
19 121,645,100,408,832,000
20 2,432,902,008,176,640,000
21 51,090,942,171,709,440,000
22 1,124,000,727,777,607,680,000
23 25,852,016,738,884,976,640,000
24 620,448,401,733,239,439,360,000
25 15,511,210,043,330,985,984,000,000
26 403,291,461,126,605,635,584,000,000
27 10,888,869,450,418,352,160,768,000,000
Relationship Between Rules Growth Is
Not Linear
• Trying to understand and
test all the relationships
between rules is not
possible.
• Computer scientists compare this to an
“NP-complete” problem
(no known efficient algorithm, so not
practically solvable at scale with current
compute capability)
• Virtually impossible to
understand effect of alert
exceptions in a collection
of rules, even at 10 rules.
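To make the ordering problem concrete, the sketch below applies three rules to one event in every possible order and collects the distinct results. The rules, field names, and event are all hypothetical and chosen only for illustration; they do not come from KeyBank's or Moogsoft's systems.

```python
from itertools import permutations


def escalate_db(event):
    # Hypothetical rule: database alerts become critical.
    if event["source"] == "db":
        event["severity"] = "critical"
    return event


def suppress_maintenance(event):
    # Hypothetical rule: during maintenance, downgrade the alert to info.
    if event["maintenance"]:
        event["severity"] = "info"
    return event


def page_on_critical(event):
    # Hypothetical rule: page the on-call engineer for critical alerts.
    if event["severity"] == "critical":
        event["page"] = True
    return event


RULES = [escalate_db, suppress_maintenance, page_on_critical]


def outcomes(event):
    """Apply the rules in every order and collect the distinct end states."""
    results = set()
    for order in permutations(RULES):
        e = dict(event)  # copy so each ordering starts from the same event
        for rule in order:
            e = rule(e)
        results.add((e["severity"], e["page"]))
    return results


event = {"source": "db", "maintenance": True, "severity": "warning", "page": False}
for severity, page in sorted(outcomes(event)):
    print(f"severity={severity}, page={page}")
```

With only three rules there are six orderings to check, and for this event they already yield four different end states. Adding rules multiplies the orderings that would have to be tested.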
15. You don’t know
what you don’t
know
• You can’t predict unusual events (events
that no rule was written to catch)
• Rules-based approaches need to change
to AIOps
• ML and AI: all event data can be
processed
• Modern AIOps uses algorithms to identify
when something is unusual
In data science a “black swan event” is
something you can’t predict.
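As a rough illustration of identifying "when something is unusual" without a hand-written rule, the sketch below flags a value that lies far from a recent baseline. It uses a plain z-score test, which is an assumption made for this example and not the algorithm any particular AIOps product uses; the function name, threshold, and sample data are hypothetical.

```python
from statistics import mean, stdev


def is_unusual(history, value, threshold=3.0):
    """Return True if value is more than `threshold` standard
    deviations away from the mean of the recent history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical events-per-minute counts from a healthy period.
baseline = [100, 98, 103, 101, 99, 102, 100, 97, 104, 100, 101, 99]

print(is_unusual(baseline, 103))  # within normal variation
print(is_unusual(baseline, 450))  # far outside the baseline
```

The baseline is learned from the data itself, so the same function works for any metric without a per-metric threshold rule.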
16. Whodunit?
• Rules-based approaches cannot
decide on root cause of system
failures
• Random nature of real-world failures
in highly distributed systems can
have multiple root causes
• Unlike rules-based systems, AIOps
systems have built-in learning models,
so you don’t need to constantly add
new rules
Root cause probability
17. Take the red pill…
• Deceptively simple
• Expensive
• Unpredictable
• Undecidable
Rules-based systems cannot meet
the demands of complex
distributed computing
18. Take the red pill…
• Processes all events
• Does not separate data from systems
• Algorithms are deterministic
• Algorithms don’t care about order
• Single algorithm can replace
hundreds of rules
AIOps (AI and ML) liberates IT
from the limitations of rules-based systems
19. Start Today
• Now is the time to start your AIOps journey
• Move beyond legacy rules-based systems
• Start using the modern machine learning of
AIOps to deliver continuous service
assurance to your enterprise
Get Started by Reading the AIOps Manifesto:
• https://www.aiops-exchange.org/wp-content/uploads/2019/05/aiops-manifesto.pdf
• https://www.moogsoft.com/resources/aiops/ebook/aiops-liberates-it
21. Continuous Service Delivery, Optimal Business Agility
TIME: Quickly focus on and resolve the most critical issues, at scale
COST: Improve your economics by making teams faster, smarter, and more productive
RISK: Get real-time visibility into your existing data sources, tools and workflow
All data, any scale
Purpose-Built AI for IT and DevOps
22. Moogsoft Is the Platform for Agile and Proactive Event
Resolution Workflow
Industrialized data
ingestion from
multiple sources
Proactively and
automatically detects
Incidents and probable root
causes (reduces MTTD)
Triggers automation
to restore services
Predictive insights
(reduces support
escalations and
MTTR)
Enables collaborative
workflows (reduces
MTTR and adverse
business impact)
Automatically resolves
signals from alert
noise
Early Detection, fewer tickets, reduced MTTR