© 2018 IBM Corporation
Cloud Service Management and why Machine Learning is now essential
June 21, 2018
Agenda
Part 1: Cloud Service Management
Part 2: Machine Learning is essential (for adaptive automation)
Part 3: Wrap-up, Call to action & Q&A
Cloud enables digital transformation
To transform, organizations are employing App Modernization, Hybrid, and DevOps
Supporting agility at scale requires managing increasing data growth, complexity, and dynamic environments
Deliver reliable, competitive applications fast
Business Reality & needs:
• Agile Application Delivery
• End user experience & reliability
• Lean Operations Management
Ops Goal: Fewer problem tickets, faster resolution
Dev Goal: Faster time to market, reduce disruptions
[Figure: Dev → Test → Stage → Prod pipeline, with labels "Stop", "Shift Right" and "Shift Left"]
• 10 yrs: virtually every application & service will incorporate AI (Gartner)
• 1/3 of the top 20 companies in every industry will be disrupted in the next 3 years
• 99% of apps must be refactored to move to cloud
LoB Executive
Application Owner
Application Developer
Chief Information Officer
IT Operations Manager
IT Operations Engineer
Business Imperatives are Driving Faster Change
Agility depends on DevOps practices and Cloud-Enabled Process Innovation
Systems of Record: Operational Excellence, Traditional Management
Systems of Engagement: Transformation & Differentiation, Agile Management

                   Traditional Model       Agile Model
IT projects        Some, big               Many, small
Time to go live    2-3 years               2-3 months
Change rate        Lower                   Higher
Governance         Centralized             Decentralized
Tools              Cloud-ready, on-prem    Cloud-Native
Processes          ITIL, CMMI              DevOps, Lean

Bridging both: Hybrid Ops, Hybrid Apps
Source: The agile CIO: Mastering digital disruption. http://blog.kpmg.ch/the-agile-cio-mastering-digital-disruption/
Process, Tools and Culture
Growing an Agile organization requires adaptation across the organization.
Process
• Adjust processes to enable Agility
• Continued High Availability and Performance
• Built-to-Manage Approach
Tools & Technology
• Integrate Cloud Service Management toolchain with existing ITSM capabilities
• Implement New Tools (ChatOps, Runbook Automation, etc.)
Culture
• Orient on Application Agility and shared success (DevOps)
• Transition to New Roles (e.g. Site Reliability Engineer, First Responder)
• Transition to Proactive monitoring (Analytics)
Enterprise DevOps Adoption
“The Future is already here, it is just unevenly distributed” – William Gibson
New DevOps Startup:
• Full Stack Engineers
• Highly Collaborative
• Informal and Agile
• Focused and Independent
Enterprise Business Reality:
• Some Agile Applications
• Some Legacy Applications
• Adopting Cloud Operating Model
• Mix of Traditional and Cloud
IT Service Management (ITIL)
• Process Oriented
• Resistant to Change
Cloud Service Management
• Service Oriented
• Dynamic and Agile
Roles shift from L1 Ops, L2 Ops and SME to Site Reliability Engineer, First Responder and DevOps/SME
Hybrid Cloud Management enables the transformation journey
Digital transformation
Theme                 Value
Flexibility           Select and manage the right cloud path for you
Agility               Manageable, secure DevOps delivered at scale
Adaptive Automation   Recognize and respond to dynamic environments
Cognitive Data Scientist: learns, decides, improves
AI to speed problem determination
Agenda
Part 1: Cloud Service Management
Part 2: Machine Learning is essential (for adaptive automation)
Part 3: Wrap-up, Call to action & Q&A
Adaptive automation
70% of IT professionals agree: we will be overwhelmed without automation.
[Figure: progression along increasing Scale and Complexity: Reactive (Real-time analytics) → Proactive (Predictive insights) → Adaptive (Cognitively enhanced workflow), labelled "Adaptive automation"]
Adaptive Automation: Recognize and respond to dynamic environments
Insights to increase efficiency
• Automated noise reduction
• Automation of complex tasks
Insights to Avoid Outages
• Automatically detect behavioural changes
• Take action before users are impacted
Insights to reduce MTTR
• Probable cause identification
• Context in dynamic environments
Adaptive Automation
Machine learning, advanced analytics and cognitive technologies delivering automated value for centralized IT Operations and DevOps teams
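The slides do not say how "automated noise reduction" is implemented. One common building block is deduplication: repeats of the same alert from the same resource collapse into a single row that carries a repeat count and first/last occurrence times. The Python below is a minimal sketch of that idea only; the DedupedEvent class, the (node, alert_key) identity and all field names are hypothetical choices for illustration, not a product schema.

```python
from dataclasses import dataclass


@dataclass
class DedupedEvent:
    # Hypothetical record: one row per distinct (node, alert_key).
    node: str
    alert_key: str
    first: float
    last: float
    tally: int = 1


def deduplicate(raw_events):
    """Collapse repeats of the same (node, alert_key) into one row with a count.

    raw_events: iterable of (timestamp, node, alert_key) tuples.
    """
    table = {}
    for ts, node, alert_key in sorted(raw_events):  # process in time order
        key = (node, alert_key)
        row = table.get(key)
        if row is None:
            table[key] = DedupedEvent(node, alert_key, ts, ts)
        else:
            row.last = ts     # extend the occurrence window
            row.tally += 1    # count the repeat instead of storing it
    return list(table.values())


raw = [(0, "web01", "PingFail"), (60, "web01", "PingFail"),
       (120, "web01", "PingFail"), (30, "db01", "DiskFull")]
for row in deduplicate(raw):
    print(row.node, row.alert_key, row.tally)
# web01 PingFail 3
# db01 DiskFull 1
```

Four raw events become two rows an operator has to look at; the tally preserves how noisy each source was.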
Reactive: Real-time analytics
Insights to reduce MTTR: insights from your terabytes of operational data
Proactive: Predictive Insights
Insights to Avoid Outages: Machine Learning applied automatically to your performance data
Adaptive: Cognitively Enhanced Workflows
Insights to increase efficiency: Automate, automate, automate with Machine Learning applied to your event and performance data. Extend with Watson.

“Right there - visually - we saw proof that you can use machine learning to be able to identify root cause... Everyone sat there in silence for three minutes.”
David Nestic
Technical operations manager, NBN
Source:

“After testing the cognitive monitoring solution (IBM Operations Analytics Predictive Insights)... we saw a significant reduction in server incidents... Thanks to it we will have a platform that can help us act before an incident occurs.”
Jan Steen Olsen
Executive Vice President and CTO, Danske Bank
Source:

“We live on the edge of control, trying to assure our systems and deal with ever-changing business and user requirements. To control costs, we need to keep operations lean by processing only actionable alarms.” ... “On average, we reduced 15% of the ‘noise’ alarms.”
Operations Leader, Fast-Growing Canadian Telco
Machine Learning for Reactive Management
Event sources: Cisco ACI, Docker, Kubernetes, OpenStack, TADDM, NOI, VMware vCenter, ITNM, IBM ALM, DNS, REST
Traditional Events → Netcool Ops Insight (Cognitive Event MoM) → Correlated Event Groups → Collaboration & Automation
Netcool Ops Insight:
- Event Clustering
- Seasonal Analysis and Suppression
- Weighted probable cause
Collaboration & Automation: ChatOps, Notification, Run Books
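The slide names event clustering but not the algorithm behind it. As a minimal sketch, assuming events arrive as (timestamp, event type) pairs, the Python below groups event types that repeatedly fire in the same fixed time window: it measures the Jaccard similarity of each pair's window sets and merges strongly overlapping pairs with union-find. The function name, the 300-second window, the 0.8 cut-off and the event names are illustrative assumptions, not Netcool Operations Insight's actual method or parameters.

```python
from collections import defaultdict
from itertools import combinations


def group_cooccurring_events(events, window_s=300, min_jaccard=0.8):
    """Group event types that (almost) always fire in the same time window.

    events: iterable of (timestamp_seconds, event_type) pairs.
    window_s and min_jaccard are illustrative, untuned values.
    """
    # Record which fixed-size time windows each event type appeared in.
    windows_by_type = defaultdict(set)
    for ts, etype in events:
        windows_by_type[etype].add(int(ts // window_s))

    # Union-find so that pairwise links merge into larger groups.
    parent = {t: t for t in windows_by_type}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    # Link two types when their window sets overlap strongly (Jaccard similarity).
    for a, b in combinations(sorted(windows_by_type), 2):
        wa, wb = windows_by_type[a], windows_by_type[b]
        if len(wa & wb) / len(wa | wb) >= min_jaccard:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for t in windows_by_type:
        groups[find(t)].add(t)
    return sorted(sorted(g) for g in groups.values())


events = [(0, "LinkDown"), (10, "BGPPeerDown"), (20, "AppTimeout"),
          (600, "LinkDown"), (615, "BGPPeerDown"), (1200, "DiskFull")]
print(group_cooccurring_events(events))
# [['AppTimeout'], ['BGPPeerDown', 'LinkDown'], ['DiskFull']]
```

LinkDown and BGPPeerDown share every window they appear in, so they form one correlated group; AppTimeout co-occurred only once out of two windows and stays separate. Fixed windows can split two related events that straddle a boundary, which is one reason a production system would likely use sliding windows or learned lags instead.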
Machine Learning for Reactive and Proactive Management
Event and metric sources: Cisco ACI, Docker, Kubernetes, OpenStack, TADDM, NOI, VMware vCenter, ITNM, IBM ALM, DNS, REST
Metrics → Predictive Insights (Cognitive Performance MoM) → Proactive Events
Traditional Events and Proactive Events → Netcool Ops Insight (Cognitive Event MoM) → Correlated Event Groups → Collaboration & Automation
Predictive Insights:
- AI-driven Model selection
- Variance Analysis
- Dependency Determination
- Dynamic Threshold
Netcool Ops Insight:
- Event Clustering
- Seasonal Analysis and Suppression
- Weighted probable cause
Collaboration & Automation: ChatOps, Notification, Run Books
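A dynamic threshold replaces one fixed limit with a band learned from history, so "normal" can differ by time of day. The Python below is a minimal sketch, assuming the seasonality is hour-of-day and the band is mean ± k standard deviations per hour; the function names, k = 3 and the sample values are hypothetical, and the slide does not say which models Predictive Insights actually selects.

```python
from collections import defaultdict
from statistics import mean, pstdev


def seasonal_baseline(history):
    """history: list of (hour_of_day, value). Returns {hour: (mean, std)}."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}


def dynamic_threshold_breach(baseline, hour, value, k=3.0, min_std=1e-9):
    """True if value lies outside mean ± k*std learned for this hour.

    Raises KeyError for an hour with no history (no baseline learned yet).
    """
    mu, sigma = baseline[hour]
    sigma = max(sigma, min_std)  # avoid a zero-width band for flat series
    return abs(value - mu) > k * sigma


history = [(9, v) for v in (100, 110, 90, 105, 95)] + \
          [(3, v) for v in (10, 12, 8, 11, 9)]
baseline = seasonal_baseline(history)
print(dynamic_threshold_breach(baseline, 9, 115))  # False: busy hour, within band
print(dynamic_threshold_breach(baseline, 3, 20))   # True: unusual for 3 a.m.
```

A single static limit set high enough for the 9 a.m. peak would never fire on the 3 a.m. reading of 20, even though that reading is twice the normal overnight level.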
Advanced Analytics for Rapid Context
Agile Service Manager: Dynamic Topology MoM
RESULT: Cognitive Manager of Managers across Event, Performance and Topology data
Cognitive Data Scientist: learns, decides, improves
• Sophisticated Seasonal Modelling
• Robust Statistical approaches (independent of data distribution)
• Multiple Anomaly Detection Algorithms
• Automatic Model Validation
• Long-term learning (monthly/yearly patterns)
• Mathematical Relationship Discovery
• Rapid analysis of highly dynamic environments
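"Independent of data distribution" usually means the method avoids assuming, for example, normally distributed metrics. One simple distribution-free approach is an empirical percentile band: flag values below the 1st or above the 99th percentile of recent history. The sketch below uses only the standard library's statistics.quantiles (Python 3.8+); the function names and the 1/99 cut-offs are illustrative assumptions, not the product's algorithm.

```python
from statistics import quantiles


def quantile_band(history, lower_pct=1, upper_pct=99):
    """Empirical percentile band; makes no assumption about the distribution."""
    # With n=100, quantiles() returns 99 cut points: cuts[i] is the (i+1)-th percentile.
    cuts = quantiles(history, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def is_anomalous(value, band):
    lo, hi = band
    return value < lo or value > hi


history = list(range(1, 101))
band = quantile_band(history)
print(band)                      # (1.99, 99.01)
print(is_anomalous(150, band))   # True
print(is_anomalous(50, band))    # False
```

Because the band comes from ranks rather than a fitted mean and variance, a long tail or a few extreme outliers in the history do not stretch it the way they would stretch a mean ± 3σ band.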
Adaptive Automation: Incident Management Example
• Automated Early Detection
• Automated Event Suppression & Incident Correlation
• Probable Cause Identification
• Context in Highly Dynamic Environments
• Alert Mgmt & Collaboration
• User Domain knowledge
• Automated Runbooks
• Automated Remediation
Phases: Mean-Time-To-Identify (MTTI) → Mean-Time-to-Know (MTTK) → Mean-Time-to-Fix and Verify
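The slide splits incident handling into identify, know and fix-and-verify phases but does not define the timestamps behind them. Assuming each incident records when it started, when it was identified, when its cause was known and when the fix was verified, each phase mean is the average gap between consecutive timestamps. The Incident fields below are hypothetical names, and times are in minutes.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    # Minutes since an arbitrary epoch; field names are hypothetical.
    onset: float
    identified: float
    cause_known: float
    fixed_verified: float


def phase_means(incidents):
    """Mean duration of each incident phase, plus their sum (end-to-end MTTR)."""
    mtti = mean(i.identified - i.onset for i in incidents)
    mttk = mean(i.cause_known - i.identified for i in incidents)
    mttfv = mean(i.fixed_verified - i.cause_known for i in incidents)
    return {"MTTI": mtti, "MTTK": mttk, "MTTF&V": mttfv,
            "MTTR": mtti + mttk + mttfv}


result = phase_means([Incident(0, 10, 40, 70), Incident(0, 20, 30, 90)])
print(result)
# MTTI = 15, MTTK = 20, MTTF&V = 45, MTTR = 80 (minutes)
```

Because means add, the 80-minute MTTR equals the average end-to-end duration of the two incidents ((70 + 90) / 2), so a reduction in any one phase shows up directly in MTTR.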
Agenda
Part 1: Cloud Service Management
Part 2: Machine Learning is essential (for adaptive automation)
Part 3: Wrap-up, Call to action & Q&A
Adaptive Automation: Machine Learning and leveraging user experience
Capability
Predict to Get Ahead
• Patterns of behavior w/ Machine Learning
• Seasonality of environment behavior
• Abnormal behaviors that precede events
Augment the Process
• Cognitive Automated Ticket Creation and Routing
• Cognitive Process Automation with robotics and Watson guided advice
• Cognitive Process Automation for zero-touch automation with robotics and Watson embedded advice and next steps
Simplify & Focus
• Pattern Analysis to Correlate & De-duplicate events
• Pattern Analysis for IT Operations
• Cognitive Network 360* Insights
• Real Time Federated Topology
Augment Staff
• Cognitive Incident Advisor
• Cognitive Agent Assist
• Cognitive Knowledgebase w/ semantic search
• Cognitive Assistant for Change
Products/Cloud Services
§ Netcool Operations Insight
§ Agile Service Manager
§ Hadoop HDFS
§ Watson Data Platform (DSX)
§ Watson Explorer
§ Watson Discovery
§ Watson Knowledge Studio
§ IBM Operations Analytics – Predictive Insights
§ Netcool Operations Insight
§ RPA tools
§ Watson Explorer Semantic Analysis
§ Dynamic Automation
§ PASIR
§ Watson Discovery
§ Watson Knowledge Studio
§ Watson Assistant
§ Watson Conversation Services
§ Watson Explorer
§ Watson Discovery
§ Watson Knowledge Studio
§ Speech To Text / Text To Speech
§ Watson for Cyber Security
§ QRadar Watson Advisor
Predict to Get Ahead, Augment the Process, Simplify & Focus, Augment Staff
Adaptive Automation
Call to action and Q&A
Short Videos: Predictive Capabilities
§ The Value Video
§ The Capability Video
IBM Marketplace:
§ Operations Analytics
§ Netcool Operations Insight
§ Application Performance Management
Forrester Total Economic Impact (TEI) Studies
§ The Operations Management TEI
§ The Application Management TEI
IT Operations Maturity Assessment
§ Questionnaire to get you thinking
Find Out More
Thank you
Key Capabilities: Reduce MTTR
With 2nd Gen advanced real-time event, performance and topology analytics:
• Groups events that always occur together, providing increased context for faster resolution
• Learns complex relationships across your applications and infrastructure and provides insights into potential root causes
• Rapidly analyses multiple sources of topology to provide up-to-date service and topology views for context
With 1st Gen capabilities for rapid problem resolution:
• Big data search across all operational data, supplemented by text-derived insights and log monitoring
Insights to reduce MTTR
• Probable cause identification
• Context in dynamic environments
Reactive: Real-time analytics
Insights from your Terabytes of Operational Data
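Topology context is what turns a burst of alarms into a probable cause. A common heuristic, shown here as a minimal sketch and not as the product's weighted probable-cause logic, is: among the alarming resources, the likely root causes are those with no alarming resource anywhere upstream in their dependency chain. The depends_on mapping and the resource names are hypothetical.

```python
def upstream(node, depends_on):
    """All resources that node depends on, directly or transitively."""
    seen, stack = set(), list(depends_on.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:  # the seen set also makes cycles safe
            seen.add(n)
            stack.extend(depends_on.get(n, []))
    return seen


def probable_root_causes(depends_on, alarming):
    """Alarming resources with no alarming resource upstream of them."""
    return sorted(n for n in alarming if not (upstream(n, depends_on) & alarming))


# Hypothetical topology: web -> app -> db -> storage, and app -> cache.
topology = {"web": ["app"], "app": ["db", "cache"], "db": ["storage"]}
print(probable_root_causes(topology, {"web", "app", "db"}))  # ['db']
```

Three alarms reduce to one place to start: web and app are alarming because something they depend on (db) is failing. A resource in a dependency cycle is its own upstream, so if it is alarming the heuristic never names it; a real system would need a tie-breaker there.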
Key Capabilities: Avoid Outages
Utilise IBM's advanced machine learning to proactively manage your critical applications and infrastructure
Solution automatically detects behavioural changes and provides insights to help identify root cause
Operations can take corrective action before critical services and users are impacted
Solution has been successfully deployed and has dramatically reduced outages
Insights to Avoid Outages
• Automatically detect behavioural changes
• Take action before users are impacted
Proactive: Predictive insights
Machine Learning applied automatically to your performance data
2nd Gen ITOA (IT Operations Analytics)
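"Automatically detects behavioural changes" means separating a sustained shift in a metric from a one-off spike. A classic technique for this is the one-sided CUSUM (cumulative sum) test, sketched below as an illustration; the slides do not say which change-detection method the product uses, and target, slack and threshold are hypothetical tuning values.

```python
def cusum_change_points(values, target, slack, threshold):
    """One-sided (upward) CUSUM: indices where a sustained upward shift is flagged.

    target: expected level; slack: deviation tolerated per sample;
    threshold: accumulated excess that triggers an alarm.
    """
    s = 0.0
    alarms = []
    for i, x in enumerate(values):
        # Accumulate only the excess above target + slack; never go below zero.
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            alarms.append(i)
            s = 0.0  # restart accumulation after signalling
    return alarms


shift = [10, 11, 9, 10, 10, 14, 15, 14, 15, 14]
spike = [10, 10, 18, 10, 10]
print(cusum_change_points(shift, target=10, slack=1, threshold=8))  # [7]
print(cusum_change_points(spike, target=10, slack=1, threshold=8))  # []
```

The single spike to 18 adds only 7 to the sum and then decays, while three consecutive readings around 14 to 15 accumulate past the threshold. This is the property that lets operations act on a genuine change before users notice it, without paging on every blip.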
Key Capabilities: Increase Efficiency
Thanks to advances in the field, Machine Learning can now automate many human decisions, AND it scales and adapts:
– Reduces alert noise through advanced seasonal behaviour analysis
– Reduces manual effort by utilising machine learning to set and maintain thresholds
– Reduces tickets and manual effort by automatically grouping events that always occur together
Automatically analyses patterns in operational data to identify waste and automation opportunities
Insights to Increase Efficiency
• Automated noise reduction
• Automation of complex tasks
Increase Efficiency
Automate, automate, automate with Machine Learning applied to your event and performance data
1st and 2nd Gen ITOA
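Seasonal behaviour analysis reduces noise by recognising alerts that recur on a schedule, for example a CPU alarm raised by a weekly backup. The sketch below, which is an illustration and not the product's method, treats an (alert, weekday, hour) slot as seasonal when the alert fired in that slot in at least min_weeks distinct ISO weeks. The function names, min_weeks = 3 and the alert names are hypothetical.

```python
from datetime import datetime


def seasonal_slots(alert_history, min_weeks=3):
    """(alert, weekday, hour) slots where the alert fired in >= min_weeks distinct weeks."""
    weeks_seen = {}
    for alert, ts in alert_history:
        slot = (alert, ts.weekday(), ts.hour)
        week = tuple(ts.isocalendar()[:2])  # (ISO year, ISO week number)
        weeks_seen.setdefault(slot, set()).add(week)
    return {slot for slot, weeks in weeks_seen.items() if len(weeks) >= min_weeks}


def should_suppress(alert, ts, slots):
    """True if this alert at this weekday/hour matches a learned seasonal slot."""
    return (alert, ts.weekday(), ts.hour) in slots


history = [
    ("BackupJobCPUHigh", datetime(2018, 6, 3, 2, 5)),   # Sunday 02:05
    ("BackupJobCPUHigh", datetime(2018, 6, 10, 2, 7)),  # Sunday 02:07
    ("BackupJobCPUHigh", datetime(2018, 6, 17, 2, 3)),  # Sunday 02:03
    ("DiskFull", datetime(2018, 6, 12, 14, 0)),         # one-off
]
slots = seasonal_slots(history)
print(should_suppress("BackupJobCPUHigh", datetime(2018, 6, 24, 2, 10), slots))  # True
print(should_suppress("DiskFull", datetime(2018, 6, 19, 14, 0), slots))          # False
```

Silently dropping matches is risky if the scheduled job itself starts to misbehave, so a more cautious design would down-rank or tag seasonal alerts rather than discard them.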
