
How can AI and automation help create self-driving data centers?

Innovation creates exciting new opportunities to boost performance and efficiency. AIOps, which combines AI and automation, enables new modes of data center operation such as auto-remediation, adaptive self-healing, and pattern recognition. In this session, we explain how these technologies can help realize the vision of the Self-Driving Data Center.

Published in: Technology

  1. How can AI and automation help create self-driving data centers?
     Wilfried Cleres, Fujitsu Distinguished Engineer, Managing IT Consultant for EPS Automation, AI and Cloud Services
     Twitter: @WilfriedCleres
     E-Mail: Wilfried.Cleres@ts.fujitsu.com
  2. Data Center Management and Automation (DCMA) (© 2019 FUJITSU)
     ◼ Overview
  3. Why Data Center Management and Automation?
     Customer needs:
     ◼ Improve efficiency, agility, flexibility and speed
     ◼ Smooth and failsafe operations: Always On
     ◼ Easy Data Center management and automation
     ◼ Establish Service Quality Management to best support the business processes
     ◼ Use artificial intelligence and automation solutions for IT operations (AIOps) to reduce the cost, complexity and risk of operations
     ◼ Compliance and governance
     Data Center Operation: Efficient, flexible and reliable
  4. Fujitsu EPS strategy
     Diagram labels: Knowledge, Integration, Continuity, Innovation, Customer; IoT, Cloud, Security, Big Data, AI, Automation
     We make sure that computing technology supports digital transformation comprehensively
  5. Fujitsu DCMA Portfolio
     Managed layers: Service; Application; Mainframe, Server and Storage; Network; Power and Cooling; Space and Inventory; Facilities
     Solution areas: IT Operations Management (ITOM); Data Center Infrastructure Management (DCIM); Cloud; AIOps (AI and Automation)
     FUJITSU DCMA: AIOps, ITOM and DCIM, comprehensive and integrated
  6. Data Center Infrastructure Management
     ◼ How can you manage power, cooling and assets and automate the processes in your data center?
  7. Why are DCIM solutions required?
     ◼ Environment and resource usage
     ◼ Energy as a major cost driver in the Data Center
     ◼ Failure safety and emergency management
     ◼ Legal requirements
     ◼ Audits and certifications, e.g. "Blauer Engel"
     Lowering the energy consumption: Cost savings and environmental protection
  8. DCIM for modeling the digital twin of a Data Center
     Manage Data Center resources to improve efficiency and reduce cost and risk:
     ◼ Optimize DC infrastructure (space, power and cooling)
     ◼ Manage physical capacity and inventory
     ◼ Real-time visualization of data center utilization
     ◼ Relate data center resources to business value
     Strategic business value: Increase efficiency, improve capacity and reduce costs
  9. Combining Processes, Assets and Energy with DCIM
     ◼ IT Management & Business Services: Make business-relevant
     ◼ Power & Cooling: Monitor
     ◼ Capacity & Inventory: Manage
     ◼ Real-time: Data Collection, Energy Metering, Analysis, Reporting, Alerting, Control
     ◼ Physical: Lifecycle, 3D Visualization, Asset Management, Capacity Analysis, Planning
     ◼ Logical: Service Mgmt., Resource Mgmt., Process Automation, Virtualization, Change Mgmt., Business Integration
     Characteristics: Automated, Auto-discovered, Workflow-enabled, Cross-platform, Cross-vendor
  10. AIOps and ITOM: Aligning IT with Business
     ◼ Process Automation
     ◼ Automated Contingency Management
     ◼ Categories and positioning of IT Automation
  11. Process Automation
     Fundamental characteristics:
     ◼ Integrating people, processes and technology
     ◼ Technical implementation of business and operational processes
     ◼ Delivery of IT services and maximizing IT operation efficiencies across IT departments and IT tool sets throughout the enterprise
     Resulting benefits:
     ◼ Reduce operating expenses (OPEX)
     ◼ Increase staff productivity
     ◼ Development of new businesses
     ◼ More consistent, error-free and auditable operations
     ◼ Auditable alignment with compliance requirements
     Process Automation to control and automate operational processes across the entire Data Center
  12. Modeling and digitalization of business processes
     Customer example: IT.Niedersachsen
     Digitalization of Business Processes:
     ◼ Close cooperation between the customer teams, CA and Fujitsu
     ◼ Business Process Modeling with Adonis
     ◼ Digitized implementation with CA Process Automation
     ◼ End user acceptance regarding ease of use, speed and flexibility
     Business Process Modeling (BPMN2 models), by the subject matter experts of the customer:
     ◼ Definition of the business requirements
     ◼ Definition of the process model
     ◼ Check the feasibility of the process model
     ◼ For synergy and standardization reasons, publish the BPMN2 processes in a central library
     Digitalization of Business Processes (Digital Transformation), where IT experts digitize the business models:
     ◼ Implement the BPMN2 process model as a technical model with process automation
     ◼ Detailed testing of the technical processes
     ◼ Release the technical process together with the business owners and subject matter experts
     ◼ For synergy and standardization reasons, publish the technical processes in a central library
     From modeling of Business Processes to automation
  13. FUJITSU DCMA Process Automation Solution
     Why "Automated contingency management"?
     ◼ Restart business-critical IT services
     ◼ Assure reliable emergency operation
     ◼ Reestablish IT services that have dropped out
     ◼ Support emergency management
     ◼ Support regular emergency drills
     Operate the Data Center with maximized reliability, agility and efficiency
  14. Automated contingency management
     Example: Downtime of cooling machines, "Emergency card 010"
     Diagram labels: Sensor systems, People, Building technology, IT infrastructure/operations
     Fujitsu DCMA methodology and solutions to automate the contingency manual efficiently
  15. Fujitsu automated contingency management (CM)
     Example: BS2000 Emergency Management for a system and storage outage
     CM with Process Automation:
     ◼ Automatic system shutdown
     ◼ Automated restart by connecting to the BS2000 console
     ◼ Significantly faster system restart, no manual errors in the restart process, fewer manual operations tasks
     ◼ Increased availability of business-critical systems
     ◼ Seamless integration across BS2000, Linux and Windows
     CM without Process Automation:
     ◼ Manual system shutdown, and only after human reaction time
     ◼ Restart in dialog mode with a customer-specific configuration. This requires deep knowledge of the detailed customer system configuration and working through checklists and manuals, a time-consuming task prone to human error
     ◼ Long duration of system restart and increased error rate in manual processes
     ◼ Longer outage of business-critical systems
  16. Automated contingency management: Customer Benefits
     ◼ Adhere to escalation procedures and document all activities
     ◼ Ensure and accelerate people/machine interaction for all operating systems and departments
     ◼ Operate the data center more simply, more securely, more efficiently and at lower cost
     ◼ Reduce downtime costs significantly
     ◼ Fast recovery of business continuity
     ◼ Connect IT processes, systems, sensor systems, building technology and people
     ◼ Initiate emergency processes automatically, with automatic diagnostics and self-help
     ◼ Assist the emergency managers
     ◼ Avoid manual errors via automation and increase quality
     Rely on a secure and automated contingency management
  17. Categories and positioning of IT Automation
     Axes: Integration broadness (Concentrated to Expanded); Coverage of services; Automation level
     Levels: Service Orchestration; IT Service; Function/Job Oriented; Functional Services; Service Delivery; Function Specific; Content power
     ◼ Point / Function targeted Solutions: Automic (UC4), SAG ESM
     ◼ Built-in Tools: Microsoft, ServiceNow, VMware, …
     ◼ Heuristic Tools: SaltStack, …
     ◼ Suite Automation Tools and Solutions: CA Process Automation, …
  18. "Self-Driving Data Center"
     ◼ Why use AIOps solutions now?
     ◼ How can AI and automation help create self-driving data centers?
     ◼ Using AI to identify emerging problems before they occur
     ◼ Combining AI and automation technology to enable self-healing functionality
  19. Why AIOps Now?
     "By 2022, 40% of large enterprises will use artificial intelligence for IT operations (AIOps)." Source: https://dzone.com/articles/how-can-aiops-support-the-digital-transformation-o
     ◼ Modern architectures introduce complexity
     ◼ Point tools increase alarm fatigue
     ◼ AIOps combines Big Data, ML and Artificial Intelligence (AI)
     ◼ Automation is required to reduce the cost, complexity and risk of operations
     Automation is the backbone of AIOps: Auto Remediation, Adaptive Self-Healing
  20. Fujitsu Human Centric AI Zinrai
     ◼ Fujitsu Human Centric AI Zinrai is Fujitsu's answer for supporting customers in the area of Artificial Intelligence (AI)
     ◼ Zinrai takes a human-centric, solutions-driven approach to co-create valuable offerings for our customers using best-of-breed technologies from across the globe, developed and deployed to meet ever-growing customer challenges
     ◼ Combining the strength of Zinrai AI development in Japan and the rest of the world with carefully selected partner capabilities, Fujitsu delivers the optimal AI-supported solutions to our customers' challenges
     Fujitsu Zinrai is a full set of AI Solutions and Services
  21. Proven Use Cases for AI in EMEIA
     ◼ AI Quality Control, "Making it Right": Applying Zinrai to Wind Turbine Manufacture
     ◼ AI Customer Flow Analysis, "Making the Invisible Visible": Applying Zinrai to City Infrastructure, Security, …
     ◼ AI Predictive Maintenance, "Keeping the Lights On": Applying Zinrai to Data Centers and other critical infrastructures
  22. AIOps: Artificial Intelligence for IT Operations
     ◼ Artificial Intelligence (AI) is one of the emerging technologies and will boost the digitization of business processes
     ◼ It is also causing a major upheaval in the area of IT Operations
     ◼ With DevOps, deployment is already automated
     ◼ Because dynamic infrastructures are complex and IT processes generate immense amounts of data, artificial intelligence is needed in order to act proactively
     AIOps frees IT from a rules-based approach to systems management
  23. Machine learning proactively detects anomalies
     Chart labels: time axis (t); bands: normal, still acceptable, rare, unlikely; typical volatility; anomaly
     Artificial Intelligence (AI), in German: Künstliche Intelligenz (KI)
     Basics:
     ◼ Machine learning is applied to historical data or real-time streams
     ◼ The AI defines the ranges statistically, based on percentage values: normal, still acceptable, rare, unlikely
     ◼ Alerts are not generated by single events, but only after a series of exceedances, using the Western Electric rules (decision rules in statistical process control)
     ◼ Thresholds for alerts are set flexibly by the AI as a function of time (t) and can change during the learning process
     ◼ Alert patterns are recognized and generate messages for situations outside normal behavior
     ◼ Since AIOps is a mathematical machine learning approach, a single algorithm can replace the logic of thousands of rules
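
The idea of time-dependent ranges can be sketched in a few lines of Python. This is only an illustration, not the method of any particular AIOps product: the function names `learn_baseline` and `classify` are hypothetical, the baseline is learned per hour of day, and the mapping of the four ranges onto 1, 2 and 3 standard deviations is an assumption (the slide only says the ranges are based on percentage values).

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_baseline(samples):
    """Learn a per-hour baseline from historical (hour_of_day, value) samples.

    Returns {hour: (mean, standard deviation)}, so the thresholds derived
    from it vary as a function of time.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def classify(baseline, hour, value):
    """Place a new value into one of the four ranges named on the slide.

    Assumption: the bands are 1, 2 and 3 standard deviations wide.
    """
    center, spread = baseline[hour]
    if spread == 0:
        return "normal" if value == center else "unlikely"
    distance = abs(value - center) / spread
    if distance <= 1:
        return "normal"
    if distance <= 2:
        return "still acceptable"
    if distance <= 3:
        return "rare"
    return "unlikely"


# The same CPU value is judged differently at 09:00 and at 03:00.
history = [(9, 40), (9, 60), (3, 5), (3, 15)]
baseline = learn_baseline(history)
print(classify(baseline, 9, 55))
print(classify(baseline, 3, 55))
```

Re-running `learn_baseline` on newer data shifts the bands, which corresponds to thresholds that change during the learning process.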
  24. Western Electric Rules
     ◼ Origin of the Western Electric Rules (1956): decision rules in statistical process control for detecting out-of-control or non-random conditions
     ◼ Motivation: being able to distinguish natural from unnatural patterns
     ◼ Zone rules: detect process instability and the presence of assignable causes
     ◼ Rule 1: Any point outside Zone A. Any single data point falls outside the 3σ boundary from the centerline (i.e. any point outside zone A, beyond either the upper or lower control boundary).
     ◼ Rule 2: Two out of three consecutive points fall in Zone A or beyond. Two out of three consecutive points cross the 2σ boundary (in zone A or beyond), on the same side of the centerline.
     ◼ Rule 3: Four out of five consecutive points fall in Zone B or beyond. Four out of five consecutive points cross the 1σ boundary (in zone B or beyond), on the same side of the centerline.
     ◼ Rule 4: Nine consecutive points on the same side of the midline (mean). Nine consecutive points fall on the same side of the centerline (in zone C or beyond).
     Source: Wikipedia, April 2019, https://en.wikipedia.org/wiki/Western_Electric_rules
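
The four zone rules translate almost directly into code. The following Python sketch assumes the centerline and σ are already known (for example from a learned baseline); the function name `western_electric_violations` and its return format are hypothetical and chosen for illustration only.

```python
def western_electric_violations(values, center, sigma):
    """Return (index, rule_number) pairs for every rule that fires.

    The index is the position of the last point in the window that
    triggered the rule.
    """
    z = [(v - center) / sigma for v in values]  # distance in sigma units
    violations = []
    for i in range(len(z)):
        # Rule 1: a single point beyond the 3-sigma boundary.
        if abs(z[i]) > 3:
            violations.append((i, 1))
        # Rule 2: two of three consecutive points beyond 2 sigma, same side.
        if i >= 2:
            window = z[i - 2:i + 1]
            if sum(x > 2 for x in window) >= 2 or sum(x < -2 for x in window) >= 2:
                violations.append((i, 2))
        # Rule 3: four of five consecutive points beyond 1 sigma, same side.
        if i >= 4:
            window = z[i - 4:i + 1]
            if sum(x > 1 for x in window) >= 4 or sum(x < -1 for x in window) >= 4:
                violations.append((i, 3))
        # Rule 4: nine consecutive points on the same side of the centerline.
        if i >= 8:
            window = z[i - 8:i + 1]
            if all(x > 0 for x in window) or all(x < 0 for x in window):
                violations.append((i, 4))
    return violations


# Two of three points beyond 2 sigma on the upper side trigger rule 2.
print(western_electric_violations([2.5, 0.0, 2.2], center=0.0, sigma=1.0))
```

Only rule 1 reacts to a single point; rules 2 to 4 need a series of exceedances before they fire.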
  25. Practical examples: AI and automation
     Automatic troubleshooting and today's possibilities of adaptive self-healing
     Covered areas: Server, Mainframe, Network, Applications, Database, Storage
     ◼ AI: Operational Intelligence
     ◼ Automation is the backbone of AIOps
  26. BS2000 and AIOps
     ◼ Concrete example with BS2000
     ◼ Performance metrics from openSM2 (openSM2 has over 200 report types)
     ◼ Connector via SNMP in BS2000 (net-snmp & snmp-agents)
     ◼ Data ingestion via REST API on the Management Unit (MU) to the AIOps system on the Application Unit (AU)
     ◼ CA OI is an example of an AIOps system
     ◼ Integration of BS2000 into AIOps through data connectors
  27. Algorithms for alert noise reduction
     ◼ Sudden spikes
        ◼ Do not alert on a single spike when the metric returns to normal during the next intervals
     ◼ Continuous volatility
        ◼ Do not create multiple alerts when only the severity is changing
        ◼ Combine all events into a single alert with the highest severity detected
     ◼ Minimum threshold
        ◼ Do not alert when a defined value for a metric or a group of metrics is not exceeded
     ◼ Metrics hierarchy
        ◼ Some metrics form a hierarchy. Do not alert on a metric at a lower level when a metric at a higher level is already alerting
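
Three of these noise-reduction ideas can be sketched as small Python filters. All names here (`suppress_single_spikes`, `merge_events`, `suppress_children`, the severity labels and the metric names) are hypothetical and serve only to illustrate the logic. The sketch assumes the metric hierarchy is a tree without cycles.

```python
SEVERITY = {"warning": 1, "critical": 2}


def suppress_single_spikes(flags):
    """Sudden spikes: drop an exceedance that is isolated.

    flags holds one boolean per interval (True = threshold exceeded).
    A True value is suppressed when the intervals before and after it
    are normal. The last interval is kept, since its successor is unknown.
    """
    kept = []
    for i, flag in enumerate(flags):
        prev_normal = i == 0 or not flags[i - 1]
        next_normal = i + 1 < len(flags) and not flags[i + 1]
        kept.append(flag and not (prev_normal and next_normal))
    return kept


def merge_events(events):
    """Continuous volatility: one alert per metric, highest severity wins."""
    merged = {}
    for metric, severity in events:
        if metric not in merged or SEVERITY[severity] > SEVERITY[merged[metric]]:
            merged[metric] = severity
    return merged


def suppress_children(alerting, parent_of):
    """Metrics hierarchy: keep only alerts with no alerting ancestor."""
    kept = set()
    for metric in alerting:
        ancestor = parent_of.get(metric)
        while ancestor is not None and ancestor not in alerting:
            ancestor = parent_of.get(ancestor)
        if ancestor is None:  # walked to the root without finding an alert
            kept.add(metric)
    return kept


print(suppress_single_spikes([False, True, False, True, True, False]))
print(merge_events([("cpu", "warning"), ("cpu", "critical"), ("cpu", "warning")]))
print(suppress_children(
    {"host.cpu", "host.cpu.core0"},
    {"host.cpu.core0": "host.cpu", "host.cpu": "host"},
))
```

The minimum-threshold rule is a plain comparison against a configured value and is therefore left out of the sketch.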
  28. Root Cause Analysis with Topology Hotspot
     ◼ Detect correlations between subsystems
        ◼ Network
        ◼ Mainframe, Server, Storage, Operating System
        ◼ Databases
        ◼ Transaction system
        ◼ Message queuing system
        ◼ Others, …
     ◼ Determine which subsystems have issues
        ◼ Green: All metrics in the subsystem are OK
        ◼ Yellow: Metrics with warnings
        ◼ Red: Metrics in critical situations
     ◼ Timeline
        ◼ Visualizes the points in time at which the states have changed
     At-a-glance root cause analysis of problem areas
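
The traffic-light rollup and the timeline of state changes can be illustrated with a short Python sketch. The function names `subsystem_status` and `state_changes` and the snapshot format are assumptions made for this example, not part of any product described in the slides.

```python
RANK = {"green": 0, "yellow": 1, "red": 2}


def subsystem_status(metric_states):
    """Roll metric states up to one color: the worst metric decides."""
    return max(metric_states, key=RANK.__getitem__, default="green")


def state_changes(snapshots):
    """Build the timeline of status changes per subsystem.

    snapshots is a list of (timestamp, {subsystem: [metric states]}).
    Returns (timestamp, subsystem, old_status, new_status) tuples.
    """
    changes = []
    previous = {}
    for timestamp, subsystems in snapshots:
        for name, metric_states in subsystems.items():
            status = subsystem_status(metric_states)
            if name in previous and previous[name] != status:
                changes.append((timestamp, name, previous[name], status))
            previous[name] = status
    return changes


snapshots = [
    (0, {"network": ["green", "green"], "database": ["green"]}),
    (1, {"network": ["green", "yellow"], "database": ["green"]}),
    (2, {"network": ["red", "yellow"], "database": ["green"]}),
]
for change in state_changes(snapshots):
    print(change)
```

In this example only the network subsystem appears in the timeline, which points the operator to where the problem started.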
  29. AI-based adaptive Self-Healing
     ◼ Self-healing starts with defining automated actions for specific problem situations
     ◼ Problem-solving processes can be documented manually and retrieved easily when situations reoccur
     ◼ In a next step, machine learning tries to correlate information from logs to determine which actions have helped to solve the problem
     ◼ When the system recognizes that a similar issue was solved in a certain way in the past, it can suggest the same approach for the current situation. This has to be confirmed 3 to 5 times before it can be regarded as a proven solution
     ◼ In combination with the prediction of possible alert situations, these learned actions can be initiated automatically to avoid the future problem
     ◼ This is a permanent process, with the number of automated problem-solving processes growing over time
     Drive Action: Adaptive Self-Healing, Auto Remediation
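
The confirmation step can be sketched as a small bookkeeping class in Python. Everything in it is hypothetical: the class name `RemediationMemory`, the problem and action labels, and the default of three confirmations (the slide gives a range of 3 to 5). It is meant to show the idea of promoting a suggestion to a proven solution, not an actual implementation.

```python
from collections import defaultdict


class RemediationMemory:
    """Remember which action solved which problem, and how often."""

    def __init__(self, confirmations_required=3):
        self.required = confirmations_required
        # (problem, action) -> number of confirmed successes
        self.successes = defaultdict(int)

    def record_success(self, problem, action):
        """Call this when an operator confirms that the action helped."""
        self.successes[(problem, action)] += 1

    def suggest(self, problem):
        """Return (action, proven) for the best-known action, or None.

        proven is True once the action has been confirmed often enough
        to be started automatically instead of only being suggested.
        """
        candidates = [
            (count, action)
            for (known_problem, action), count in self.successes.items()
            if known_problem == problem
        ]
        if not candidates:
            return None
        count, action = max(candidates)
        return action, count >= self.required


memory = RemediationMemory()
for _ in range(3):
    memory.record_success("disk_full", "purge_tmp")
print(memory.suggest("disk_full"))
```

Until the threshold is reached, the action is only suggested; afterwards it can be treated as a candidate for automatic execution.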
  30. Intelligent automation with recommendations
     Continuous learning loop:
     ◼ Learn normal behavior: AI creates the "green highway" used to identify anomalies
     ◼ ANOMALY identified!
     ◼ Actionable Insights: Separate the signal from the noise and find the needle in the haystack to act on
     ◼ Do I have enough confidence to recommend a course of action?
     ◼ Automated Workflows
     ◼ Effectiveness: Did it bring down the anomaly?
     ◼ Issue solved? (Yes / No)
  31. Combining AI and Automation for Adaptive Self-Healing
     ◼ Shows the related workflow in Automic Automation
     ◼ Fetches related job names from Automic Automation
     ◼ Computes and displays confidence levels
  32. Continuous AIOps
     Real-time context-based decision making and intelligent automation
     Sources: Applications, Databases, Storage, Server, Mainframe, Networks
     ◼ Gather Data: Application data, Network data, Storage data, Database data, Infrastructure data; Data, Sentiment Processing
     ◼ Derive Insights: Reduce Alert Noise, Identify Root Cause, Predict Issue
     ◼ Apply Context: End-to-End IT Service Visibility, Topology Hotspot Mapping
     ◼ Drive Action: Adaptive Self-Healing, Auto Remediation
     ◼ Continuously Optimize: AI Algorithms, Intelligent Automation
     Core tasks: Collect data, learn normal behavior, recognize patterns, detect anomalies, and initiate appropriate actions
     Increase Data Center Efficiency with AI and Intelligent Automation
  33. Summary
     ◼ The Future Data Center: AIOps
  34. The Future Data Center: AIOps
     ◼ Rearchitect the Data Center and shift IT resources to where they are needed
     ◼ Software-defined and virtual anything
     ◼ AIOps and adaptive Self-Healing
     ◼ Autonomous orchestration of IT resource pools will be the strategic business driver
     ◼ Standardization and the redesign of business and operational processes will be required
     ◼ Automation is the backbone of AIOps
     ◼ Data Center admin skills will shift from setting up physical IT devices to modeling processes and training AI solutions
     In the next 5 years we will see the highest rate of change in efficiency with AIOps
  35. Fujitsu Data Center Management and Automation
     Thank you for listening!
     Contact: Wilfried Cleres, E-Mail: wilfried.cleres@ts.fujitsu.com