Bright talk running a cloud - final
Presentation Transcript

  • Running a Cloud: How the Cloud Impacts Service Management and IT Operations (image: http://img11.imageshack.us/img11/2017/skatingdownarollercoastw.jpg)
  • Andrew White, Cloud and Smarter Infrastructure Solution Specialist, IBM Corporation. Mr. White has fifteen years of experience designing and managing the deployment of systems monitoring and Event Management software. Prior to joining IBM, Mr. White held various positions, including leading the Monitoring and Event Management organization of a Fortune 100 company and developing solutions as a consultant for a wide variety of organizations, including the Mexican Secretaría de Hacienda y Crédito Público, Telmex, Wal-Mart of Mexico, JP Morgan Chase, Nationwide Insurance and the US Navy Facilities and Engineering Command.
  • (image: http://weheartit.com/entry/12433848)
  • Follow Us: #ITSMSummit
    GROUND RULES FOR THIS SESSION…
    1. If you can’t tell if I am trying to be funny… GO AHEAD AND LAUGH!
    2. Feel free to text, tweet, yammer, or whatever. Use
    3. If you have a question, no need to wait until the end. Just interrupt me. Seriously… I don’t mind.
  • My name is Andrew White. I have a lot of experience leading Systems and Event Management teams.
  • I am here today to share some of what I have learned about Cloud Operations.
  • More importantly, I am here today to talk about how the cloud affects…
  • QUESTION: What value does your IT organization create for your business?
  • If you can’t answer this question, how can you be sure you are doing the right things and doing them well…
  • HINT: “We provide infrastructure or applications the business uses” is not a value statement
  • We are all here for one reason…
  • How does IT preserve the value it creates?
    - 100% Uptime*
    - Scalability*
    - Performance*
    - Agility*
    - Good UX*
    *To the best of our ability
  • How well would THEY say you are doing?
  • CURRENT MARKET CONDITIONS
    - The velocity of change and the volume of data are increasing
    - Virtualization introduces complexity and increased consumption of resources
    - Shared services are forced to oversubscribe finite resources
    - Expertise is limited to functional silos and there is no understanding of how the system functions end-to-end
    - Supporting a cloud requires the ability to manage a large-scale dynamic infrastructure
    - Agile development and Continuous Delivery are in conflict with ITIL processes
  • We need to recognize when we have problems to solve
  • THE TRADITIONAL APPROACH
    To solve problems quickly, we look for solutions that we can use to define best practices and develop processes to insert a measure of control.
  • THE PROBLEM WITH THIS APPROACH
    - Solutions are driven by accepted conventions
    - Best practices are coveted and are usually adopted without understanding how and why they were developed
    - There must always be a right answer
    - No logical analysis is required
    - People are frequently seen as the “root cause”
    - The outcomes are enforced using “re-dos” and punitive actions (or the looming threat of these things)
  • (image: http://leanhomebuilding.files.wordpress.com/2010/12/standard2.jpg)
  • Follow Us: #ITSMSummit!§  We receive feedback from our business partners that system performance andavailability have been unacceptable for many of our critical businessapplications!§  Our productivity is impacted and we fail to meat delivery timelines!§  IT is not able to measure its impact on the business or the end user experience!§  There is a lack of clear communication during a problem!§  People are “hoarding” data and reports!§  IT lacks the information needed to prioritize performance issues andopportunities based on business need!§  We take a really long time to figure out what is wrong!§  The same old problems keep coming back!§  We never really get to the “true root cause”!HOW DO WE KNOW WE NEED TO CHANGE!
  • CONTROL IS AN ILLUSION
    Our typical approach towards service improvement is a bit like attempting to put the toothpaste back in the tube!
    “Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them.” - Laurence J. Peter
  • Organizations don’t fail because they take the wrong path, they fail because they can’t imagine a better path than the one they are on. -- Marty Neumeier
  • What is the next step in the evolution?
  • The perennial problem… Is it the infrastructure or the application?
  • DRIVING THE RIGHT KIND OF ACTION
    - Application: End User Experience
        - Gainesville: Transaction 1, Transaction 2, Transaction N
        - San Antonio: Transaction 1, Transaction 2, Transaction N
        - Des Moines: Transaction 1, Transaction 2, Transaction N
        - Columbus: Transaction 1, Transaction 2, Transaction N
    - Infrastructure
        - Network: KPI 1, KPI 2, KPI N
        - Mainframe: KPI 1, KPI 2, KPI N
        - Storage: KPI 1, KPI 2, KPI N
        - Linux: KPI 1, KPI 2, KPI N
        - Middleware: KPI 1, KPI 2, KPI N
        - Database: KPI 1, KPI 2, KPI N
  • Who ya gonna call?
  • The perennial problem… Is it the infrastructure or the application?
  • CLOUD PAIN POINTS
    Does any of this sound familiar?
    - It takes too long to diagnose problems in the application and infrastructure
    - Existing management tools are outdated and don’t work at scale
    - Critical information is missed, causing outages and poor user experiences
    - Most problems are managed reactively
  • DRIVING THE RIGHT KIND OF ACTION
    - Application: End User Experience
        - Gainesville, San Antonio, Des Moines, Columbus: each with Transaction 1, Transaction 2, Transaction N
    - Infrastructure
        - Network, Mainframe, Storage, Linux, Middleware, Database: each with KPI 1, KPI 2, KPI N
    - The Cloud
  • REQUIREMENTS FOR UNITY OF EFFORT
    1. Command and Control
    2. Shared Experience
    3. Situational Awareness
    Symptoms of Missing Elements
    - Command and Control (No Leadership)
        - The team lacks a clear direction
        - Lots of activity, lack of progress
    - Shared Experience (Poor Relationships)
        - Us vs. Them mentality
        - Unhealthy competition
    - Situational Awareness (Poor Communication)
        - Focused on cooperation, not collaboration
        - Blame culture
        - Infrequent or non-existent communication
  • TWO TYPES OF DECISION MAKING
    - Programmed Decisions
        - Routine
        - Repetitive
        - Well-Structured
        - Predetermined Decision Rules
    - Non-Programmed Decisions
        - Unique
        - Presence of Risk
        - Presence of Uncertainty
        - Black Swans
  • BOYD’S OODA “LOOP”
    - Stages: Observe, Orient, Decide, Act
    - Observe: Observation, fed by Outside Information, Unfolding Circumstances, and Unfolding Interaction With Environment
    - Orient: Cultural Norms, Cognitive Abilities, Knowledge Life Cycle, Prior Wisdom, New Information
    - Decide: Decision (Hypothesis)
    - Act: Action (Test)
    - Links between stages: Feed Forward, Feedback, Implicit Guidance & Control
    - Note how observation shapes orientation, shapes decision, shapes action, and in turn is shaped by the feedback and other phenomena coming into our sensing or observing window.
    - Also note how the entire “loop” (not just orientation) is an ongoing many-sided implicit cross-referencing process of projection, empathy, correlation, and rejection.
    From “The Essence of Winning and Losing,” John R. Boyd, January 1996.
  • INCIDENT LIFE CYCLE
    - Down Time: Detection Time, Response Time, Repair Time, Recovery Time
    - Milestones: Outage, Detection, Diagnosis, Repair, Recover, Restore
    - Observe, Orient, Decide, Act
  • ANATOMY OF AN OUTAGE
    IM01109089: P0 - Affecting Multiple apps
    - Components: Corporate LANs & VPNs, Load Balancer, Firewall, Web Servers, Message Queue, zOS, CICS, WAS, Database, WAS, Database, zOS, MQ, DB2
    1. 5:45-ish pm: CICS ABENDs start flooding the console, but not at a rate high enough to ticket
    2. 6:00-ish pm: MQ flows start being interrupted and are alerting in Flow Diagnostics
    3. 6:04pm: Synthetic transactions fail, and at 6:14 the Ops Center confirms the issue and creates a P0 Incident
    4. 6:54pm: Support teams investigate the interrupted flows and determine it is a “back-end” problem
    5. 10:29pm: Support teams investigate MQ, rule it out, and ultimately decide to reset CICS to resolve the issue
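The timestamps on the outage slide can be turned into elapsed times, which makes it easier to see where the incident life cycle stalled. The following is a minimal sketch, not something from the talk: the 24-hour times are transcribed from the slide (the "-ish" ones are approximate), and the event labels and the `minutes_between` helper are illustrative names.

```python
from datetime import datetime

# Timeline transcribed from the outage slide; "-ish" times are approximate.
TIMELINE = [
    ("17:45", "CICS ABENDs start flooding the console"),
    ("18:00", "MQ flows interrupted, alerting in Flow Diagnostics"),
    ("18:04", "Synthetic transactions fail"),
    ("18:14", "Ops Center confirms and opens the P0 incident"),
    ("18:54", "Support teams call it a back-end problem"),
    ("22:29", "MQ ruled out, decision to reset CICS"),
]

def minutes_between(timeline):
    """Return (from_event, to_event, minutes) for each consecutive pair."""
    parsed = [(datetime.strptime(t, "%H:%M"), label) for t, label in timeline]
    gaps = []
    for (t0, a), (t1, b) in zip(parsed, parsed[1:]):
        gaps.append((a, b, int((t1 - t0).total_seconds() // 60)))
    return gaps

gaps = minutes_between(TIMELINE)
total = sum(minutes for _, _, minutes in gaps)
for a, b, minutes in gaps:
    print(f"{minutes:4d} min  {a} -> {b}")
print(f"{total:4d} min  first symptom to reset decision")
```

Under these assumptions, most of the elapsed time falls between the "back-end problem" diagnosis and the decision to reset CICS, that is, in the Orient and Decide part of the loop rather than in detection.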
  • Why did this happen? (image: http://www.ithakabound.com/wp-content/uploads/2010/02/DC-Snow-men-pushing-car.jpg)
  • Four Sources of Bad Decisions:
    1. Failure to frame the problem correctly
    2. Poor use of evidence
    3. Faulty decision making process
    4. No feedback for improvement
  • WHERE THE BREAKDOWN OCCURS
    - Observe, Orient, Decide, Act
    - Flow: Current State, Situational Awareness, Decision, Performance of Actions, Feedback
    - Situational Awareness
        - Level 1: Perception of Elements in Current Situation
        - Level 2: Comprehension of Current Situation
        - Level 3: Projection of Future Status
    - Systemic Influences: System Capability, Interface Design, Stress & Workload, Complexity, Automation
    - Individual Influences: Goals & Objectives, Preconceptions, Expectations, Abilities, Experience, Training
    - Cognitive Processes: Long Term Memory, Automaticity
    Adapted from Endsley, M.R. (1995b). Toward a theory of situation awareness in dynamic systems. Human Factors 37(1), 32–64.
  • SOMETIMES WE MISS WHAT IS GOING ON
    “Say… what’s a mountain goat doing all the way up here in a cloud bank?”
  • NORMATIVE DECISION MAKING MODEL
    - Limited Information Collection
        - 7 +/- 2
        - Tendency to acquire manageable rather than optimal amounts of information
        - Difficulty identifying all possible options
    - Judgmental Heuristics
        - Judgmental heuristics: rules of thumb or shortcuts that people use to reduce information processing demands
        - Availability heuristic: tendency to base decisions on information readily available in memory
        - Representativeness heuristic: tendency to assess the likelihood of an event occurring based on impressions about similar occurrences
    - Satisficing
        - Choosing a solution that meets a minimum standard of acceptance
  • Our systems are capable of producing a huge amount of data, both on the status of their own components and on the status of the environment. The problem with today’s systems is not a lack of information, but finding what is needed when it is needed. (1)
    1. Adapted from Endsley, M.R. (1995b). Toward a theory of situation awareness in dynamic systems. Human Factors 37(1), 32–64.
  • Why does any of this matter?
  • REQUIREMENTS FOR UNITY OF EFFORT
    1. Command and Control
    2. Shared Experience
    3. Situational Awareness
    Symptoms of Missing Elements
    - Command and Control (No Leadership): the team lacks a clear direction; lots of activity, lack of progress
    - Shared Experience (Poor Relationships): Us vs. Them mentality; unhealthy competition
    - Situational Awareness (Poor Communication): focused on cooperation, not collaboration; blame culture; infrequent or non-existent communication
    In the cloud, much of this will be federated or done by software
  • CLOUD IS ASSISTED DECISION MAKING
    - Programmed Decision Making
        - Collect evidence
        - Identify the problem
        - Select a solution
        - Implement and evaluate the outcome
    - Non-Programmed Decision Making
        - Narrow evidence down to the ideal level
        - Apply heuristics to limit the impact of cognitive bias
        - Present options to a human for a decision
  • DECISIONS BEING AUTOMATED IN THE CLOUD
    - Packing
        - Compressing workloads to the fewest number of physical servers
        - Maximizing cost efficiencies
    - Striping
        - Spreading workloads across as many physical servers as possible
        - Ensuring higher performance levels and reducing risk due to component failure
    - Load-Awareness
        - Allocating new workloads to the servers with the lowest load
        - Maximizing the performance of the workloads
    - HA-Awareness
        - Ensuring workloads are distributed across pods
        - Matching availability levels with service requirements and cost targets
    - Energy Awareness
        - Placing workloads according to energy costs
        - Ending workloads to reduce energy consumption or rescheduling them for off-peak hours
    - Affinity-Awareness
        - Placing workloads close to critical resource dependencies
        - Collocating compatible workloads to maximize available resources
    - Platform Awareness
        - Allocating workloads to the best platform
        - Migrating workloads to the least expensive platform still capable of delivering required service levels
    - Topology Awareness
        - Allocating resources within a service group near each other
        - Isolating single points of failure
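To make the contrast between packing and load-aware placement concrete, here is a small sketch. It is an illustration under simplifying assumptions, not how any particular cloud scheduler works: workloads are single numbers, every server has the same capacity, and the function names `place_packing` and `place_load_aware` are made up for this example. Packing uses a first-fit-decreasing heuristic; the load-aware variant always picks the least-loaded server, which also spreads work across servers much as striping would.

```python
def place_packing(workloads, capacity, n_servers):
    """First-fit decreasing: fill servers in order so as few as possible are used."""
    servers = [[] for _ in range(n_servers)]
    for w in sorted(workloads, reverse=True):
        for s in servers:
            if sum(s) + w <= capacity:
                s.append(w)
                break
        else:
            raise ValueError(f"no server can hold a workload of size {w}")
    return servers

def place_load_aware(workloads, capacity, n_servers):
    """Send each workload to whichever server currently has the lowest load."""
    servers = [[] for _ in range(n_servers)]
    for w in sorted(workloads, reverse=True):
        target = min(servers, key=sum)  # ties go to the first server
        if sum(target) + w > capacity:
            raise ValueError(f"no server can hold a workload of size {w}")
        target.append(w)
    return servers

demo = [4, 3, 3, 2, 2, 1]
packed = place_packing(demo, capacity=8, n_servers=4)
spread = place_load_aware(demo, capacity=8, n_servers=4)
print("packing:   ", packed, "servers used:", sum(1 for s in packed if s))
print("load-aware:", spread, "servers used:", sum(1 for s in spread if s))
```

With the same six workloads, packing leaves two servers empty (lower cost), while the load-aware placement keeps every server at roughly half load (more headroom and less impact from a single failure).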
  • CLOUD OPERATION REQUIREMENT
    Operating a cloud means enabling good decision making
    The perception of and reaction to a set of changing events in terms of what can be done instead of merely the recollection of stimuli. (1)
    1. Adapted from Endsley, M.R. (1995b). Toward a theory of situation awareness in dynamic systems. Human Factors 37(1), 32–64.
  • When decisions are not made based on information, it’s called gambling.
  • SOME THINGS NEVER CHANGE
    - Components: Corporate LANs & VPNs, ISP Connection, DNS & Internet Services, Content Mgmt System, Social Network Widgets, Site Tracking & Analytics, Banner Ads & Revenue Generators, Multimedia & CDN Content, Home Wireless & Broadband, Mobile Broadband, The Cloud
    - Is It My Cloud Provider?
        - Configuration errors
        - Application design issues
        - Code defects
        - Insufficient infrastructure
        - Oversubscription issues
        - Poor routing optimization
        - Low cache hit rate
    - Is It a Service Provider Problem?
        - Non-optimized mobile content
        - Bad performance under load
        - Blocking content delivery
        - Incorrect geo-targeted content
    - Is It an ISP Problem?
        - Peering problems
        - ISP outages
    - Is It My Code or a Browser Problem?
        - Missing content
        - Poorly performing JavaScript
        - Inconsistent CSS rendering
        - Browser/device incompatibility
        - Page size too big
        - Conflicting HTML tag support
        - Too many objects
        - Content not optimized for device
  • OUR UNDERSTANDING OF YOUR GOALS
    Migrating to the cloud is disruptive to an IT organization. We have seen many of our clients use this as an opportunity to re-evaluate the way they operate their environments and the tools they leverage to deliver a quality service.
    We have identified three key goals driving the adoption of the cloud:
    - PROACTIVE OPERATIONS
        - Gaining visibility into and control of an increasingly complex operating environment in order to prevent frequent and prolonged outages
        - Evolving from fault monitoring to a holistic approach to managing application performance
        - Increased focus on cloud makes problem isolation and resolution more complex
    - CONTROL COST
        - Optimizing the performance of business processes to boost productivity
        - Providing cost transparency to track, analyze, and manage resources and control the costs associated with highly-virtualized and cloud environments
        - Improving software asset management to prevent over-spending and under-licensing
    - ELIMINATE HUMAN FACTORS
        - Leveraging automation to facilitate rapid growth and reduce the cost of service delivery
        - Maintaining OS and application patch levels across all images (active or dormant) to protect the enterprise and enable compliance
        - Automating application releases to optimize service delivery and align the Development and Operations teams, thereby increasing innovation, reducing costs, and accelerating time to value
  • OK. So now what?
  • Starting the journey…
  • WHAT THIS MEANS TO US…
    There are a few inescapable facts we face:
    1. We need reliable systems to store the promises the business makes to its customers
    2. Our systems mirror the complexity of the businesses they support
    3. Our environments must be massive to scale to handle the workload
    4. There is too much activity for a single person to be totally situationally aware
    5. If the users can’t use it, it doesn’t work
  • ROADMAP TO MATURE CLOUD OPERATIONS
    The building blocks on your journey towards an agile, flexible and optimized environment
    - Building blocks: Monitoring & Capacity, Infrastructure as Code, Orchestration, Backup & Recovery, Continuous Delivery, Storage Virtualization, Cost Management, HA / DR, Patch Mgmt, Dynamic Scheduling, Bare Metal Provisioning, Network Management, Transaction Tracing, App Provisioning, Performance Analytics, App Perf Mgmt, App Diagnostics, Service Visualization, Monitoring & Capacity, App Perf Mgmt, Event Management
    - Stages: Infrastructure Optimization, Application Analytics, Analytics Enabled Datacenter, Virtualization Optimization, DevOps, Cloud Enabled Datacenter, Cloud Optimized, Analytics Empowered
  • REMEMBER THE OPS USE CASE
    - Security
    - Backups
    - High Availability
    - Upgradability
    - Deployment Process
    - Scaling and Elasticity
    - Anticipated Performance Under Load
    - Known Defects
  • NEW OPERATIONAL REQUIREMENTS
    - Keep the data moving
    - Query on streams
    - Handle stream imperfections
    - Integrate stored and streaming data
    - Guarantee data safety and availability
    - Partition and scale applications automatically
    - Process and respond instantaneously
    - Drive interoperability
  • CLEANING UP THE LANDSCAPE
    - Silo, Monolithic Framework, Niche, Launch Pad, Information Bus
    Adapted from: Akella, Janaki. “IT Architecture: Cutting costs and complexity.” McKinsey Quarterly, 13 Nov 2009. https://www.mckinseyquarterly.com/IT_architecture_Cutting_costs_and_complexity_2391
  • CREATING A DIRECTED WORKFLOW
    - Directed, Non Directed
    - Observe, Orient, Decide
    - Views: Launchpad, Executive Dashboard, Business Area Dashboards, Application PAC Dashboards, Command Center Dashboards, Technology Owner Dashboard, Application Owner Dashboard, Problem Isolation Workspace, Problem Diagnostics Workspace, System Detail View, Component Detail View
  • A TYPICAL ITIL CHANGE PROCESS
    Objectives:
    - What changes are coming?
    - Why is the change required?
    - Has the existing configuration been reviewed?
    - What is the risk and impact: low, medium, or high?
    - What is the plan B?
  • AUTOMATING ITIL PROCESSES
    - Palette of library assets enables easy workflow composition through drag and drop
    - Access to rich libraries (toolkits) of reusable automation assets that speed up automation creation
    - Rich set of action types, flow control, and data handling primitives that simplify creation of complex automations
    - Easy workflow action editing for managing data mapping, error recovery options, implementation details, etc.
    - Graphical editor for composing and connecting workflows
    - Rich tooling functions to edit, version, debug, and optimize workflows
  • FINDING METRICS THAT MATTER
    Evaluating the Effectiveness of a Metric
    - Will the metric be used in a report? If so, which one? How is it used in the report?
    - Will the metric be used in a dashboard? If so, which one? How will it be used?
    - What action(s) will be taken if an alert is generated? Who are the actors? Will a ticket be generated? If so, what severity?
    - How often is this event likely to occur? What is the impact if the event occurs? What is the likelihood it can be detected by monitoring?
    - Will the metric help identify the source of a problem? Is it a coincident / symptomatic indicator?
    - Is the metric always associated with a single problem? Could this metric become a false indicator?
    - What is the impact if this goes undetected?
    - What is the lifespan for this metric? What is the potential for changes that may reduce the efficacy of the metric?
  • PICKING BETTER MONITORS
    - Current Approach: Itemize the existing monitors, then brainstorm potential gaps to fill, then deploy new monitors
    - Proposed Approach: Identify the potential risks, then itemize the existing monitors, then determine which gaps exist, then fill the monitoring gaps
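The proposed approach amounts to a set difference: start from the risks, list what each risk needs, and subtract what is already deployed. The sketch below is only an illustration of that idea; the risks, the monitor names, and the `monitoring_gaps` helper are all hypothetical and not taken from the talk.

```python
# Hypothetical inventory: every risk and monitor name here is illustrative.
RISKS = {
    "web server process dies": {"process availability"},
    "disk fills up": {"operating system performance", "error logs"},
    "checkout transaction slows down": {"end-to-end transactions"},
    "failover does not take over": {"fail-over success"},
}
EXISTING_MONITORS = {
    "process availability",
    "operating system performance",
    "hardware monitoring",
}

def monitoring_gaps(risks, existing):
    """Map each risk to the monitor types it needs that are not yet deployed."""
    gaps = {}
    for risk, needed in risks.items():
        missing = needed - existing
        if missing:
            gaps[risk] = sorted(missing)
    return gaps

for risk, missing in monitoring_gaps(RISKS, EXISTING_MONITORS).items():
    print(f"{risk}: add {', '.join(missing)}")
```

Starting from risks rather than from the existing monitor list also surfaces monitors that cover no identified risk (here, "hardware monitoring"), which may be candidates for review.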
  • WHAT GOOD MONITORING LOOKS LIKE
    - Components: Corporate LANs & VPNs, Load Balancer, Load Balancer, Firewall, Switch, Web Server Farm, Database, Data Power, Mainframe, Middleware, Load Balancer
    Elements of Good Monitoring
    1. System Availability
    2. Operating System Performance
    3. Hardware Monitoring
    4. Service/Daemon and Process Availability
    5. Error Logs
    6. Application Resource KPIs
    7. End-to-End Transactions
    8. Point of Failure Transactions
    9. Fail-Over Success
    10. “Activity Monitors” and “Reverse Hockey Stick”
  • Monitoring Happens Here
    This is no longer the way we should think about monitoring
    (image: http://info.streamdatacenters.com/Portals/165393/Gallery/Album/6624/Richardson%20Aerial-01.png)
  • Cloud Monitoring Happens Here
  • WHAT DO YOU WANT TO ACCOMPLISH?
    Your monitoring should help you answer:
    - How will we know if the users are getting the experience they are expecting?
    - How much capacity do we need during normal and peak times to ensure user expectations are met?
    - How quickly can the provider we select ramp up to meet our needs if we find that the service is underperforming?
    - How fast do we need to be able to access additional capacity once it is ready for us?
  • Here comes the elevator pitch…
  • THE IBM SOLUTION
    IBM SmartCloud Suite offers essential management capabilities for applications in complex cloud and hybrid environments.
    - SINGLE POINT OF MANAGEMENT
        - At-a-glance status determination via network topology graphs
        - Proactively identify and respond to compliance issues
        - Monitor the performance of the environment and the tenants living inside of it
        - Understand the current capacity needs and forecast future needs
        - Understand the costs associated with providing the service and enable “showback” and “chargeback” reporting to the application owners
    - MAXIMIZE SERVICE AVAILABILITY
        - Minimize service and system outages
        - Identify recurring incidents and implement action to remediate problems before they cause impacts
        - Assist troubleshooting by suppressing “noise” events and providing root cause determination
    - IMPROVED OPERATIONAL EFFICIENCY
        - Reduce the need for manual action or intervention
        - Automate for repeatability and elimination of human error
        - Develop standardized practices for complex business processes
        - Enable the development of APIs to allow for self-service management by the consumers
  • APPLICATION PERFORMANCE MANAGEMENT
    VISIBILITY, CONTROL AND AUTOMATION TO INTELLIGENTLY MANAGE CRITICAL APPLICATIONS IN CLOUD AND HYBRID ENVIRONMENTS.
    - Understand the end-user experience: Mobile devices & smart endpoints
    - Follow changing workloads: Private, public & hybrid clouds
    - See steps across the cloud: Highly virtualized applications, storage & networks
    - Discovery: Visibility into application resources
    - End User Experience: Transaction performance monitoring to ensure SLA compliance
    - Transaction Tracking: Rapid problem isolation through transaction path analysis
    - Diagnostics: Domain-specific operations tools for diagnosis and repair
    - Predictive Analytics: Proactive approach to reduce outages & improve performance
    - Shared data & common services
  • COMPOSITE APPLICATIONS
    - Site Content, Search, Session Information, User Login & Identity Mgmt, Content Mgmt System, Social Network Widgets, Site Tracking & Analytics, Banner Ads & Revenue Generators, Multimedia & CDN Content
  • GAINING PERSPECTIVE REQUIRES BALANCE
    - Measurement points: Packet Capture, Synthetic Transactions, Client Monitoring, Client Monitoring, Synthetic Transactions, Server Probe
    Four Perspectives of User Experience
    1. Client to the Server
    2. Server to the Client
    3. “3rd Party” Vantage Point
    4. Synthetic Transactions
  • BUSINESS VALUE OF ADOPTING ANALYTICS
    - Predict: Predictive Outage Avoidance. Ensure availability of applications and services
        - Use learning tools to augment custom best practices
        - Leverage statistical methods to maximize predictive warning
        - Improve problem detection across IT silos
    - Resolve: Faster Problem Resolution. Find & correct problems faster with tools that determine actions required to resolve issues
        - Identify problems quicker with insight into large unstructured repositories
        - Isolate problems quicker by bringing relevant unstructured data into problem investigations
        - Repair problems quicker with the right details quickly to hand
    - Perform: Optimized Performance. Track, optimize, and predict capacity and performance needs over time
        - Track capacity and performance of applications and services in classic and cloud environments
        - Optimize resource deployment with what-if and best-fit planning tools
        - Escalate capacity and performance problems before they cause critical failures
    - Know: Improved Insight. Enhance visibility into systems resource relationships while increasing customer satisfaction
        - Determine what resources are interdependent to assess impact of failures
        - Gain insight into what is important to your customer
        - Decrease customer churn and acquisition costs while increasing customer retention and satisfaction
    Automated Analytics helps lower IT Administration Costs:
    - Performance and Capacity planning tools monitor appropriately and escalate, reducing time-consuming report browsing
    - Learning tools reduce customization and best practices investment on initial deployment
    - Log Analysis helps speed problem resolution to be able to do more with less
  • That is great but we need more…
  • THE EVENT MANAGEMENT FOCUS
    In addition to handling monitoring and performance alerts, it helps drive improved availability.
    Our Formula:
    1. Continually collect, categorize, and analyze all events from as many sources as possible
    2. Correlate events and analyze them using previous outages as patterns to identify situations worth investigating
    3. Notify a support team so the situation can be mitigated before becoming an outage
    4. Automate responses that have well established situational fingerprints and proven resolution steps
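The four-step formula above can be sketched as a tiny triage function: collect event types, match them against fingerprints from past outages, and either automate a proven response or notify a team. This is a deliberately simplified illustration; the event type names, the `FINGERPRINTS` table, the `restart_cics_region` action, and the `triage` function are all hypothetical and do not come from any product or from the talk.

```python
# Hypothetical fingerprints learned from past outages. Each entry pairs a set of
# event types seen together with an automated response; None means no proven
# resolution exists yet, so a human should investigate.
FINGERPRINTS = [
    ({"cics_abend", "mq_flow_interrupted"}, "restart_cics_region"),
    ({"synthetic_txn_failed", "mq_flow_interrupted"}, None),
]

def triage(events, fingerprints=FINGERPRINTS):
    """Return ("automate", action), ("notify", matched_types) or ("ignore", None)."""
    seen = {event["type"] for event in events}          # step 1: collect
    for pattern, action in fingerprints:                # step 2: correlate
        if pattern <= seen:
            if action is not None:
                return ("automate", action)             # step 4: proven fix
            return ("notify", sorted(pattern))          # step 3: tell a team
    return ("ignore", None)

print(triage([{"type": "cics_abend"}, {"type": "mq_flow_interrupted"}]))
print(triage([{"type": "synthetic_txn_failed"}, {"type": "mq_flow_interrupted"}]))
print(triage([{"type": "disk_warning"}]))
```

A real event manager would also weigh timing, topology, and severity; the point of the sketch is only that automation is reserved for situations with a known fingerprint and a proven resolution.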
  • ONE INTEGRATED ENVIRONMENT
    - Sources: Distributed, Database, Mainframe, Network, Middleware, Storage
    - Event Pool, Operational Data Warehouse, Predictive, Enrichment & Correlation
    - Integrations: Service Desk, Paging, CMDB, Knowledge, Asset Mgmt, Event Catalog, Event API, Business Telemetry, 3rd Party Providers
    - Presentation Framework
  • (diagram)
    - Components: Presentation Framework, Asset Management & Topology Database, Aggregation and Analysis, Security Management, Availability Management, Configuration Management, Change Management, Performance Management, Enterprise Data Sources, Business Telemetry Information, Automated Discovery
    - Data flows: Configuration Discrepancies, Enrichment Data, Business Activity Data, Historical Data, “Enriched” Events, Change Activity, Topology Snapshots, Trend-Related Faults, Discovered Problems, Status Indications, Incidents, Audit Information and Suspicious Activity
  • CONCEPTUALIZING SITUATIONAL AWARENESS
    - Inputs to the Situational Awareness Engine: Real-Time Event Streams, Patterns from Historical Data, Causal Relationship from Past RCAs
    - Output: Detected and Predicted Situations
    Adapted from http://www.slideshare.net/TimBassCEP/getting-started-in-cep-how-to-build-an-event-processing-application-presentation-717795
  • CONCEPTUAL MODEL OF COMPLEX EVENT PROCESSING
    - Event Pipeline: Data Events, Control Event, Other Events
    - Event Filter, Time Window, Event Queries
    - Scenarios: A, B, C
    - Event Intelligence, Action Events, Feedback Loop
    Adapted from http://www.slideshare.net/aparnachaudhary/esper-cep-engine
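The event filter, time window, and action event from the model above can be illustrated with a few lines of plain Python. This is a toy sketch, not the API of Esper or any other CEP engine: the `TimeWindow` class, the `detect` generator, the "abend" event kind, and the thresholds are all assumptions chosen for the example, and events are assumed to arrive in timestamp order.

```python
from collections import deque

class TimeWindow:
    """Keep only events whose timestamp is within `seconds` of the newest one."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.events = deque()

    def add(self, timestamp, kind):
        self.events.append((timestamp, kind))
        # Evict events that have fallen out of the window.
        while self.events and timestamp - self.events[0][0] > self.seconds:
            self.events.popleft()

    def count(self, kind):
        return sum(1 for _, k in self.events if k == kind)

def detect(stream, kind="abend", threshold=3, seconds=60):
    """Yield an action event each time `kind` reaches `threshold` within the window."""
    window = TimeWindow(seconds)
    for timestamp, event_kind in stream:
        if event_kind != kind:          # event filter: drop everything else
            continue
        window.add(timestamp, event_kind)
        if window.count(kind) >= threshold:
            yield (timestamp, f"{kind}_storm")

stream = [(0, "abend"), (10, "login"), (20, "abend"), (50, "abend"),
          (200, "abend"), (210, "abend")]
print(list(detect(stream)))
```

Three "abend" events inside one minute raise a single action event; the two later events are too sparse to trigger another, which is the kind of rate-based rule that could have ticketed the early CICS ABENDs in the outage example.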
  • ITERATIVE DEVELOPMENT
    As you recognize opportunities to capture knowledge, use it to improve your Event Management System.
  • PLAYING TO OUR STRENGTHS
    The IT culture is driven to technology for solutions. Leverage your monitoring and testing tools to help practice failure scenarios. Work on tracking potential points of failure by creating monitors and reporting the rate of occurrence to the developers at the start of each new iteration.
  • LET’S KEEP THE CONVERSATION GOING…
    - Andrew.P.White@Gmail.com
    - ReverendDrew
    - SystemsManagementZen.Wordpress.com
    - systemsmanagementzen.wordpress.com/feed/
    - @SystemsMgmtZen
    - ReverendDrew
    - APWhite@us.ibm.com
    - 614-306-3434