A presentation on how we support and proactively resolve issues in our cloud-based application. We will share the tools we use and how we track issues.
Speakers: Derek, Nurul Zaman
10. How do we do that?
• Supporting a growing list of products
• From 1 country to 8 countries within 12 months
Processes
Exponential Growth in Support Demands
11. Tools for Support
Zendesk
- Ticketing system for L1, L2, L3, Help Center
Bugzilla
- Defect Management Tool used by Support, Dev, UX & QA
Kibana
- Logging tool for Developers & Support staff
WordPress
- Manage Knowledgebase for sharing
- Release Notes
14. Incident Management
• Resolve the issue within the shortest possible time.
• Root cause is not the main concern. Get the service back online as soon as possible.
• The resolution could be permanent or temporary, for example a workaround.
15. Incident Management
• Situation (Code Blue): a system-wide fault that affects all or most of the customers.
• Resolving the Code Blue becomes the utmost priority for the support & technical teams.
• Proper communication & follow-up with multiple clients are required.
16. Incident Management
• Two key measurement metrics
• Time To Respond
• Within the same working day
• Time for Resolution
• P1 Defect– 2 Working Days
• Resolution could be a workaround or permanent solution
• Resolution could be advice or mini training
• For P2 and P3 defects with patches, customers would be notified with
proper follow-up
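The two metrics above can be sketched as simple date checks. This is a minimal illustration, not the team's actual tooling: the function names are hypothetical, a Monday to Friday working week is assumed, and public holidays and time of day are ignored.

```python
from datetime import date, timedelta


def add_working_days(start: date, days: int) -> date:
    """Return the date that is `days` working days (Mon-Fri) after `start`.

    Assumption: weekends are the only non-working days.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def responded_in_time(reported: date, responded: date) -> bool:
    """Time To Respond target: within the same working day."""
    return responded == reported


def p1_resolved_in_time(reported: date, resolved: date) -> bool:
    """Time for Resolution target for a P1 defect: 2 working days."""
    return resolved <= add_working_days(reported, 2)


if __name__ == "__main__":
    reported = date(2024, 3, 1)  # a Friday
    print(add_working_days(reported, 2))
    print(p1_resolved_in_time(reported, date(2024, 3, 5)))
```

A ticket reported on a Friday therefore has until the following Tuesday under these assumptions, since the weekend does not count towards the 2 working days.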
18. Problem Management
• Proper tracking of issues and defects is important
• The entire technical team could be involved:
• Support (Level 2 and Level 3)
• Developers
• Business Analysts
• QA Team
• Infrastructure
• Release / DevOps
19. Problem Management
• Known Error Record (KER) - While the problem is being resolved,
a workaround or temporary solution is used to circumvent the
issues in production
• There are 2 types of Problem Management scenarios:
• Re-Active Problem Management
• Pro-Active Problem Management
20. Problem Management (Re-active)
• These are issues that arise from incidents and are directly
reported by the customers.
• Re-Active problems are given a higher priority to be resolved.
21. Problem Management (Pro-active)
• Issues or problems do not always come
from the end-users or customers.
• Problems could be identified internally
• Continuous improvement initiatives
• Communication to the end users
22. Problem Management (Defect Prioritisation)
P1
• Definition:
1. These are defects that affect multiple users.
2. A crucial feature (or the entire system) is not usable and
there is no suitable workaround.
• Resolution:
1. Need to be analysed and resolved as soon as possible.
2. Typically fixed & deployed to production within 2 working days.
P2
• Definition:
1. These are defects that affect only some users.
2. The affected features are not used often or there is an easy
workaround available.
• Resolution:
1. Will be fixed as part of the monthly release cycle.
P3
• Definition:
1. These are minor defects (e.g. UI, wordings).
• Resolution:
1. There is no fixed time for resolution for these.
2. Will normally be attended to after P1 and P2 defects are fixed.
3. Increase/decrease priority accordingly.
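The prioritisation rules above could be encoded roughly as follows. This is a sketch only: the `Defect` fields and the `prioritise` function are hypothetical names, and treating the two P1 criteria as both required is an assumption, since the slide does not say whether they combine with "and" or "or".

```python
from dataclasses import dataclass


@dataclass
class Defect:
    affects_multiple_users: bool
    crucial_feature_unusable: bool
    workaround_available: bool
    minor_cosmetic: bool = False  # e.g. UI or wording issues


# Resolution targets as stated in the prioritisation slide.
RESOLUTION_TARGET = {
    "P1": "fixed and deployed within 2 working days",
    "P2": "fixed as part of the monthly release cycle",
    "P3": "no fixed time, attended to after P1 and P2",
}


def prioritise(defect: Defect) -> str:
    """Map a defect to P1, P2 or P3 (assumed reading of the rules)."""
    if defect.minor_cosmetic:
        return "P3"
    if (
        defect.affects_multiple_users
        and defect.crucial_feature_unusable
        and not defect.workaround_available
    ):
        return "P1"
    return "P2"


if __name__ == "__main__":
    login_down = Defect(True, True, False)
    priority = prioritise(login_down)
    print(priority, "-", RESOLUTION_TARGET[priority])
```

Because P3 priorities can be raised or lowered, a real tracker would let a person override the computed value; the function only gives a starting point.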
24. Problem Management (Root Cause Analysis)
• Finding the root cause and coming up with a viable solution.
• Requires monitoring of the production system.
• Collecting and interpreting logs.
25. Problem Management (Root Cause Analysis)
• Successful RCA:
• The problem can be eliminated completely.
• Lowering the risks of re-occurrence.
• Sometimes, we might have to contact vendors,
for example Microsoft & Telerik, to provide
solutions.
26. Problem Management (Root Cause Analysis)
Define
• What is the Problem
• Determine the Scope and Goal
Analyse
• Analyse the causes
• Why does it happen
Prevent
• Develop appropriate solution
• Implement solution
27. Problem Management (5 Whys)
[Figure: five successive "Why?" questions leading to the Problem]
[Figure labels: Revelation, Respect, Trust, Learn From the Past, Look positively towards the Future; Being Defensive, Blame Game, Disrespectful & Pessimistic]
30. Problem Manager Role
• Keeps track of the problems in the
system and facilitates the resolution.
• Organises meetings and works with the
teams.
• Hosts Root Cause Analysis sessions.
• Reports to management and
stakeholders.
31. Problem Manager Role
• Aids in finding systematic issues, technical issues and process
issues in the product and its supporting structure.
• Works closely with the Incident Manager on analysing the
incident trends.
32. Software Design Strategy for Better Support
• Proper error logging mechanism.
• Send out critical alerts when errors are detected.
• Easy to understand error messages with detailed information
(e.g. Error Codes) for a speedier response and troubleshooting.
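The three points above can be illustrated with Python's standard `logging` module. This is a minimal sketch under stated assumptions: `AlertHandler`, `build_logger`, `report` and the error codes are hypothetical, and the alert handler only collects messages in a list where a real system would send an email, SMS or chat notification.

```python
import logging


class AlertHandler(logging.Handler):
    """Stand-in for an alert channel; only handles CRITICAL records."""

    def __init__(self):
        super().__init__(level=logging.CRITICAL)
        self.sent = []  # a real handler would notify the on-call team

    def emit(self, record):
        self.sent.append(self.format(record))


def build_logger(name="support_demo"):
    """Create a logger that writes every error and alerts on critical ones."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    console = logging.StreamHandler()
    console.setFormatter(formatter)
    alerts = AlertHandler()
    alerts.setFormatter(formatter)
    logger.addHandler(console)
    logger.addHandler(alerts)
    return logger, alerts


def report(logger, code, message, critical=False):
    """Log an error with its error code so support can look it up quickly."""
    level = logging.CRITICAL if critical else logging.ERROR
    logger.log(level, "[%s] %s", code, message)


if __name__ == "__main__":
    logger, alerts = build_logger()
    report(logger, "E1001", "Invoice export failed for one customer")
    report(logger, "E9001", "Login service out of memory", critical=True)
    print(len(alerts.sent))
```

Putting the error code at the front of every message keeps log searches simple, since support staff can filter by code before reading the details.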
35. Our Support Structure
Development / Engineering / Vendor - PM, CM
• Final point for technical resolution
• Perform RCA activities
Level 3 - IM, PM, CM
• Able to conduct more in-depth technical investigation
• Perform RCA activities
Level 2 - IM
• Able to handle more technical tasks
• Database investigation
• Hardware support & remote support
Level 1 (Customer Service / Helpdesk) - IM
• Initial point of contact
• Provide solutions to simple and known issues
• Perform straightforward tasks
• Give advice and suggestions
39. Conclusion
• Keep users happy by providing great support.
• Come up with a good support process:
• Incident Management
• Problem Management
• Proper Communication
• Encourage Level 3 Support Team to develop in-house checking tools to
improve support.
Example of a code blue:
Users were unable to log in to the system because of a memory issue.
The support team monitored the IIS logs to see where the memory leak was taking place and was able to identify exactly what was causing the issue.
It was discovered that the issue occurred because of dirty data in the system, which most likely was introduced during data migration.
The dirty data was found and patched.
The code was also immediately fixed and hot patched (to handle this type of dirty data).
Communication was sent out during and after successful resolution of the situation.
As part of the continuous improvement initiative, pro-active problems need to be tracked and resolved in a similar fashion to the re-active problems.
Why? - The battery is dead. (first why)
Why? - The alternator is not functioning. (second why)
Why? - The alternator belt has broken. (third why)
Why? - The alternator belt was well beyond its useful service life and not replaced. (fourth why)
Why? - The vehicle was not maintained according to the recommended service schedule. (fifth why, a root cause)