The document discusses root cause analysis in IT problem resolution. It defines root cause analysis and explains its importance in improving IT processes and meeting organizational goals. It describes the typical problem resolution process and challenges, including lack of a structured process and "blame game" between teams. The document reviews different problem detection methods and common root cause analysis approaches, and proposes ways to improve processes through mapping of services and infrastructure, implementing a structured collaborative process, and leveraging tools that provide historical context and topology-based correlation.
1. The Elusive Root Cause Of IT Problems
And How To Easily Identify It
Noam Biran
Director of Product Management
2. Introduction
Mr. Biran
• Director of Product Management at Neebula
• 20 years of experience in systems management & BSM
• Innovation Product Management at BMC
• Co-founder of Appilog (now HP uCMDB & DDMA)
About Neebula
Neebula provides the first and only automatic service-centric IT management
solution allowing IT organizations to improve the service provided to the business
by shifting from managing disparate technology silos to managing the services
running in the data center. Leveraging unique technology that automatically maps
business services to the underlying infrastructure, Neebula enables the IT team to
increase availability of the main services they manage and reduce the time to
repair of problems.
3. Agenda
• Introduction
• Root cause analysis defined
• The problem resolution process
• Problem detection
• Root cause analysis methods
• Improving root cause analysis processes
4. Root Cause Analysis Definition
ITIL V3
An Activity that identifies the Root Cause of
an Incident or Problem.
Root Cause Analysis typically concentrates on
IT Infrastructure failures.
Wikipedia
Root Cause Analysis is any structured
approach to identify the factors that resulted
in the harmful consequences of one or more
past events
5. The importance of Root Cause Analysis
• Root Cause Analysis has a high impact on
– IT processes
• The efficiency of the overall incident/problem
management process
• Good RCA discipline requires well established
configuration management
– Organizational goals
• Meeting internal and external SLAs
• Financial (budget & revenue) implications
• Brand / customer loyalty
7. The Critical Role of Root Cause Analysis
• Improper (or lack of) identification of the real
root cause may yield:
– Repeating problems
– Increased downtime
– Waste of human
resources on
“fixing” the wrong
issues
– Risk to the business
8. The Life of The Operator
We expect the operator
– To handle 1000’s of cryptic events
– Understand impact on 100’s of services
– Understand the correlation to
customer service complaints
– Understand what changed
– Orchestrate the resolution
And make these decisions within minutes to
reduce MTTR
Are we giving our operators the tools to
succeed?
10. Problem Resolution Process
• Events coming in to the NOC
• NOC performs some investigation
• Root cause analysis is shared between NOC
& 2nd/3rd level support (admins)
• Low level diagnostics & problem resolution
is done by 2nd/3rd level support (admins)
11. Involved Parties & Tools
• Tools
– Monitoring tools
– Configuration management tools
• People
– Users
– NOC
– Admins – specialized teams, each focused on a specific
area, e.g. system, database, network
– Application support / developers
12. The Common Process – Blame Game
• No structured process
• Lack of overall cross-domain view
• Each team has its own terminology and view
• Each team is working on its own
14. Potential Problem Symptoms
• Lack of certain functionality
– A certain transaction does not work
• Performance degradation
– Fund transfer response time is above 2 sec.
• Availability issue
– Application doesn’t work
• None
– Unnoticeable failure due to high availability
configuration
15. Problem Detection
• Good problem detection methods are key for a
structured root cause analysis process
• Problem detection tools should provide sufficient
data to the root cause analysis process
• There are various distinct methods each with its
pros and cons
• There is no single superior detection method
16. Detection – Users
• What it does
– Compensates for unknown / unreported
problems
• What it doesn’t
– Supposedly accurate – actually might point in
the wrong direction
– Usually takes place
too late for a quick fix,
when the business is already impacted
17. Detection – Infrastructure Monitoring
• What it does
– Monitor each technical element
comprising the service
– Great way to identify
specific availability failures
• What it doesn’t
– Hard to correlate with real user experience
– Too many false positives
– Lots of events on symptoms rather than on the actual problem
18. Detection – End User Experience
• What it does
– Measure overall response time of user transactions
– Synthetic or real user transactions
– The ultimate problem detection method
• What it doesn’t
– No real breakdown to assist
in pinpointing the problem
or even the domain
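A synthetic transaction check can be sketched in a few lines of Python. This is a minimal illustration, not taken from the presentation: the function name and the `run` callable are hypothetical, and the 2-second default simply echoes the fund transfer example on the symptoms slide. It also shows the limitation listed above: the result is a single end-to-end number with no breakdown by domain.

```python
import time

def run_synthetic_transaction(run, threshold_seconds=2.0):
    """Time one scripted transaction and flag it if it exceeds the threshold.

    `run` is any callable that performs the transaction (hypothetical here).
    Returns (elapsed_seconds, breached).
    """
    start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - start
    # Only the overall response time is known; where the time was spent is not.
    return elapsed, elapsed > threshold_seconds
```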
19. Detection – Transaction Breakdown
• What it does
– Discovery of each transaction’s path
within the data center
– Highlight potential performance
problems within the transaction
execution
• What it doesn’t
– No correlation to infrastructure
monitoring
– Cannot cover the entire data center
– domain specific
20. Detection – Domain Specific Tools
• What it does
– Drill down in a specific application
– Great analysis & diagnostics within an application
• What it doesn’t
– No data center wide view
– Lack of insight into the
connections between
applications
23. Potential Root Cause Types
• Configuration change
• Version upgrade
• Hardware fault
• Software bug
• Capacity problem
• Resource collision
24. Common Ways for Root Cause Analysis
• War room scenario
• The log file approach
• APM tools
• Transaction management
• Manual event correlation / analysis
25. War Room Scenario
• Getting everyone in the same room
• Each team has its own data and terminology
• Blame game
• Takes a lot of time
26. The Log File Approach
• An admin sits and analyzes log files and
other historical data from various sources
• A domain specific approach
• Certain degree of structured process
• Might identify problems that
are not the root cause
(distractions)
27. APM Tools
• An admin sits and analyzes log files and
other historical data from various sources
• A domain specific approach
• Certain degree of structured process
• Might identify problems that
are not the root cause
(distractions)
28. Transaction Management
• A great tool to point to the probable area
where the root cause resides
• Limited to specific domains
• Inability to correlate with infrastructure
metrics / failures
29. Manual Event Correlation / Analysis
• Requires cross-domain expertise
• Requires understanding of dependencies
between components
• Time consuming
• Lack of insight into other
non-event data
31. Making The Best Of Existing Tools
• Choose problem detection methods that
assist in the root cause analysis process
• Turn the root cause analysis into a
structured process
– Internal team processes
– Inter-team processes
• Common language & visibility between
teams
32. New Methods: Mapping
• Mapping of business services & applications
to the supporting infrastructure
• Ties symptoms (user) to problems
(technology)
• Introduces a common language between
teams
• Enables a high level cross-domain view
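As a rough illustration of the mapping idea, the sketch below models business services and their supporting infrastructure as a dependency map and answers the question "which services are affected when this component fails?". All service and component names are made up for illustration; they do not come from the presentation or from any specific product.

```python
from collections import deque

# Hypothetical service map: each key depends on the components listed for it.
DEPENDS_ON = {
    "fund_transfer": ["web_frontend", "payments_app"],
    "online_statements": ["web_frontend", "reports_app"],
    "web_frontend": ["load_balancer"],
    "payments_app": ["app_server_1", "core_db"],
    "reports_app": ["app_server_2", "core_db"],
}
BUSINESS_SERVICES = {"fund_transfer", "online_statements"}

def impacted_services(failed_component):
    """Return the business services that depend, directly or not, on the component."""
    # Invert the map so we can walk from infrastructure up to the services.
    used_by = {}
    for parent, children in DEPENDS_ON.items():
        for child in children:
            used_by.setdefault(child, []).append(parent)

    seen, queue = {failed_component}, deque([failed_component])
    while queue:
        node = queue.popleft()
        for parent in used_by.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen & BUSINESS_SERVICES)
```

With this map, a failure of `core_db` touches both services, while a failure of `app_server_1` touches only `fund_transfer`, which is the kind of symptom-to-technology link the slide describes.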
33. New Methods: Structured Process
• Define a structured process for problem
investigation and root cause analysis
• Define how collaboration should occur
during root cause analysis between teams
34. New Methods: Tools
• Use tools that provide a historical
dimension for problem investigation
• Use tools that enable the correlation of
problems to configuration changes
• Use topology-based correlation instead of
rule-based (or manual) correlation
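One simple way to picture topology-based correlation and change correlation is sketched below. It is an illustrative assumption, not the method of any particular tool: the topology, the component names, the change records, and the 24-hour window are all hypothetical. An alerting component is treated as a probable root cause when none of its direct dependencies are alerting; recent changes to that component are then pulled from a change history.

```python
from datetime import datetime, timedelta

# Hypothetical topology: each key depends on the components listed for it.
DEPENDS_ON = {
    "fund_transfer": ["payments_app"],
    "payments_app": ["app_server_1", "core_db"],
    "core_db": ["storage_array"],
}

def probable_root_causes(alerting):
    """Alerting components none of whose direct dependencies are also alerting."""
    alerting = set(alerting)
    return sorted(
        c for c in alerting
        if not any(dep in alerting for dep in DEPENDS_ON.get(c, []))
    )

def recent_changes(changes, component, detected_at, window=timedelta(hours=24)):
    """Changes made to the component within the window before detection."""
    return [
        ch for ch in changes
        if ch["component"] == component
        and detected_at - window <= ch["at"] <= detected_at
    ]

# Example with made-up data: three alerts collapse to one candidate.
alerts = ["fund_transfer", "payments_app", "core_db"]
candidates = probable_root_causes(alerts)  # ["core_db"]
change_log = [
    {"component": "core_db", "at": datetime(2024, 1, 10, 9, 0), "what": "parameter change"},
    {"component": "core_db", "at": datetime(2024, 1, 2, 9, 0), "what": "version upgrade"},
]
suspects = recent_changes(change_log, "core_db", datetime(2024, 1, 10, 12, 0))
```

Unlike a rule set, nothing here has to be written per event type: the dependency map alone decides which alerts are symptoms. The sketch only looks at direct dependencies, so a chain with a silent intermediate component would need a transitive walk.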
Editor's Notes
Introduction to the subject. Webinar logistics: presentation first, send questions during, answer questions at the end.
RCA is problematic even to define. ITIL definition -> useless. ITIL failed. Wikipedia: structured, factors, consequences, past events – I’ll call them symptoms.
Talk about each bullet
Many data sources (event feeds). All are mixed and funneled into the NOC. The NOC needs to filter and make order in them based on: relevance; source / derived. But the NOC doesn’t have the tools or processes to do this. No structured way to do this filtering (though the NOC is used to structured processes like run book).
Taking care of the symptoms and not the problems. Associating wrong events -> figuring out the incorrect root cause.
NOC is used to structured processes (like run book). We don’t give them tools. We don’t give them structured processes (or any processes). They usually don’t possess cross-domain knowledge.
Isolation – diagnostics. NOC’s investigation may yield forwarding to the wrong team and therefore wrong analysis done in the wrong context.
Explain each. How do they all tie together? Usually they don’t.
Problem detection begins with the symptoms. Same symptoms may be caused by different problems.
We need a combination of tools. Choose the right mix to assist in the RCA process. Need synergy between the methods.
Cross domain. Cross discipline. Require deep understanding.