This article is a follow-up to “Don’t Catch Exceptions”, which advocates that
exceptions should (in general) be passed up to a “unit of work”, that is, a fairly
coarse-grained activity which can reasonably be failed, retried or ignored. A unit of
work could be:
an entire program, for a command-line script,
a single web request in a web application,
the delivery of an e-mail message,
the handling of a single input record in a batch-loading application,
rendering a single frame in a media player or a video game, or
an event handler in a GUI program.
The code around the unit of work may look something like:

try {
    DoUnitOfWork();
} catch (Exception e) {
    // examine the exception and decide what to do
}
For the most part, the code inside DoUnitOfWork() and the functions it calls lets
exceptions propagate upward rather than catching them.
To handle errors correctly, you need to answer a few questions, such as:
Was this error caused by a corrupted application state?
Did this error cause the application state to be corrupted?
Was this error caused by invalid input?
What do we tell the user, the developers and the system administrator?
Could this operation succeed if it was retried?
Is there something else we could do?
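The questions above can be turned into a concrete decision procedure at the unit-of-work boundary. What follows is a sketch, not code from the article: the class name, the retry limit, and the mapping from exception types to decisions are illustrative assumptions. The idea is simply that the catch block classifies the failure (invalid input is not retried; possibly transient failures are retried a bounded number of times) rather than treating all exceptions alike.

```java
// A sketch (not from the article) of a unit-of-work wrapper that acts on the
// questions above. The class name, retry limit, and the mapping from exception
// types to decisions are illustrative assumptions, not a prescribed design.
public class UnitOfWorkRunner {
    static final int MAX_ATTEMPTS = 3;

    public static void run(Runnable unitOfWork) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                unitOfWork.run();
                return; // the unit of work succeeded
            } catch (IllegalArgumentException e) {
                // "Was this error caused by invalid input?" -- retrying the
                // same input won't help, so report it and give up.
                System.err.println("Invalid input, not retrying: " + e.getMessage());
                return;
            } catch (RuntimeException e) {
                // "Could this operation succeed if it was retried?" -- treat
                // other runtime failures as possibly transient and retry.
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        System.err.println("Giving up after " + MAX_ATTEMPTS + " attempts");
    }
}
```

A batch loader, for example, might call UnitOfWorkRunner.run once per input record, so a single bad record is reported and skipped without aborting the whole run.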
Although it’s good to depend on existing exception
An analysis of software aging in cloud environment IJECEIAES
The document analyzes software aging in cloud environments. Software aging occurs when errors accumulate over time in long-running software systems, degrading performance and potentially leading to failure. In cloud computing, aging can happen across virtual machines and cloud services. The paper reviews methods for detecting aging, such as by monitoring indicators like memory usage, response time, and traffic metrics. Machine learning algorithms and statistical models have been used to predict aging but combining multiple approaches could improve accuracy. While preventing aging entirely is impossible, detection techniques can help address it by restarting or rejuvenating systems before failures occur.
This presentation, given at the Cornell Commonspot SIG meeting on March 22, 2006, addresses issues we discovered in moving sites from a test to production server. Content is stored in a per-site Oracle schema, while user identifiers are defined in a per-server schema. When migrating, it's necessary to update user identifiers in the site database. Peter Hoyt and Paul Houle developed a Perl script that examines the database structure, identifies fields that contain user identifiers, and maps them from the old to new server.
Keeping track of state in asynchronous callbacksPaul Houle
There’s a lot of confusion about how asynchronous communication works in RIA’s
such as Silverlight, GWT and Javascript. When I start talking about the problems of
concurrency control, many people tell me that there aren’t any concurrency problems
since everything runs in a single thread. [1]
It’s important to understand the basics of what is going on when you’re writing
asynchronous code, so I’ve put together a simple example to show how execution
works in RIA’s and how race conditions are possible. This example applies to
Javascript, Silverlight, GWT and Flex, as well as a number of other environments
based on Javascript. This example doesn’t represent best practices, but rather what
can happen when you’re not using a proactive strategy that eliminates concurrency problems
Extension methods, nulls, namespaces and precedence in c#Paul Houle
Extension methods allow calling methods on null objects without exceptions and add methods to existing classes without modifying them. However, extension methods have different precedence than regular methods and can cause conflicts if defined in multiple namespaces. To avoid issues, extension methods should be defined carefully, null values should be checked, and namespaces should be managed properly.
This is a story of two types: GenericType and SpecificType, where GenericType is a
superclass of SpecificType. There are two types of explicit cast in C#:
The Prefix cast:
[01] GenericType g=...;
[02] SpecificType t=(SpecificType) g;
The as cast:
[03] GenericType g=...;
[04] SpecificType t=g as SpecificType;
Most programmers have a habit of using one or the other — this isn’t usually a
conscious decision, but more of a function of which form a programmer saw first. I,
for instance, programmed in Java before I learned C#, so I was already in the prefix
cast habit. People with a Visual Basic background often do the opposite. There are
real differences between the two casting operators
The document discusses error isolation and management in agile multi-tenant cloud applications. It proposes an 8-phase framework called Mapricot to isolate and manage errors. The 8 phases are: Measurable space (store errors), Analyze errors (categorize and count errors), Prioritize errors, Release correlation, Improved logging, Code improvement, Offer urgent help, and Training. The framework was evaluated on two cloud applications and showed improvements in isolating and managing errors over a control period.
Error Isolation and Management in Agile Multi-Tenant Cloud Based Applications neirew J
The document discusses error isolation and management in agile multi-tenant cloud applications. It proposes an 8-phase framework called Mapricot to isolate and manage errors. The 8 phases are: Measurable space (store errors), Analyze errors (categorize and count errors), Prioritize errors, Release correlation, Improved logging, Code improvement, Offer urgent help, and Training. The framework was evaluated on two cloud applications and showed improvements in isolating and managing errors over a control period.
Checking Windows for signs of compromiseCal Bryant
This document provides guidance on investigating compromised Microsoft Windows systems to identify how the system was compromised and what malware or unauthorized programs may be present. It outlines various locations in the file system, registry, services, and network settings where intruders commonly hide malware. Tools recommended for examining the system include using cmd.exe to view file timestamps, searching hidden folders and alternate data streams, and using Google to research any suspicious programs found. The document advises that while antivirus software can detect some threats, a fresh reinstall of the operating system is typically the most reliable way to restore a compromised system.
An analysis of software aging in cloud environment IJECEIAES
The document analyzes software aging in cloud environments. Software aging occurs when errors accumulate over time in long-running software systems, degrading performance and potentially leading to failure. In cloud computing, aging can happen across virtual machines and cloud services. The paper reviews methods for detecting aging, such as by monitoring indicators like memory usage, response time, and traffic metrics. Machine learning algorithms and statistical models have been used to predict aging but combining multiple approaches could improve accuracy. While preventing aging entirely is impossible, detection techniques can help address it by restarting or rejuvenating systems before failures occur.
This presentation, given at the Cornell Commonspot SIG meeting on March 22, 2006, addresses issues we discovered in moving sites from a test to production server. Content is stored in a per-site Oracle schema, while user identifiers are defined in a per-server schema. When migrating, it's necessary to update user identifiers in the site database. Peter Hoyt and Paul Houle developed a Perl script that examines the database structure, identifies fields that contain user identifiers, and maps them from the old to new server.
Keeping track of state in asynchronous callbacksPaul Houle
There’s a lot of confusion about how asynchronous communication works in RIA’s
such as Silverlight, GWT and Javascript. When I start talking about the problems of
concurrency control, many people tell me that there aren’t any concurrency problems
since everything runs in a single thread. [1]
It’s important to understand the basics of what is going on when you’re writing
asynchronous code, so I’ve put together a simple example to show how execution
works in RIA’s and how race conditions are possible. This example applies to
Javascript, Silverlight, GWT and Flex, as well as a number of other environments
based on Javascript. This example doesn’t represent best practices, but rather what
can happen when you’re not using a proactive strategy that eliminates concurrency problems
Extension methods, nulls, namespaces and precedence in c#Paul Houle
Extension methods allow calling methods on null objects without exceptions and add methods to existing classes without modifying them. However, extension methods have different precedence than regular methods and can cause conflicts if defined in multiple namespaces. To avoid issues, extension methods should be defined carefully, null values should be checked, and namespaces should be managed properly.
This is a story of two types: GenericType and SpecificType, where GenericType is a
superclass of SpecificType. There are two types of explicit cast in C#:
The Prefix cast:
[01] GenericType g=...;
[02] SpecificType t=(SpecificType) g;
The as cast:
[03] GenericType g=...;
[04] SpecificType t=g as SpecificType;
Most programmers have a habit of using one or the other — this isn’t usually a
conscious decision, but more of a function of which form a programmer saw first. I,
for instance, programmed in Java before I learned C#, so I was already in the prefix
cast habit. People with a Visual Basic background often do the opposite. There are
real differences between the two casting operators
The document discusses error isolation and management in agile multi-tenant cloud applications. It proposes an 8-phase framework called Mapricot to isolate and manage errors. The 8 phases are: Measurable space (store errors), Analyze errors (categorize and count errors), Prioritize errors, Release correlation, Improved logging, Code improvement, Offer urgent help, and Training. The framework was evaluated on two cloud applications and showed improvements in isolating and managing errors over a control period.
Error Isolation and Management in Agile Multi-Tenant Cloud Based Applications neirew J
The document discusses error isolation and management in agile multi-tenant cloud applications. It proposes an 8-phase framework called Mapricot to isolate and manage errors. The 8 phases are: Measurable space (store errors), Analyze errors (categorize and count errors), Prioritize errors, Release correlation, Improved logging, Code improvement, Offer urgent help, and Training. The framework was evaluated on two cloud applications and showed improvements in isolating and managing errors over a control period.
Checking Windows for signs of compromiseCal Bryant
This document provides guidance on investigating compromised Microsoft Windows systems to identify how the system was compromised and what malware or unauthorized programs may be present. It outlines various locations in the file system, registry, services, and network settings where intruders commonly hide malware. Tools recommended for examining the system include using cmd.exe to view file timestamps, searching hidden folders and alternate data streams, and using Google to research any suspicious programs found. The document advises that while antivirus software can detect some threats, a fresh reinstall of the operating system is typically the most reliable way to restore a compromised system.
This document provides an overview of distributed systems and some of the key challenges in building reliable distributed systems. It defines what a distributed system is and discusses some of the core challenges, including dealing with component failures. It describes different types of failures that can occur and emphasizes that distributed systems must be designed with an assumption of failure. It also summarizes some of the main protocols used in distributed systems like TCP/IP and discusses how client-server architectures help address some reliability issues.
Chaos Engineering Without Observability ... Is Just ChaosCharity Majors
The document discusses the importance of observability over traditional monitoring for complex distributed systems. It argues that observability allows engineers to understand what is happening inside their systems by asking questions from outside using instrumentation and structured data from the system. True observability requires events with high cardinality and dimensionality to account for unknown problems. It emphasizes testing systems in production and gaining an operational literacy to debug issues without prior knowledge.
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
Often what you monitor and get alerted on is defined by your tools, rather than what makes the most sense to you and your organisation. Alerts on metrics such as CPU usage which are noisy and rarely spot real problems, while outages go undetected. Monitoring systems can also be challenging to maintain, and overall provide a poor return on investment.
In the past few years several new monitoring systems have appeared with more powerful semantics and which are easier to run, which offer a way to vastly improve how your organisation operates and prepare you for a Cloud Native environment. Prometheus is one such system. This talk will look at the monitoring ideal and how whitebox monitoring with a time series database, multi-dimensional labels and a powerful querying/alerting language can free you from midnight pages.
This document provides an overview of a bug tracking system final year project. It discusses what a bug is, types of bugs, why bug tracking systems are necessary, components of an effective system, and examples of bugs that had extreme effects. It also outlines the proposed software's functionalities, development environment, hardware requirements, timeline, and ER diagram. The document aims to plan and design a bug tracking software application.
Developing fault tolerance integrity protocol for distributed real time systemsDr Amira Bibo
This document summarizes a research paper that developed a fault tolerance protocol called DRT-FTIP (Distributed Real Time – Fault Tolerance Integrity Protocol) for distributed real-time systems. The protocol is designed to function in dynamic networks and is coupled with an end-to-end distributed real-time scheduling algorithm (EOE-DRTSA) to increase integrity of scheduling. It has three phases - establishing communication, normal operation monitoring task execution, and error detection and recovery if tasks miss deadlines. The goal is to ensure tasks meet deadlines even in the presence of hardware or software faults.
Operating System Structure Of A Single Large Executable...Jennifer Lopez
The document discusses emerging developments in clinical decision support systems, noting that these systems are gaining recognition due to their ability to improve healthcare quality and safety by providing tailored patient information and recommendations to clinicians. It outlines some of the challenges in knowledge representation for clinical decision support systems, including the need to represent complex clinical knowledge and guidelines as well as uncertainties and probabilities. Emerging areas being explored include the use of artificial intelligence techniques like deep learning and natural language processing to advance clinical decision support.
This document discusses the need for adaptive and dynamic software development that can adjust to changing runtime environments and fault conditions. It argues that traditional static approaches to fault tolerance, like using fixed levels of redundancy, are inadequate as the threat environment may vary. The document then introduces an adaptive data integrity tool that allows the level of redundancy to change dynamically based on faults detected at runtime. This provides an example of the new approach called for, termed "New Software Development," that is more adaptive, maintainable and reconfigurable like New Product Development concepts.
Running Head MALWARE1MALWARE2MalwareName.docxcowinhelen
Running Head: MALWARE 1
MALWARE 2
Malware
Name
Institution
Course
Date
Malware Attacks
Potential Malicious Attracts Against the Network Organization
In the world of technology, everything can just happen. Information can pass from one region to another with ease meaning that everything has been simplified. However, the information technology has also been affected by a few challenges that seem to recur from time to time. They include;
Trojan horse virus- Typically, a computer virus has been a challenge for most organizations, but the most common especially in such a company is the Trojan horse virus. The virus is not self-replicating like the majority of others, but it has terrible consequences if it affects the network server of an organization (Durairajan, Saravanan, & Chakkaravarthy, 2016). Apparently, the virus is used by hackers to get access to data from a specified user illegally. With the installation of the video game, other competitor servers can access such kind of data and reproduce a copy even before the initially programmed game gets into the market.
Effects of Trojan horse virus
The data within a user’s computer can be deleted or be modified by the hacker. With new businesses cropping out day in day out, the problem may affect the video game company. A hacker may eliminate valuable data from the program and install a fake one which will, in turn, nullify the whole project. The virus can also be used to steal valuable information from a company that is supposed to be classified.
Computer worms- The worst thing about computer worms is that they are self-replicating. Apparently, they utilize the space in the computer network and dispatch it there where they replicate. The copies of the worms are multiplied and therefore displace the data that was there. Additionally, computer worms don’t need to be attached to the case of Trojan horse virus, but they develop from the network of equipment bit by bit (Anwar, Bakhtiari, Zainal, Abdullah, & Qureshi, 2015). The video game is a program that is used by a lot of people, and there is a high possibility that some computer worms begin to develop slowly.
Impact of computer worms
One of the major troubles with the computer worms is that they replicate themselves on the host server and hence, eliminate valuable files. They apparently take the place of a file which will automatically cause a breakdown in the network system of a company. For instance, the video game has been programmed and is made of various files. If a computer worm takes the place of one of the critical files, it would be nearly impossible for the program to function normally.
Blended threat-The case happens when both the Trojan horse and the computer worms all attack at the same time. The attack by both can have very grave consequences as they require no human efforts. Apparently, the threat uses the internet vulnerabilities and the user to initiate and spread an attack within the system. Importantly, the attack is a ...
This document discusses assumptions in software systems and their failure. It introduces three syndromes caused by assumption failures: 1) Horning syndrome, where assumptions are removed during software development, 2) Hidden Intelligence syndrome, where assumptions are hardcoded and hidden, and 3) Boulding syndrome, where a software system's dependencies and properties are unknown. It proposes strategies to address each syndrome and envisions a holistic approach to software development that explicitly addresses assumptions and their failures.
Cloud computing has gained popularity over the years, some organizations are using some form of cloud
computing to enhance their business operations while reducing infrastructure costs and gaining more
agility by deploying applications and making changes to applications easily. Cloud computing systems just
like any other computer system are prone to failure, these failures are due to the distributed and complex
nature of the cloud computing platforms.
Cloud computing systems need to be built for failure to ensure that they continue operating even if the
cloud system has an error. The errors should be masked from the cloud users to ensure that users continue
accessing the cloud services and this intern leads to cloud consumers gaining confidence in the availability
and reliability of cloud services.
In this paper, we propose the use of N-Modular redundancy to design and implement failure-free clouds
Cloud computing has gained popularity over the years, some organizations are using some form of cloud
computing to enhance their business operations while reducing infrastructure costs and gaining more
agility by deploying applications and making changes to applications easily. Cloud computing systems just
like any other computer system are prone to failure, these failures are due to the distributed and complex
nature of the cloud computing platforms.
Evolution of Monitoring and Prometheus (Dublin 2018)Brian Brazil
This talk looks at the evolution of monitoring over time, the ways in which you can approach monitoring, where Prometheus fit into all this, and how Prometheus itself has grown over time.
Why software performance reduces with time?.pdfMike Brown
Software performance reduces over time for several reasons: 1) additional features add complexity and slow programs down, 2) advanced graphical user interfaces require more system resources, and 3) frequent updates introduce bugs and security vulnerabilities that are resource-intensive to fix. Other factors include algorithms that don't scale well to large data sets, internet connectivity that allows malware to slow systems down, and changes to compilers that may inadvertently reduce previously optimized code performance.
This chapter discusses operating system structure and design. It examines issues like layered designs, microkernels, and modules. It presents the basic structures of popular OSes. Key points covered include system services, system calls, system programs, and user interfaces like command interpreters and graphical user interfaces. Popular OS examples like UNIX, Windows, and Mac OS are referenced throughout.
this ppt is about the information of factory data collection system.Various techniques are used to collect data from the factory floor. These technique range from clerical methods that requires workers to fill out paper forms that are later compiled, to fully automated methods that require no human participation.
Evolving role of Software,Legacy software,CASE tools,Process Models,CMMInimmik4u
The Evolving role of Software – Software – The changing Nature of Software – Legacy software, Introduction to CASE tools, A generic view of process– A layered Technology – A Process Framework – The Capability Maturity Model Integration (CMMI) – Process Assessment – Personal and Team Process Models. Product and Process. Process Models – The Waterfall Model – Incremental Process Models – Incremental Model – The RAD Model – Evolutionary Process Models – Prototyping – The Spiral Model – The Concurrent Development Model – Specialized Process Models – the Unified Process.
ON FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDSijgca
Grid computing or computational grid is always a vast research field in academic, as well as in industry also. Computational grid provides resource sharing through multi-institutional virtual organizations for dynamic problem solving. Various heterogeneous resources of different administrative domain are virtually distributed through different network in computational grids. Thus any type of failure can occur at any point of time and job running in grid environment might fail. Hence fault tolerance is an important and challenging issue in grid computing as the dependability of individual grid resources may not be guaranteed. In order to make computational grids more effective and reliable fault tolerant system is necessary. The objective of this paper is to review different existing fault tolerance techniques applicable in grid computing. This paper presents state of the art of various fault tolerance technique and comparative study of the existing algorithms.
Program aging is a degradation of performance or functionality caused by resource depletion. The aging affects the cloud
services which provide access to big data bank and computing resources. This suffers large budget and delays of defect removal, which
requires other related solutions including renewal in the form of controlled restarts. Collection of various runtime metrics are more
significant source for further study of detection and analysis of aging issues. This study highlights the method for detecting aging
immediately after their introduction by runtime comparisons of different development scenarios. The study focuses on aging of
program and service crash as a consequence.
A Study Of Real-Time Embedded Software Systems And Real-Time Operating SystemsRick Vogel
This document summarizes a seminar report on real-time embedded software systems and real-time operating systems. It discusses what embedded systems and real-time systems are, and describes some of the key components and requirements of real-time operating systems including multi-tasking, memory management, task scheduling, and case studies of several popular RTOSs. The report aims to provide an overview of the technologies behind embedded systems design and survey available real-time operating systems.
The document discusses a framework for a self-healing module (SHM) to automate response to failures in a virtual manufacturing execution system (vMES). The SHM would detect failures, determine resolutions, and enact resolutions without human intervention. This would improve productivity by automating error recovery. The SHM framework uses event listeners, triggers, and actions. Listeners detect events, triggers determine responses, and actions enact those responses, such as restarting processes, migrating virtual machines, or adjusting database settings. The goal is to automate operations and improve response time to failures in virtual manufacturing environments.
Chatbots are growing in popularity as developers face the
limitations of the mobile app. User interfaces that simulate a human
conversation, the history of chatbots goes back to the late 18th
century. I'll take you on a tour of that history with an eye on finding
insights on what is possible today and in the near future with chatbots.
Issues Covered: Amazon Alexa, Facebook Messenger Chatbots, Alan
Turing, and much more.
Estimating the Software Product Value during the Development ProcessPaul Houle
Nowadays software companies are facing a fierce competition
to deliver better products but offering a higher value to the customer.
In this context, software product value has becoming a major concern
in software industry, leading for improving the knowledge and better
understanding about how to estimate the software value in early development
phases. Other way, software companies encounter problems such
as releasing products that were developed with high expectations, but
they gradually fall into the category of a mediocre product when they
are released to the market. These high expectations are tightly related
to the expected and offered software value to the customer. This paper
presents an approach for estimating the software product value, focusing
on the development phases. We propose a value indicators approach to
quantify the real value of the development products. The aim is early
identifying potential deviations in the real software value, by comparing
the estimated versus the expected. We present an internal validation
to show the feasibility of this approach to produce benefits in industry
projects.
More Related Content
Similar to What do you do when you’ve caught an exception?
This document provides an overview of distributed systems and some of the key challenges in building reliable distributed systems. It defines what a distributed system is and discusses some of the core challenges, including dealing with component failures. It describes different types of failures that can occur and emphasizes that distributed systems must be designed with an assumption of failure. It also summarizes some of the main protocols used in distributed systems like TCP/IP and discusses how client-server architectures help address some reliability issues.
Chaos Engineering Without Observability ... Is Just ChaosCharity Majors
The document discusses the importance of observability over traditional monitoring for complex distributed systems. It argues that observability allows engineers to understand what is happening inside their systems by asking questions from outside using instrumentation and structured data from the system. True observability requires events with high cardinality and dimensionality to account for unknown problems. It emphasizes testing systems in production and gaining an operational literacy to debug issues without prior knowledge.
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
Often what you monitor and get alerted on is defined by your tools, rather than what makes the most sense to you and your organisation. Alerts on metrics such as CPU usage which are noisy and rarely spot real problems, while outages go undetected. Monitoring systems can also be challenging to maintain, and overall provide a poor return on investment.
In the past few years several new monitoring systems have appeared with more powerful semantics and which are easier to run, which offer a way to vastly improve how your organisation operates and prepare you for a Cloud Native environment. Prometheus is one such system. This talk will look at the monitoring ideal and how whitebox monitoring with a time series database, multi-dimensional labels and a powerful querying/alerting language can free you from midnight pages.
This document provides an overview of a bug tracking system final year project. It discusses what a bug is, types of bugs, why bug tracking systems are necessary, components of an effective system, and examples of bugs that had extreme effects. It also outlines the proposed software's functionalities, development environment, hardware requirements, timeline, and ER diagram. The document aims to plan and design a bug tracking software application.
Developing fault tolerance integrity protocol for distributed real time systemsDr Amira Bibo
This document summarizes a research paper that developed a fault tolerance protocol called DRT-FTIP (Distributed Real Time – Fault Tolerance Integrity Protocol) for distributed real-time systems. The protocol is designed to function in dynamic networks and is coupled with an end-to-end distributed real-time scheduling algorithm (EOE-DRTSA) to increase integrity of scheduling. It has three phases - establishing communication, normal operation monitoring task execution, and error detection and recovery if tasks miss deadlines. The goal is to ensure tasks meet deadlines even in the presence of hardware or software faults.
Operating System Structure Of A Single Large Executable...Jennifer Lopez
The document discusses emerging developments in clinical decision support systems, noting that these systems are gaining recognition due to their ability to improve healthcare quality and safety by providing tailored patient information and recommendations to clinicians. It outlines some of the challenges in knowledge representation for clinical decision support systems, including the need to represent complex clinical knowledge and guidelines as well as uncertainties and probabilities. Emerging areas being explored include the use of artificial intelligence techniques like deep learning and natural language processing to advance clinical decision support.
This document discusses the need for adaptive and dynamic software development that can adjust to changing runtime environments and fault conditions. It argues that traditional static approaches to fault tolerance, like using fixed levels of redundancy, are inadequate as the threat environment may vary. The document then introduces an adaptive data integrity tool that allows the level of redundancy to change dynamically based on faults detected at runtime. This provides an example of the new approach called for, termed "New Software Development," that is more adaptive, maintainable and reconfigurable like New Product Development concepts.
MalwareName.docx (cowinhelen)
Malware Attacks
Potential Malicious Attacks Against the Network Organization
In the world of technology, anything can happen. Information can pass from one region to another with ease, meaning that everything has been simplified. However, information technology has also been affected by a few challenges that seem to recur from time to time. They include:
Trojan horse virus - Typically, computer viruses have been a challenge for most organizations, but the most common, especially in such a company, is the Trojan horse virus. The virus is not self-replicating like the majority of others, but it has terrible consequences if it infects the network server of an organization (Durairajan, Saravanan, & Chakkaravarthy, 2016). The virus is used by hackers to gain illegal access to a specified user's data. With the installation of the video game, competitors' servers could access such data and reproduce a copy even before the originally programmed game gets to market.
Effects of Trojan horse virus
The hacker can delete or modify the data within a user's computer. With new businesses cropping up day in and day out, the problem may affect the video game company. A hacker may remove valuable data from the program and install a fake version, which will in turn nullify the whole project. The virus can also be used to steal valuable information from a company that is supposed to remain classified.
Computer worms - The worst thing about computer worms is that they are self-replicating. They exploit space in the computer network and spread there as they replicate. Copies of the worms multiply and displace the data that was there. Additionally, unlike the Trojan horse virus, computer worms don't need to be attached to a host program; they spread through networked equipment bit by bit (Anwar, Bakhtiari, Zainal, Abdullah, & Qureshi, 2015). The video game is a program used by many people, and there is a high possibility that computer worms could begin to develop slowly.
Impact of computer worms
One of the major troubles with computer worms is that they replicate themselves on the host server and eliminate valuable files. They take the place of a file, which will automatically cause a breakdown in a company's network system. For instance, the video game has been programmed and is made up of various files. If a computer worm takes the place of one of the critical files, it would be nearly impossible for the program to function normally.
Blended threat - This happens when the Trojan horse and the computer worms both attack at the same time. An attack by both can have very grave consequences, as they require no human effort. The threat uses internet vulnerabilities and the user to initiate and spread an attack within the system. Importantly, the attack is a ...
This document discusses assumptions in software systems and their failure. It introduces three syndromes caused by assumption failures: 1) Horning syndrome, where assumptions are removed during software development, 2) Hidden Intelligence syndrome, where assumptions are hardcoded and hidden, and 3) Boulding syndrome, where a software system's dependencies and properties are unknown. It proposes strategies to address each syndrome and envisions a holistic approach to software development that explicitly addresses assumptions and their failures.
Cloud computing has gained popularity over the years; some organizations are using some form of cloud computing to enhance their business operations while reducing infrastructure costs and gaining more agility by deploying applications and making changes to applications easily. Cloud computing systems, just like any other computer system, are prone to failure; these failures are due to the distributed and complex nature of cloud computing platforms.
Cloud computing systems need to be built for failure to ensure that they continue operating even if the cloud system has an error. The errors should be masked from cloud users so that users can continue accessing cloud services; this in turn leads to cloud consumers gaining confidence in the availability and reliability of cloud services.
In this paper, we propose the use of N-Modular redundancy to design and implement failure-free clouds.
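The paper's actual design isn't reproduced here, but the core idea of N-Modular redundancy can be sketched as a majority vote over replicated computations; all of the names below are illustrative assumptions, not the paper's implementation:

```javascript
// Sketch of N-modular redundancy: run N replicas of the same operation
// and vote on the results, so a minority of faulty replicas is masked
// from the caller. All names here are illustrative, not from the paper.
function majorityVote(replicas, input) {
  var counts = {};
  replicas.forEach(function (replica) {
    var result = JSON.stringify(replica(input));
    counts[result] = (counts[result] || 0) + 1;
  });
  var best = null;
  Object.keys(counts).forEach(function (r) {
    if (best === null || counts[r] > counts[best]) best = r;
  });
  return JSON.parse(best); // majority output masks minority failures
}
```

With triple modular redundancy (N = 3), for example, one replica can fail arbitrarily and the caller still sees the correct result.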
Evolution of Monitoring and Prometheus (Dublin 2018) (Brian Brazil)
This talk looks at the evolution of monitoring over time, the ways in which you can approach monitoring, where Prometheus fit into all this, and how Prometheus itself has grown over time.
Why software performance reduces with time?.pdf (Mike Brown)
Software performance reduces over time for several reasons: 1) additional features add complexity and slow programs down, 2) advanced graphical user interfaces require more system resources, and 3) frequent updates introduce bugs and security vulnerabilities that are resource-intensive to fix. Other factors include algorithms that don't scale well to large data sets, internet connectivity that allows malware to slow systems down, and changes to compilers that may inadvertently reduce previously optimized code performance.
This chapter discusses operating system structure and design. It examines issues like layered designs, microkernels, and modules. It presents the basic structures of popular OSes. Key points covered include system services, system calls, system programs, and user interfaces like command interpreters and graphical user interfaces. Popular OS examples like UNIX, Windows, and Mac OS are referenced throughout.
this ppt is about the information of factory data collection system.Various techniques are used to collect data from the factory floor. These technique range from clerical methods that requires workers to fill out paper forms that are later compiled, to fully automated methods that require no human participation.
Evolving role of Software, Legacy software, CASE tools, Process Models, CMMI (nimmik4u)
The Evolving role of Software – Software – The changing Nature of Software – Legacy software, Introduction to CASE tools, A generic view of process– A layered Technology – A Process Framework – The Capability Maturity Model Integration (CMMI) – Process Assessment – Personal and Team Process Models. Product and Process. Process Models – The Waterfall Model – Incremental Process Models – Incremental Model – The RAD Model – Evolutionary Process Models – Prototyping – The Spiral Model – The Concurrent Development Model – Specialized Process Models – the Unified Process.
ON FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS (ijgca)
Grid computing, or the computational grid, has always been a vast research field in academia as well as in industry. A computational grid provides resource sharing through multi-institutional virtual organizations for dynamic problem solving. Various heterogeneous resources from different administrative domains are virtually distributed through different networks in computational grids. Thus any type of failure can occur at any point in time, and a job running in the grid environment might fail. Hence fault tolerance is an important and challenging issue in grid computing, as the dependability of individual grid resources cannot be guaranteed. In order to make computational grids more effective and reliable, a fault-tolerant system is necessary. The objective of this paper is to review different existing fault tolerance techniques applicable to grid computing. This paper presents the state of the art of various fault tolerance techniques and a comparative study of the existing algorithms.
Program aging is a degradation of performance or functionality caused by resource depletion. Aging affects cloud services which provide access to big data banks and computing resources. Defect removal consumes large budgets and causes delays, which calls for other related solutions, including renewal in the form of controlled restarts. Collections of various runtime metrics are a significant source for further study of the detection and analysis of aging issues. This study highlights a method for detecting aging issues immediately after their introduction by runtime comparisons of different development scenarios. The study focuses on program aging, with service crashes as a consequence.
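As a rough sketch of that kind of runtime comparison (the metric, threshold, and function names here are assumptions for illustration, not the study's method), one can compare the growth trend of a resource metric between two development scenarios:

```javascript
// Sketch: flag a candidate build as introducing aging when its memory-use
// trend grows clearly faster than the baseline build's trend. The metric
// and threshold are illustrative assumptions, not taken from the study.
function trendSlope(samples) {
  // Least-squares slope of the metric against the sample index.
  var n = samples.length;
  var mx = (n - 1) / 2;
  var my = samples.reduce(function (s, v) { return s + v; }, 0) / n;
  var num = 0, den = 0;
  for (var i = 0; i < n; i++) {
    num += (i - mx) * (samples[i] - my);
    den += (i - mx) * (i - mx);
  }
  return num / den;
}

function introducesAging(baselineMemory, candidateMemory, tolerance) {
  return trendSlope(candidateMemory) > trendSlope(baselineMemory) + tolerance;
}
```

A steadily climbing slope in a long-running candidate, relative to the baseline scenario, is the kind of signal that would trigger rejuvenation (a controlled restart) before a crash.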
A Study Of Real-Time Embedded Software Systems And Real-Time Operating Systems (Rick Vogel)
This document summarizes a seminar report on real-time embedded software systems and real-time operating systems. It discusses what embedded systems and real-time systems are, and describes some of the key components and requirements of real-time operating systems including multi-tasking, memory management, task scheduling, and case studies of several popular RTOSs. The report aims to provide an overview of the technologies behind embedded systems design and survey available real-time operating systems.
The document discusses a framework for a self-healing module (SHM) to automate response to failures in a virtual manufacturing execution system (vMES). The SHM would detect failures, determine resolutions, and enact resolutions without human intervention. This would improve productivity by automating error recovery. The SHM framework uses event listeners, triggers, and actions. Listeners detect events, triggers determine responses, and actions enact those responses, such as restarting processes, migrating virtual machines, or adjusting database settings. The goal is to automate operations and improve response time to failures in virtual manufacturing environments.
Chatbots are growing in popularity as developers face the limitations of the mobile app. User interfaces that simulate a human conversation, chatbots have a history that goes back to the late 18th century. I'll take you on a tour of that history with an eye on finding insights into what is possible today and in the near future with chatbots. Issues covered: Amazon Alexa, Facebook Messenger Chatbots, Alan Turing, and much more.
Estimating the Software Product Value during the Development Process (Paul Houle)
Nowadays software companies face fierce competition to deliver better products while offering higher value to the customer. In this context, software product value has become a major concern in the software industry, leading to improved knowledge and a better understanding of how to estimate software value in early development phases. At the same time, software companies encounter problems such as releasing products that were developed with high expectations but gradually fall into the category of mediocre products when released to the market. These high expectations are tightly related to the expected and offered software value to the customer. This paper presents an approach for estimating software product value, focusing on the development phases. We propose a value-indicators approach to quantify the real value of the development products. The aim is to identify potential deviations in the real software value early, by comparing the estimated value against the expected one. We present an internal validation to show the feasibility of this approach to produce benefits in industry projects.
Universal Standards for LEI and other Corporate Reference Data: Enabling risk... (Paul Houle)
Individual records are supposed to be updated yearly, but more than 25% of records have lapsed
LEI records must be updated sooner in response to corporate actions if the GLEIS is to be useful as golden copy reference data inside financial institutions
Needed: real-time, streaming analytics to monitor SIFI, trading and the buildup of systemic risk
LOUs publish records daily, and these are then concatenated; however, the data on organization types (Corporation, Fund, Trust, etc.), Geographic Regions, Business Registry Numbers, Hierarchy and other fields are often missing or inconsistently represented
Implementing Reference Data and Corporate Actions in a Blockchain
Fixing a leaky bucket; Observations on the Global LEI System (Paul Houle)
We apply bitemporal analysis to more than 500 daily data files supplied by the Global Legal Entity Identifier Foundation to show that churn dominates the dynamics of growth, such that the number of paid up records in full standing has been flat over the last year. Our analysis reveals occasional daily glitches in the past that affected thousands of records. These appear to be in better control now, but a high lapse rate is the primary challenge to the Global LEI System right now
Although animals do not use language, they are capable of many of the same kinds of cognition as us; much of our experience is at a non-verbal level.
Semantics is the bridge between surface forms used in language and what we do and experience.
Language understanding depends on world knowledge (i.e. “the pig is in the pen” vs. “the ink is in the pen”)
We might not be ready for executives to specify policies themselves, but we can make the process from specification to behavior more automated, linked to precise vocabulary, and more traceable.
Advances such as SBVR and an English serialization for ISO Common Logic mean that executives and line workers can understand why the system does certain things, or verify that policies and regulations are implemented
The Ontology2 platform squares the circle between big data, low latency, and semantics by combining expressive reasoning, information retrieval and machine learning with Hadoop, Apache Spark and a Solid State Drive backed cloud.
Types in Freebase typically mean that something plays a role. For instance, Superman is a :film.film_subject because he is the subject of a film. He is an :amusement_parks.ride_theme because amusement park rides have been themed around him. There's nothing contradictory about this, at least to first order, because these types don't fit into a hierarchy.
Today's Enterprise Search products have effective answers for content ingestion and query performance.
Any product that is successful at all has an answer for content ingestion. It's a complex problem because you need to interact with many kinds of system, but it's a solved problem: a vendor who hasn't solved this problem would not be successful at all.
This graph compares two importance scores for DBpedia concepts.
The x-axis is a usage-based importance score (based on which pages people view) and the y-axis is a PageRank score representing the strength of links to a concept. Correlation coefficients via Kendall, Spearman and Pearson are in the 0.35-0.5 range.
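For illustration, a Spearman rank correlation of the kind quoted above can be computed by ranking both score lists and taking the Pearson correlation of the ranks; this sketch uses invented data and, unlike real statistics libraries, does not average tied ranks:

```javascript
// Sketch of Spearman rank correlation: rank each list, then take the
// Pearson correlation of the ranks. Ties get arbitrary order here,
// which is a simplification over real statistics libraries.
function ranks(values) {
  var idx = values.map(function (v, i) { return i; });
  idx.sort(function (a, b) { return values[a] - values[b]; });
  var r = [];
  idx.forEach(function (orig, rank) { r[orig] = rank + 1; });
  return r;
}

function pearson(x, y) {
  var n = x.length;
  var mx = x.reduce(function (s, v) { return s + v; }, 0) / n;
  var my = y.reduce(function (s, v) { return s + v; }, 0) / n;
  var num = 0, dx = 0, dy = 0;
  for (var i = 0; i < n; i++) {
    num += (x[i] - mx) * (y[i] - my);
    dx += (x[i] - mx) * (x[i] - mx);
    dy += (y[i] - my) * (y[i] - my);
  }
  return num / Math.sqrt(dx * dy);
}

function spearman(x, y) {
  return pearson(ranks(x), ranks(y));
}
```

A value near +1 means the two importance scores rank concepts in nearly the same order; the 0.35-0.5 range above indicates only a moderate agreement.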
Be warned that this system does not have a sense of "respectability", so you may be offended by many of its opinions; we apologize for that.
Extension methods, nulls, namespaces and precedence in C# (Paul Houle)
Extension methods are the most controversial feature that Microsoft has introduced in
C# 3.0. Introduced to support the LINQ query framework, extension methods make
it possible to define new methods for existing classes.
Although extension methods can greatly simplify code that uses them, many are
concerned that they could transform C# into something that programmers find
unrecognizable, or that C#’s namespace mechanisms are inadequate for managing
large systems that use extension methods. Adoption of the LINQ framework,
however, means that extension methods are here to stay, and that .net
programmers need to understand how to use them effectively, and, in particular,
how extension methods are different from regular methods.
This article discusses three ways in which extension methods differ from regular
methods:
1. Extension methods can be called on null objects without throwing an exception
2. Extension methods cannot be called inside of a subclass
Dropping unique constraints in sql server (Paul Houle)
I got started with relational databases with mysql, so I’m in the habit of making
database changes with SQL scripts, rather than using a GUI. Microsoft SQL Server
requires that we specify the name of a unique constraint when we want to drop it. If
you’re thinking ahead, you can specify a name when you create the constraint; if
you don’t, SQL Server will make up an unpredictable name, so you can’t write a
simple script to drop the constraint.
I'm an expert on building commerical large scale systems based on Linked Data sources such as Freebase and DBpedia. I'm the creator of :BaseKB, which was the first correct conversion of Freebase to RDF and of Infovore, the open source
I do consulting on the following areas:
* Data processing with Hadoop and the design and construction of systems using Amazon Web Services
* Architecture and construction of systems that consume and produce Linked Data
* Construction and evaluation of intelligent systems that make subjective decisions (text search, text classification, machine learning, etc.)
I'm not at all interested in doing maintenance work on other people's code, but I am interested in helping you align your process, structure, and tools to speed up your development cycle, improve your products, and prevent developer burnout. I am not free to relocate at this time, but I collaborate all of the time with workers around the world and I can travel to your location, understand your needs and transfer skills to your workforce.
Mat Byrne recently posted source code for a dynamic domain object in PHP which
takes advantage of the dynamic nature of PHP. It’s a good example of how
programmers can take advantage of the unique characteristics of a programming
language.
Statically typed languages such as C# and Java have some advantages: they run
faster, and IDEs can understand the code well enough to save typing (with your fingers),
help you refactor your code, and help you fix errors. Although there are a lot of things I
like about symfony, it feels like a Java framework that's invaded the PHP world. In Java,
Eclipse would help you deal with the endless getters and setters and the domain object
methods with 40-character names.
The limits of polymorphism are a serious weakness of today’s statically typed
languages. C# and Java apps that I work with are filled with if-then-else or case
ladders when they need to initialize a dynamically chosen instance of one of a set of
classes that subclass a particular base class or that implement a particular interface.
Sure, you can make a HashMap or Dictionary that’s filled with Factory objects, but
any answer for that is cumbersome.
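For what it's worth, the dictionary-of-factories workaround looks something like this in Javascript (the shape names are invented for illustration):

```javascript
// Sketch of the factory-map pattern: a dictionary of constructor functions
// replaces an if-then-else ladder over type names. Names are illustrative.
var factories = {
  circle: function (r) { return { kind: "circle", area: Math.PI * r * r }; },
  square: function (s) { return { kind: "square", area: s * s }; }
};

function createShape(kind, size) {
  var factory = factories[kind];
  if (!factory) throw new Error("unknown shape: " + kind);
  return factory(size);
}
```

It works, but every new subclass still needs an entry registered in the map, which is exactly the cumbersomeness complained about above.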
Minimizing Code Paths In Asynchronous Code, a recent post on asynchronous communications with a focus on Javascript, is about a lesson that I learned the hard way with GWT, one that applies to all RIA systems that use asynchronous calls. The post's example is the same case I encountered: a function might return a value from a cache or might query the server to get the value. An obvious way to do this in pseudocode is:
function getData(...arguments..., callback) {
    if (...data in cache...) {
        callback(...cached data...);
        return;
    }
    cacheCallback = function(...return value...) {
        ...store value in cache...
        callback(...return value...);
    };
    getDataFromServer(...arguments..., cacheCallback);
}
At first glance this code looks innocuous, but there’s a major difference between
what happens in the cached and uncached case. In the cached case, the callback()
function gets called before getData() returns — in the uncached case, the opposite
happens. What happens in this function has a global impact on the execution of the
program, opening up two code paths that complicate concurrency control and
introduce bugs that can be frustrating to debug.
This function can be made more reliable if it schedules callback() to run after the thread it is running in completes. In Javascript, this can be done with setTimeout(); in Silverlight, use System.Windows.Threading.Dispatcher to schedule the callback.
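A minimal sketch of that fix in Javascript, assuming hypothetical cache and getDataFromServer helpers (none of these names come from the post being discussed):

```javascript
// Sketch: defer the cached-case callback with setTimeout so that both
// code paths are asynchronous. The cache, key, and getDataFromServer
// names are illustrative assumptions, not a real API.
var cache = {};

function getDataFromServer(key, callback) {
  // Stand-in for a real network call; always asynchronous.
  setTimeout(function () { callback("value-for-" + key); }, 0);
}

function getData(key, callback) {
  if (key in cache) {
    // Deferred: callback now runs after getData returns,
    // just as it does in the uncached case.
    setTimeout(function () { callback(cache[key]); }, 0);
    return;
  }
  getDataFromServer(key, function (value) {
    cache[key] = value;   // store value in cache
    callback(value);
  });
}
```

Now callback() always runs after getData() returns, so callers see a single code path in both the cached and uncached cases.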
ProAlign Web is sold on a “Software as a Service” basis, which eliminates the burden and costs of software installation, maintenance, and any server requirements. Upgrades happen seamlessly without your involvement.
Users have permission levels for manipulating and viewing data — the system will show only those geographic areas authorized for each user.
It's easy to create, share, modify and approve alignments online — visualization tools allow immediate understanding of key account metrics by territory
Built in communication — allows integrated notes, requests and approval processes.
Every installation of ProAlign Web is fully integrated with and includes ProAlign Desktop, which is built on ESRI ArcGIS® mapping technology, the world's number one mapping software. ProAlign Desktop offers an additional robust portfolio of analytical, database and mapping features for the sales manager or sales operations analyst.
Companies with large, decentralized sales forces, multiple layers of sales management, and a collaborative approach to managing sales territories are a perfect fit for ProAlign Web sales territory alignment software.
You can cooperatively create, modify and share territory alignments online, gaining input from sales managers anywhere in your organization who have the local account and customer knowledge critical to designing optimal territories.
Sometimes getting field and local sales personnel involved can be a logistics challenge and lead to delays in completing new territory assignments. Not anymore. ProAlign Web provides secure online access for authorized sales management to view and propose changes to territory alignments. You’ll save weeks on complex sales territory projects.
TextWise Technology, TextWise Company, LLC (Paul Houle)
TextWise has recently developed Semantic Gist® to provide intuitive semantic modeling on a large number of samples, particularly vertical text documents that often do not have classification schemes associated with them. These semantic models will automatically adapt to rapidly changing content, ensuring a high level of accuracy over time.
Semantic Gist® represents a significant advance in the use of machine learning, image and speech characterization, and neural networks to attack unsupervised semantic modeling. Our patent-pending approach generates a compact representation of any text by using advanced statistical language models to identify the significant features of a document.
An auto-encoder neural network encodes the features into a low-dimensionality semantic representation, and then reconstructs an approximation of the original feature vector from the semantic representation. The software highlights keywords that may be underrepresented by the semantic representation and encodes these separately as a complementary feature vector.
Finally, the complementary feature vector is combined with the semantic representation to produce a Semantic Gist® that can be easily used for document indexing, matching and other applications.
User management is essential for community and interactive web systems. User management is often an afterthought in a project; although it is critical to the usability (a user that can't register or log in can't use the service) and manageability of a site, developers often build ad hoc solutions that are unreliable, insecure and hard to use. Tapir User Manager is a user management system that has been used to support large-traffic sites. This presentation introduces TUM and explains how it's been used for projects at CUL and elsewhere.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Northern Engraving | Nameplate Manufacturing Process - 2024 (Northern Engraving)
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar, with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an... (Jason Yip)
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Generation 5 » What do you do when you’ve caught an exception?
Abort, Retry, Ignore
This article is a follow up to “Don’t Catch Exceptions“, which advocates that
exceptions should (in general) be passed up to a “unit of work”, that is, a fairly
coarse-grained activity which can reasonably be failed, retried or ignored. A unit of
work could be:
an entire program, for a command-line script,
a single web request in a web application,
the delivery of an e-mail message,
the handling of a single input record in a batch loading application,
rendering a single frame in a media player or a video game, or
an event handler in a GUI program.
The code around the unit of work may look something like
[01] try {
[02]   DoUnitOfWork()
[03] } catch(Exception e) {
[04]   ... examine exception and decide what to do ...
[05] }
For the most part, the code inside DoUnitOfWork() and the functions it calls tries to
throw exceptions upward rather than catch them.
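As a concrete sketch, here is what that skeleton might look like for the batch-loading case. The `BatchLoader` class and its record handling are made up for illustration; Java is used since the pseudocode above is Java-flavored:

```java
import java.util.Arrays;
import java.util.List;

public class BatchLoader {
    // Process every record; a failure in one record aborts only that
    // unit of work, not the whole batch.
    public static int load(List<String> records) {
        int failures = 0;
        for (String record : records) {
            try {
                doUnitOfWork(record);
            } catch (Exception e) {
                // examine exception and decide what to do:
                // here we log the failure and move on to the next record
                System.err.println("record failed: " + e.getMessage());
                failures++;
            }
        }
        return failures;
    }

    // Hypothetical work unit: rejects blank records and throws upward.
    static void doUnitOfWork(String record) {
        if (record.isEmpty()) {
            throw new IllegalArgumentException("empty record");
        }
        // ... parse and insert the record ...
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("alice", "", "bob");
        System.out.println(load(records)); // prints 1 (one failed record)
    }
}
```

The catch block at the unit-of-work boundary is the single place that decides whether to log, retry, or ignore each failure.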
To handle errors correctly, you need to answer a few questions, such as
Was this error caused by a corrupted application state?
Did this error cause the application state to be corrupted?
Was this error caused by invalid input?
What do we tell the user, the developers and the system administrator?
Could this operation succeed if it was retried?
Is there something else we could do?
Although it’s good to depend on existing exception hierarchies (at least you won’t
introduce new problems), the way that exceptions are defined and thrown inside the
work unit should help the code on line [04] make a decision about what to do — such
practices are the subject of a future article, which subscribers to our RSS feed will be
the first to read.
The cause and effect of errors
There is a range of error conditions that are predictable, where it’s possible
to detect the error and implement the correct response. As an application becomes
more complex, the number of possible errors explodes, and it becomes impossible or
unacceptably expensive to implement explicit handling of every condition.
What to do about unanticipated errors is a controversial topic. Two extreme positions
are: (i) an unexpected error could be a sign that the application is corrupted, so that
the application should be shut down, and (ii) systems should bend but not break: we
should be optimistic and hope for the best. Ultimately, there’s a trade-off
between integrity and availability, and different systems make different choices. The
ecosystem around Microsoft Windows, where people predominantly develop desktop
applications, is inclined to give up the ghost when things go wrong — better to show
a “blue screen of death” than to let the unpredictable happen. In the Unix
ecosystem, more centered around server applications and custom scripts, the
tendency is to soldier on in the face of adversity.
What’s at stake?
Desktop applications tend to fail when unexpected errors happen: users learn to save
frequently. Some of the best applications, such as GNU Emacs and Microsoft Word,
keep a running log of changes to minimize work lost to application and system
crashes. Users accept the situation.
On the other hand, it’s unreasonable for a server application that serves hundreds or
millions of users to shut down on account of a cosmic ray. Embedded systems, in
particular, function in a world where failure is frequent and the effects must be
minimized. As we’ll see later, it would be a real bummer if the Engine Control Unit in
your car left you stranded because your oxygen sensor quit working.
The following diagram illustrates the environment of a work unit in a typical
application. (Although this application accesses network resources, we’re not thinking
of it as a distributed application: we’re responsible for the correct behavior of the
application running in a single address space, not for the correct behavior of a
process swarm.)
The input to the work unit is a potential source of trouble. The input could be
invalid, or it could trigger a bug in the work unit or elsewhere in the system (the
“system” encompasses everything in the diagram). Even if the input is valid, it could
contain a reference to a corrupted resource elsewhere in the system. A corrupted
resource could be a damaged data structure (such as a colored box in a database),
or an otherwise malfunctioning part of the system (a crashed server or router on the
network).
Data structures in the work unit itself are the least problematic, for purposes of error
handling, because they don’t outlive the work unit and don’t have any impact on
future work units.
Static application data, on the other hand, persists after the work unit ends, and
this has two possible consequences:
1. The current work unit can fail because a previous work unit caused a resource to
be corrupted, and
2. The current work unit can corrupt a resource, causing a future work unit to fail
Osterman’s argument that applications should crash on errors is based on this reality:
an unanticipated failure is a sign that the application is in an unknown (and possibly
bad) state, and can’t be trusted to be reliable in the future. Stopping the application
and restarting it clears out the static state, eliminating resource corruption.
Rebooting the application, however, might not free up corrupted resources inside the
operating system. Both desktop and server applications suffer from operating system
errors from time to time, and often can get immediate relief by rebooting the whole
computer.
The “reboot” strategy runs out of steam when we cross the line from in-RAM state to
persistent state, state that’s stored on disks, or stored elsewhere on the network.
Once resources in the persistent world are corrupted, they need to be (i) lived with,
or repaired by (ii) manual or (iii) automatic action.
In either world, a corrupted resource can have either a narrow (blue) or wide
(orange) effect on the application. For instance, the user account record of an
individual user could be damaged, which prevents that user from logging in. That’s
bad, but it would hardly be catastrophic for a system that has 100,000 users. It’s
best to ‘ignore’ this error, because a system-wide ‘abort’ would deny service to
99,999 other users; the problem can be corrected when the user complains, or when
the problem is otherwise detected by the system administrator.
If, on the other hand, the cryptographic signing key that controls the authentication
process were lost, nobody would be able to log in: that’s quite a problem. It’s the
kind of problem that will be noticed, however, so aborting at the work unit level
(authenticated request) is enough to protect the integrity of the system while the
administrators repair the problem.
Problems can happen at an intermediate scope as well. For instance, if the system
has damage to a message file for Italian users, people who use the system in the
Italian language could be locked out. If Italian speakers are 10% of the users, it’s
best to keep the system running for others while you correct the problem.
Repair
There are several tools for dealing with corruption in persistent data stores. In a
one-of-a-kind business system, a DBA may need to intervene occasionally to repair
corruption. More common events can be handled by running scripts which detect and
repair corruption, much like the fsck command in Unix or the chkdsk command in
Windows. Corruption in the metadata of a filesystem can, potentially, cause a
sequence of events which leads to massive data loss, so UNIX systems have
historically run the fsck command on filesystems whenever the filesystem is in a
questionable state (such as after a system crash or power failure). The time to do an
fsck has become an increasing burden as disks have gotten larger, so modern UNIX
systems use journaling filesystems that protect filesystem metadata with transactional
semantics.
Release and Rollback
One role of an exception handler for a unit of work is to take steps to prevent
corruption. This involves the release of resources, putting data in a safe state, and,
when possible, the rollback of transactions.
Although many kinds of persistent store support transactions, and many in-memory
data structures can support transactions, the most common transactional store that
people use is the relational database. Although transactions don’t protect the database
from all programming errors, they can ensure that neither expected nor unexpected
exceptions will cause partially-completed work to remain in the database.
A classic example in pseudo code is the following:
[06] function TransferMoney(fromAccount,toAccount,amount) {
[07]   try {
[08]     BeginTransaction();
[09]     ChangeBalance(toAccount,amount);
[10]     ... something throws exception here ...
[11]     ChangeBalance(fromAccount,-amount);
[12]     CommitTransaction();
[13]   } catch(Exception e) {
[14]     RollbackTransaction();
[15]   }
[16] }
In this (simplified) example, we’re transferring money from one bank account to
another. An exception thrown at line [10] could be serious, since it would
cause money to appear in toAccount without it being removed from fromAccount. It’s
bad enough if this happens by accident, but a clever cracker who finds a way to
cause an exception at line [10] has discovered a way to steal money from the bank.
Fortunately we’re doing this financial transaction inside a database transaction.
Everything done after BeginTransaction() is provisional: it doesn’t actually appear in
the database until CommitTransaction() is called. When an exception happens, we call
RollbackTransaction(), which makes it as if the first ChangeBalance() had never been
called.
As mentioned in the “Don’t Catch Exceptions” article, it often makes sense to do
release, rollback and repair operations in a finally clause rather than in the
unit-of-work catch clause, because it lets an individual subsystem take care of itself — this
promotes encapsulation. However, in applications that use databases transactionally, it
often makes sense to push transaction management out to the work unit.
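A minimal sketch of that finally-based cleanup, with hypothetical acquire/release helpers standing in for a real resource such as a file handle or lock:

```java
public class FinallyCleanup {
    static boolean released = false;

    // Hypothetical subsystem operation: it acquires a resource and
    // guarantees its release in a finally clause, whether or not an
    // exception passes through to the unit-of-work handler above it.
    static void subsystemOperation(boolean fail) {
        acquire();
        try {
            if (fail) {
                throw new RuntimeException("something went wrong");
            }
            // ... do the real work ...
        } finally {
            release(); // runs on both the normal and exceptional paths
        }
    }

    static void acquire() { released = false; }
    static void release() { released = true; }

    public static void main(String[] args) {
        try {
            subsystemOperation(true);
        } catch (RuntimeException e) {
            // the unit of work decides what to do with the failure;
            // the subsystem has already cleaned up after itself
        }
        System.out.println(released); // prints true
    }
}
```

The subsystem never decides abort/retry/ignore; it only guarantees that its own state is safe before the exception continues upward.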
Why? Complex database operations are often composed out of simpler database
operations that, themselves, should be done transactionally. To take an example,
imagine that somebody is opening a new account and funding it from an existing
account:
[17] function OpenAndFundNewAccount(accountInformation,oldAccount,amount) {
[18]   if (amount<MinimumAmount) {
[19]     throw new InvalidInputException(
[20]       "Attempted To Create Account With Balance Below Minimum"
[21]     );
[22]   }
[23]   newAccount=CreateNewAccountRecords(accountInformation);
[24]   TransferMoney(oldAccount,newAccount,amount);
[25] }
It’s important that the TransferMoney operation be done transactionally, but it’s also
important that the whole OpenAndFundNewAccount operation be done transactionally
too, because we don’t want an account in the system to start with a zero balance.
A straightforward answer to this problem is to always do banking operations inside a
unit of work, and to begin, commit and roll back transactions at the work unit level:
[26] AtmOutput ProcessAtmRequest(AtmInput in) {
[27]   try {
[28]     BeginTransaction();
[29]     BankingOperation op=AtmInput.ParseOperation();
[30]     var out=op.Execute();
[31]     var atmOut=AtmOutput.Encode(out);
[32]     CommitTransaction();
[33]     return atmOut;
[34]   }
[35]   catch(Exception e) {
[36]     RollbackTransaction();
[37]     ... Complete Error Handling ...
[38]   }
[39] }
In this case, there might be a large number of functions that are used to manipulate
the database internally, but these are only accessible to customers and bank tellers
through a limited set of BankingOperations that are always executed in a transaction.
Notification
There are several parties that could be notified when something goes wrong with an
application, most commonly:
1. the end user,
2. the system administrator, and
3. the developers.
Sometimes, as in the case of a public-facing web application, #2 and #3 may overlap.
In desktop applications, #2 might not exist.
Let’s consider the end user first. The end user really needs to know (i) that something
went wrong, and (ii) what they can do about it. Often errors are caused by user input:
hopefully these errors are expected, so the system can tell the user specifically what
went wrong: for instance,
[39] try {
[40]   ... process form information ...
[41]
[42]   if (!IsWellFormedSSN(ssn))
[43]     throw new InvalidInputException("You must supply a valid social security number");
[44]
[45]   ... process form some more ...
[46] } catch(InvalidInputException e) {
[47]   DisplayError(e.Message);
[48] }
Other times, errors happen that are unexpected. Consider a common (and bad)
practice that we see in database applications: programs that write queries without
correctly escaping strings:
[49] dbConn.Execute("
[50]   INSERT INTO people (first_name,last_name)
[51]   VALUES ('"+firstName+"','"+lastName+"')
[52] ");
This code is straightforward, but dangerous, because a single quote in the firstName or
lastName variable ends the string literal in the VALUES clause and enables an SQL
injection attack. (I’d hope that you know better than to do this, but large
projects worked on by large teams inevitably have problems of this order.) This code
might even hold up well in testing, failing only in production when a person registers
with
[53] lastName="O'Reilly";
Now, the dbConn is going to throw something like a SqlException with the following
message:
[54] SqlException.Message="Invalid SQL Statement:
[55]   INSERT INTO people (first_name,last_name)
[56]   VALUES ('Baba','O'Reilly');"
We could show that message to the end user, but that message is worthless to most
people. Worse than that, it’s harmful if the end user is a cracker who could take
advantage of the error — it tells them the name of the affected table, the names of
the columns, and the exact SQL code that they can inject something into. You might
be better off showing users a generic error image (the original post showed Twitter’s
“fail whale” here) and telling them that they’ve experienced an “Internal Server
Error.” Even so, the discovery that a single quote can cause an “Internal Server
Error” can be enough for a good cracker to sniff out the fault and develop an attack
in the blind. What can we do? Warn the system administrators. The error handling
system for a server application should log exceptions, stack trace and all. It doesn’t
matter if you use the UNIX syslog mechanism, the logging service in Windows NT, or
something that’s built into your server, like Apache’s error_log. Although logging
systems are built into both Java and .Net, many developers find that log4j and
log4net are especially effective.
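For instance, with the logging framework built into Java (java.util.logging), a unit-of-work handler can record the full stack trace for administrators while the user sees only a generic message. The doUnitOfWork body below is a made-up stand-in for the SQL-escaping bug described above:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class WorkUnitLogging {
    private static final Logger LOG = Logger.getLogger("app");

    // Unit-of-work handler: the full exception, stack trace and all,
    // goes to the log; the user only sees a generic message.
    public static String handleRequest(String input) {
        try {
            return doUnitOfWork(input);
        } catch (Exception e) {
            LOG.log(Level.SEVERE, "work unit failed", e); // logs the stack trace
            return "Internal Server Error";
        }
    }

    // Hypothetical work unit that chokes on apostrophes, like the
    // SQL-escaping bug in the text.
    static String doUnitOfWork(String input) {
        if (input.contains("'")) {
            throw new IllegalStateException("Invalid SQL Statement");
        }
        return "OK";
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("Baba"));      // prints OK
        System.out.println(handleRequest("O'Reilly"));  // prints Internal Server Error
    }
}
```

An administrator reviewing the log sees the SqlException-style detail; the cracker probing the form sees nothing but the generic error.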
There really are two ways to use logs:
1. Detailed logging information is useful for debugging problems after the fact. For
instance, if a user reports a problem, you can look in the logs to understand the
origin of the problem, making it easy to debug problems that occur rarely: this
can save hours of time trying to understand the exact problem a user is
experiencing.
2. A second approach to logs is proactive: regularly look at logs to detect
problems before they get reported. In the example above, the SqlException
would probably first be thrown by an innocent person who has an apostrophe in
his or her name — if the error was detected that day and quickly fixed, a
potential security hole could be closed long before it would be exploited.
Organizations that investigate all exceptions thrown by production web
applications run the most secure and reliable applications.
In the last decade it’s become quite common for desktop applications to send stack
traces back to the developers after a crash: usually they pop up a dialog box that
asks for permission first. Although developers of desktop applications can’t be as
proactive as maintainers of server applications, this is a useful tool for discovering
errors that escape testing, and to discover how commonly they occur in the field.
Retry I: Do it again!
Some errors are transient: that is, if you try to do the same operation later, the
operation may succeed. Here are a few common cases:
An attempt to write to a DVD-R could fail because the disk is missing from the
drive
A database transaction could fail when you commit it because of a conflict with
another transaction: an attempt to do the transaction again could succeed
An attempt to deliver a mail message could fail because of problems with the
network or destination mail server
A web crawler that crawls thousands (or millions) of sites will find that many of
them are down at any given time: it needs to deal with this reasonably, rather
than drop your site from its index because it happened to be down for a few
hours
Transient errors are commonly associated with the internet and with remote servers;
errors are frequent because of the complexity of the internet, but they’re transitory
because problems are repaired by both automatic and human intervention. For
instance, if a hardware failure causes a remote web or email server to go down, it’s
likely that somebody is going to notice the problem and fix it in a few hours or days.
One strategy for dealing with transient errors is to punt the problem back to the user: in a case
like this, we display an error message that tells the user that the problem might clear
up if they retry the operation. This is implicit in how web browsers work: sometimes
you try to visit a web page, you get an error message, then you hit reload and it’s all
OK. This strategy is particularly effective when the user could be aware that there’s a
problem with their internet connection and could do something about it: for instance,
they might discover that they’ve moved their laptop out of Wi-Fi range, or that the
DSL connection at their house has gone down for the weekend.
SMTP, the internet protocol for email, is one of the best examples of automated retry.
Compliant e-mail servers store outgoing mail in a queue: if an attempt to send mail to
a destination server fails, mail will stay in the queue for several days before reporting
failure to the user. Section 4.5.4 of RFC 2821 states:
The sender MUST delay retrying a particular destination after one
attempt has failed. In general, the retry interval SHOULD be at
least 30 minutes; however, more sophisticated and variable strategies
will be beneficial when the SMTP client can determine the reason for
non-delivery.
Retries continue until the message is transmitted or the sender gives
up; the give-up time generally needs to be at least 4-5 days. The
parameters to the retry algorithm MUST be configurable.
A client SHOULD keep a list of hosts it cannot reach and
corresponding connection timeouts, rather than just retrying queued
mail items.
Experience suggests that failures are typically transient (the target
system or its connection has crashed), favoring a policy of two
connection attempts in the first hour the message is in the queue,
and then backing off to one every two or three hours.
Practical mail servers use fsync() and other mechanisms to implement transactional
semantics on the queue: the needs of reliability make it expensive to run an
SMTP-compliant server, so e-mail spammers often use non-compliant servers that don’t
correctly retry (if they’re going to send you 20 copies of the message anyway, who
cares if only 15 get through?). Greylisting is a highly effective filtering strategy that
tests the compliance of SMTP senders by forcing a retry.
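The SMTP rules quoted above boil down to a retry loop with a growing delay. A simplified sketch of that pattern (the millisecond delays are for illustration; a real mail queue waits minutes to hours and persists its queue across restarts):

```java
public class Retry {
    // Retry a transient operation up to maxAttempts times, backing off
    // between attempts, in the spirit of the SMTP retry rules.
    public static String withRetry(Operation op, int maxAttempts) throws Exception {
        Exception last = null;
        long delayMillis = 1; // tiny for the sketch; minutes or hours for real mail
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.run();
            } catch (Exception e) {
                last = e; // remember the failure
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                    delayMillis *= 2; // exponential backoff
                }
            }
        }
        throw last; // give up: report the failure to the caller
    }

    public interface Operation {
        String run() throws Exception;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical flaky operation: fails twice, then succeeds.
        final int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "delivered";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints: delivered after 3 attempts
    }
}
```

Note that the loop rethrows the last exception once it gives up: the decision to report failure to the user stays with the unit of work, just as an SMTP server eventually bounces undeliverable mail.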
Retry II: If first you don’t succeed…
An alternate form of retry is to try something different. For instance, many programs
in the UNIX environment will look in many different places for a configuration file: if
the file isn’t in the first place tried, it will try the second place and so forth.
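That search-the-fallback-locations pattern can be sketched as follows (the path list is hypothetical, and a list of "existing" files stands in for a real filesystem check so the example is self-contained):

```java
import java.util.Arrays;
import java.util.List;

public class ConfigSearch {
    // Try each candidate location in order and use the first one that
    // works — the "try something different" form of retry.
    public static String findConfig(List<String> candidates, List<String> existing) {
        for (String path : candidates) {
            if (existing.contains(path)) { // stands in for new File(path).exists()
                return path;
            }
        }
        throw new IllegalStateException("no configuration file found");
    }

    public static void main(String[] args) {
        List<String> searchPath = Arrays.asList(
            "./app.conf", "~/.app.conf", "/etc/app.conf");
        List<String> filesOnDisk = Arrays.asList("/etc/app.conf");
        // Falls through the first two candidates to the system-wide file.
        System.out.println(findConfig(searchPath, filesOnDisk)); // prints /etc/app.conf
    }
}
```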
The online e-print server at arXiv.org has a system called AutoTex which automatically
converts documents written in several dialects of TeX and LaTeX into Postscript and
PDF files. AutoTex unpacks the files in a submission into a directory and uses chroot
to run the document processing tools in a protected sandbox. It tries about ten
different configurations until it finds one that successfully compiles the document.
In embedded applications, where availability is important, it’s common to fall back to
a “safe mode” when normal operation is impossible. The Engine Control Unit in a
modern car is a good example:
Since the 1970s, regulations in the United States have reduced emissions of
hydrocarbons and nitrogen oxides from passenger automobiles by more than a
hundredfold. The technology has many aspects, but the core of the system is an
Engine Control Unit that uses a collection of sensors to monitor the state of the engine
and uses this information to adjust engine parameters (such as the quantity of fuel
injected) to balance performance and fuel economy with environmental compliance.
As the condition of the engine, driving conditions and the composition of fuel change
over time, the ECU normally operates in a “closed-loop” mode that continually
optimizes performance. When part of the system fails (for instance, the oxygen
sensor) the ECU switches to an “open-loop” mode. Rather than leaving you
stranded, it lights the “check engine” indicator and operates the engine with
conservative assumptions that will get you home and to a repair shop.
Ignore?
One strength of exceptions, compared to the older return-value style of error
handling, is that the default behavior of an exception is to abort, not to ignore. In
general, that’s good, but there are a few cases where “ignore” is the best option.
Ignoring an error makes sense when:
1. Security is not at stake, and
2. there’s no alternative action available, and
3. the consequences of an abort are worse than the consequences of ignoring the
error
The first rule is important, because crackers will take advantage of system faults to
attack a system. Imagine, for instance, a “smart card” chip embedded in a payment
card. People have successfully extracted information from smart cards by fault
injection: this could be anything from a power dropout to a bright flash of light on an
exposed silicon surface. If you’re concerned that a system will be abused, it’s
probably best to shut down when abnormal conditions are detected.
On the other hand, some operations are vestigial to an application. Imagine, for
instance, a dialog box that pops up when an application crashes and offers the user
the choice of sending a stack trace to the vendor. If the attempt to send the stack trace
fails, it’s best to ignore the failure — there’s no point in subjecting the user to an
endless series of dialog boxes.
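That crash-report case can be sketched as a deliberately best-effort operation (hypothetical names; the send step is faked so the example is self-contained):

```java
public class BestEffort {
    // Sending a crash report is vestigial: if it fails, we swallow the
    // exception rather than subject the user to more error dialogs.
    public static boolean trySendCrashReport(boolean networkUp) {
        try {
            send(networkUp);
            return true;
        } catch (Exception e) {
            return false; // deliberately ignored: there is nothing better to do
        }
    }

    // Stand-in for transmitting the stack trace to the vendor.
    static void send(boolean networkUp) {
        if (!networkUp) {
            throw new RuntimeException("network unreachable");
        }
        // ... transmit the stack trace ...
    }

    public static void main(String[] args) {
        System.out.println(trySendCrashReport(false)); // prints false, but no crash
    }
}
```

The boolean return preserves the information for anyone who cares, without forcing an abort on anyone who doesn’t.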
“Ignoring” often makes sense in the applications that matter the most and those that
matter the least.
For instance, media players and video games operate in a hostile environment where
disks, the network, sound and controller hardware are uncooperative. The “unit of
work” could be the rendering of an individual frame: it’s appropriate for entertainment
devices to soldier on despite hardware defects, unplugged game controllers, network
dropouts and corrupted inputs, since the consequences of failure are no worse than
shutting the system down.
In the opposite case, high-value and high-risk systems should continue functioning
no matter what happens. The software for a space probe, for instance, should never
give up. Much like an automotive ECU, space probes default to a “safe mode” when
contact with the earth is lost: frequently this strategy involves one or more reboots,
but the goal is to always regain contact with controllers so that the mission has a
chance at success.
Conclusion
It’s most practical to catch exceptions at the boundaries of relatively coarse “units of
work.” Although the handling of errors usually involves some amount of rollback
(restoring system state) and notification of affected people, the ultimate choices are
still what they were in the days of DOS: abort, retry, or ignore.
Correct handling of an error requires some thought about the cause of an error: was it
caused by bad input, corrupted application state, or a transient network failure? It’s
also important to understand the impact the error has on the application state and to
try to reduce it using mechanisms such as database transactions.
“Abort” is a logical choice when an error is likely to have caused corruption of the
application state, or if an error was probably caused by a corrupted state. Applications
that depend on network communications sometimes must “Retry” operations when
they are interrupted by network failures. Another form of “Retry” is to try a different
approach to an operation when the first approach fails. Finally, “Ignore” is appropriate
when “Retry” isn’t available and the cost of “Abort” is worse than soldiering on.
This article is one of a series on error handling. The next article in this series will
describe practices for defining and throwing exceptions that give exception handlers
good information for making decisions. Subscribers to our RSS Feed will be the first
to read it.
Paul Houle on August 27th 2008 in Dot Net, Exceptions, Java, PHP, SQL
Comments (4)
Brandon Edens · 280 weeks ago
Change the nature of the game by using a programming language that supports something beyond
primitive exceptions, try/catch/finally, etc...
Try a condition system today:
http://www.gigamonkeys.com/book/beyond-exception-...
Paul Houle · 280 weeks ago
@Brandon,
that's neat stuff. I see some things in that chapter that are right along the lines that I'm thinking.
Could this behavior be easily emulated in a language like C# that supports lambdas and delegates?
web design company · 280 weeks ago
Throw it back
Generation 5 » Twitter Joins Me
[...] several other bloggers had hotlinked the copy of the twitter fail whale that was in my old “What do
you do if you catch an exception?” post. It turns out that my copy of the whale currently ranks #1 in
Google Image Search. [...]
http://gen5.info/q/2008/08/27/what-do-you-do-when-youve-caught-an-exception/[1/12/2014 8:27:44 PM]