Adopting classic redundancy-based fault-tolerant schemes in highly dynamic distributed computing systems does not necessarily result in the anticipated improvement in dependability. This primarily stems from the statically predefined redundancy configurations employed within many classic dependability strategies, which, as is well known, may negatively impact the schemes' overall effectiveness. In this paper, a novel dependability strategy is introduced that encompasses advanced redundancy management and aims to autonomously tune its internal configuration in response to observed disturbances. Policies for parsimonious resource allocation are then presented, intended to increase the scheme's cost-effectiveness without breaching its availability objective. Our experiments suggest that the proposed solution can achieve a substantial improvement in availability compared to traditional, static redundancy strategies, and that tuning the adopted degree of redundancy to the actually observed disturbances reduces unnecessary resource expenditure, thereby enhancing cost-effectiveness.
Similar to TOWARDS PARSIMONIOUS RESOURCE ALLOCATION IN CONTEXT-AWARE N-VERSION PROGRAMMING

Industrial perspective on static analysis (Chirag Thumar)
by B.A. Wichmann, A.A. Canning, D.L. Clutterbuck, L.A. Winsborrow, N.J. Ward and D.W.R. Marsh
Static analysis within industrial applications provides a means of gaining higher assurance for critical software. This survey notes several problems, such as the lack of adequate standards, difficulty in assessing benefits, validation of the model used, and acceptance by regulatory bodies. It concludes by outlining potential solutions and future directions.
A Novel Approach for Efficient Resource Utilization and Trustworthy Web Service (CSCJournals)
Many Web services are expected to run with a high degree of security and dependability. To achieve this goal, it is essential to use a Web-services-compatible framework that tolerates not only crash faults but also Byzantine faults, due to the untrusted communication environment in which Web services operate. In this paper, we describe the design and implementation of such a framework, called RET-WS (Resource Efficient and Trustworthy Execution Web Service). RET-WS is designed to operate on top of the standard SOAP messaging framework for maximum interoperability, and offers a resource-efficient way to execute requests under Byzantine-fault-tolerant replication that is particularly well suited to services in which request processing is resource-intensive. Previous efforts took a failure-masking, all-active approach in which all execution replicas execute all requests; at least 2t + 1 execution replicas are needed to mask t Byzantine-faulty ones. We describe an asynchronous protocol that provides resource-efficient execution by combining failure masking with imperfect failure detection and checkpointing. It is implemented as a pluggable module within the Axis2 architecture and, as such, requires minimal changes to the Web applications. The core fault-tolerance mechanisms used in RET-WS are based on Castro and Liskov's well-known BFT algorithm, modified for resource efficiency. Our performance measurements confirm that RET-WS incurs only moderate runtime overhead considering the complexity of the mechanisms.
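The 2t + 1 bound above follows from simple majority arithmetic: a correct answer must be vouched for by at least t + 1 replicas, so it cannot come solely from the t faulty ones. A minimal Python sketch of that failure-masking vote, as a generic illustration only (the function names are ours, not RET-WS's protocol):

```python
from collections import Counter

def replicas_needed(t: int) -> int:
    """Replicas an all-active scheme needs to mask t Byzantine faults."""
    return 2 * t + 1

def masked_result(responses, t):
    """Accept a result only if at least t+1 identical copies arrived,
    so it cannot originate solely from the t possibly faulty replicas."""
    value, count = Counter(responses).most_common(1)[0]
    if count >= t + 1:
        return value
    raise RuntimeError("no value is vouched for by t+1 replicas")

# Example: t = 1, hence 3 execution replicas; one faulty answer is outvoted.
print(replicas_needed(1))                      # 3
print(masked_result(["ok", "ok", "bad"], 1))   # ok
```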
System reliability is an important issue in designing modern multiprocessor systems. This paper proposes a fault-tolerant, scalable multiprocessor system architecture that adopts a pipeline scheme. To verify the performance of the proposed system, the SimEvents/Stateflow tools of MATLAB were used to simulate it. The proposed system uses twelve processors (P), connected in a linear array, to build a ten-stage system with two backup processors (BP). The system can be expanded by adding more processors to increase pipeline stages and performance, and more backup processors to increase system reliability. The system can automatically reorganize itself in the event of a failure of one or two processors, and execution continues without interruption. Each processor communicates with its neighboring processors through input/output (I/O) ports, which are used as bypass links between the processors. In the event of a processor failure, the function of the faulty processor is assigned to the next processor that is free from faults. The fast Fourier transform (FFT) algorithm is implemented on the simulated circuit to evaluate the performance of the proposed system. The results showed that the system can continue to execute even if one or two processors fail, without a noticeable decrease in performance.
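As a rough illustration of the reorganization policy just described (a faulty processor's stage moves to the next fault-free unit, with the spares absorbing the shift), here is a hypothetical Python sketch; it mimics only the remapping rule, not the SimEvents/Stateflow model, and all names are ours:

```python
def remap_stages(stages, processors, healthy):
    """Assign each pipeline stage to the next fault-free processor in the
    linear array, skipping failed units; spares absorb the shifted stages."""
    pool = [p for p in processors if healthy[p]]
    if len(pool) < len(stages):
        raise RuntimeError("not enough fault-free processors")
    return dict(zip(stages, pool))

stages = [f"S{i}" for i in range(10)]          # ten-stage pipeline
procs = [f"P{i}" for i in range(12)]           # twelve processors, two spares
health = {p: True for p in procs}
health["P3"] = False                           # one processor fails
print(remap_stages(stages, procs, health))     # S3 now runs on P4, and so on
```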
This paper advances the Domain Segmentation based on Uncertainty in the Surrogate (DSUS) framework, a novel approach to characterizing the uncertainty in surrogates. The leave-one-out cross-validation technique is adopted in the DSUS framework to measure the local errors of a surrogate. A method is proposed to evaluate how well the leave-one-out cross-validation errors perform as local error measures: it compares (i) the leave-one-out cross-validation error with (ii) the actual local error estimated within a local hypercube around each training point. The comparison results show that the leave-one-out cross-validation strategy can capture the local errors of a surrogate. The DSUS framework is then applied to key aspects of wind resource assessment and wind farm cost modeling. The uncertainties in the wind farm cost and the wind power potential are successfully characterized, which gives designers and users more confidence when using these models.
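The leave-one-out measurement at the heart of this comparison can be stated compactly: for each training point, refit the surrogate without it and record the prediction error at that point. A small numpy sketch under a toy least-squares surrogate (our own choice of surrogate and data, not the paper's models):

```python
import numpy as np

def loo_errors(X, y, fit, predict):
    """Leave-one-out cross-validation error at each training point:
    refit the surrogate without point i and measure the error there."""
    errs = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        model = fit(X[mask], y[mask])
        errs[i] = abs(predict(model, X[i:i + 1])[0] - y[i])
    return errs

# Toy surrogate: ordinary least squares on [1, x] features.
fit = lambda X, y: np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)[0]
predict = lambda b, X: np.c_[np.ones(len(X)), X] @ b

X = np.linspace(0, 1, 8).reshape(-1, 1)
y = np.sin(3 * X).ravel()
print(loo_errors(X, y, fit, predict).round(3))  # one local error per point
```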
An Efficient Approach Towards Mitigating Soft Errors Risks (sipij)
Smaller feature sizes, higher clock frequencies and lower power consumption are core concerns of today's nano-technology, brought about by the continuous downscaling of CMOS technologies. The resulting "device shrinking" reduces the soft-error tolerance of VLSI circuits, as very little energy is needed to change their states. Safety-critical systems are very sensitive to soft errors. A bit flip due to a soft error can change the value of a critical variable, and consequently the system control flow can change completely, leading to system failure. To minimize soft-error risks, a novel methodology is proposed to detect and recover from soft errors by considering only "critical code blocks" and "critical variables", rather than all variables and/or blocks in the whole program. The proposed method reduces space and time overhead in comparison to existing dominant approaches.
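For flavor, the sketch below shows one classic way to protect a single critical variable: triplication with majority voting on read. It is a generic stand-in for the kind of critical-variable protection the abstract refers to, not that paper's actual mechanism:

```python
from collections import Counter

class CriticalVar:
    """Triplicate a critical variable; reads vote out a single bit flip
    and rewrite all copies, so one soft error is detected and corrected."""
    def __init__(self, value: int):
        self._copies = [value, value, value]

    def read(self) -> int:
        value, votes = Counter(self._copies).most_common(1)[0]
        if votes < 2:
            raise RuntimeError("uncorrectable corruption")
        self._copies = [value] * 3       # scrub: restore the corrupted copy
        return value

    def write(self, value: int) -> None:
        self._copies = [value] * 3

v = CriticalVar(42)
v._copies[1] ^= 0x08                     # simulate a bit flip (soft error)
print(v.read())                          # 42, corrected by majority vote
```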
Fault tolerance is an important issue in the field of cloud computing, concerned with the techniques and mechanisms needed to enable a system to tolerate the faults it may encounter during operation. Fault-tolerance policies can be categorized into three classes: proactive, reactive and adaptive. By providing a systematic solution, losses can be minimized and the availability and reliability of critical services guaranteed. The purpose and scope of this study is to recommend the Support Vector Machine, a supervised machine-learning algorithm, for proactively monitoring faults, so as to increase availability and reliability by combining the strength of machine learning with cloud computing.
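A minimal sketch of what such proactive SVM monitoring could look like, assuming hypothetical node metrics and synthetic training data (every feature, threshold, and label below is illustrative, not from the study):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical node metrics: [cpu_load, mem_use, io_wait]; label 1 = at risk.
rng = np.random.default_rng(0)
healthy = rng.normal([0.3, 0.4, 0.1], 0.05, (100, 3))
failing = rng.normal([0.9, 0.8, 0.6], 0.05, (100, 3))
X = np.vstack([healthy, failing])
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel="rbf").fit(X, y)        # train on historical monitoring data
print(clf.predict([[0.92, 0.85, 0.55]])) # [1]: flag the node proactively
```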
Efficient failure detection and consensus at extreme-scale systems (IJECEIAES)
Distributed systems and extreme-scale systems have become ubiquitous in recent years and are found throughout the academic, business, home, and government sectors. Peer-to-peer (P2P) technology is a typical distributed system model that is gaining popularity for delivering computing resources and services. Distributed systems try to preserve their availability in the face of frequent component failures, and keeping the system functioning in such scenarios is notoriously difficult. In order to identify component failures in the system and achieve global agreement (consensus) on the failed components, this paper implements an efficient failure detection and consensus algorithm based on fail-stop process failures. The proposed algorithm tolerates process failures occurring before and during its execution. It works with the epidemic gossip protocol, a randomized paradigm of computation and communication that is both fault-tolerant and scalable. A simulation of an extreme-scale information dissemination process shows that global agreement can be achieved. A P2P simulator, PeerSim, is used to implement and test the proposed algorithm. The results exhibited high scalability while detecting all of the process failures. The status of all the processes is maintained in a Boolean matrix.
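A toy rendering of the Boolean-matrix gossip sketched above, assuming a fail-stop failure of process 3 and a push fanout of two; after a few epidemic rounds every live process's row records the failure with high probability (this simplification is ours, not the paper's PeerSim implementation):

```python
import random

def gossip_round(detected, live):
    """One epidemic round: every live process pushes its row of the Boolean
    fault matrix to two random peers, which merge it in with a logical OR."""
    peers = sorted(live)
    for p in peers:
        for q in random.sample(peers, k=min(2, len(peers))):
            detected[q] = [a or b for a, b in zip(detected[q], detected[p])]

n = 8
live = set(range(n)) - {3}                      # process 3 fail-stops
detected = [[False] * n for _ in range(n)]
detected[2][3] = True                           # only process 2 saw the crash
for _ in range(10):                             # a few rounds suffice w.h.p.
    gossip_round(detected, live)
print(all(detected[i][3] for i in live))        # True: global agreement
```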
Checkpoint and recovery protocols are commonly used in distributed applications for providing fault tolerance. A distributed system may need to take checkpoints from time to time to keep itself free of arbitrary failures. In case of failure, the system rolls back to checkpoints where global consistency is preserved. Checkpointing is one of the fault-tolerance techniques used to recover from faults and to restart jobs quickly, and checkpointing algorithms for distributed systems have been under study for years.
It is well known that checkpointing and rollback recovery are widely used techniques that allow a distributed computation to progress in spite of a failure. There are two fundamental approaches to checkpointing and recovery. In the asynchronous approach, processes take their checkpoints independently, so taking a checkpoint is very simple; however, the absence of a recent consistent global checkpoint may force a large rollback of the computation. The synchronous checkpointing approach assumes that a single process, other than the application processes, periodically invokes the checkpointing algorithm to determine a consistent global checkpoint.
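A minimal sketch of the synchronous (coordinated) approach just described, assuming a single coordinator that has every process save its state before any of them proceeds, so the saved states form a consistent recovery line; the class and phase structure below are illustrative only:

```python
class Process:
    def __init__(self, name):
        self.name, self.state, self.checkpoint = name, 0, 0
    def take_checkpoint(self):
        self.checkpoint = self.state      # save local state
    def rollback(self):
        self.state = self.checkpoint      # restore saved state

def coordinated_checkpoint(processes):
    # Phase 1: the coordinator quiesces communication and every process
    # records its state, so the set of local checkpoints is consistent
    # (no message is recorded as received but not yet sent).
    for p in processes:
        p.take_checkpoint()
    # Phase 2 (elided): commit once all acknowledgements arrive.

procs = [Process(f"p{i}") for i in range(3)]
for p in procs:
    p.state = 10
coordinated_checkpoint(procs)
for p in procs:
    p.state += 5                          # computation continues...
for p in procs:
    p.rollback()                          # ...a failure: all roll back together
print([p.state for p in procs])           # [10, 10, 10]
```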
Truly dependable software systems should be built with structuring techniques able to decompose the software complexity without hiding important hypotheses and assumptions, such as those regarding their target execution environment and the expected fault and system models. A judicious assessment of what can be made transparent and what should be translucent is necessary. This paper discusses a practical example of a structuring technique built with these principles in mind: reflective and refractive variables. We show that our technique offers an acceptable degree of separation of the design concerns, with limited code intrusion; at the same time, by construction, it separates but does not hide the complexity required for managing fault-tolerance. In particular, our technique offers access to collected system-wide information and the knowledge extracted from that information. This can be used to devise architectures that minimize the hazard of a mismatch between dependable software and the target execution environments.
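As a conceptual sketch only (the probe, names, and interface below are our own, not the authors' actual framework), a "reflective" variable can be pictured as one whose reads transparently mirror an external monitoring probe, so application code can branch on system-wide conditions without explicit monitoring calls:

```python
class ReflectiveVar:
    """A variable whose value reflects the state of an external probe
    (here, a fake CPU-load sensor), refreshed transparently at each read."""
    def __init__(self, probe):
        self._probe = probe
    def get(self):
        return self._probe()             # value mirrors the system state

cpu = ReflectiveVar(lambda: 0.87)        # assumption: probe supplied elsewhere
if cpu.get() > 0.8:
    print("high load: switch to a lighter fault-tolerance mode")
```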
High Availability of Services in Wide-Area Shared Computing Networks (Mário Almeida)
(See the author's blog at http://www.marioalmeida.eu/)
Highly available distributed systems have been widely used and have proven to be resistant to a wide range of faults. Although these kinds of services are easy to access, they require an investment that developers might not always be willing to make. We present an overview of wide-area shared computing networks, as well as methods to provide high availability of services in such networks. We reference highly available systems that were in use and under study at the time this paper was written (2012).
Fault localization is time-consuming and difficult, which makes it the bottleneck of the debugging process. To help facilitate this task, there exist many fault localization techniques that help narrow down the region of suspicious code in a program. Better accuracy in fault localization, however, comes at a heavy computational cost: techniques that can effectively locate faults also exhibit slow response rates. In this paper, we promote the use of pre-computing to shift the time-intensive computations to the idle periods of the coding phase, in order to speed up such techniques and achieve both low cost and high accuracy. We raise the research problem of finding suitable techniques that can be pre-computed and adapting them to the pre-computing paradigm in a continuous-integration environment. Further, we use an existing fault localization technique to demonstrate our research exploration, and we show visions and challenges of the related methodologies.
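One concrete candidate for such pre-computation is spectrum-based fault localization, whose per-entity suspiciousness scores depend only on coverage data that can be gathered ahead of time. A small sketch using the standard Ochiai score (our example; the technique the authors actually demonstrate may differ):

```python
from math import sqrt

def ochiai(coverage, outcomes):
    """Suspiciousness of each program entity from a coverage spectrum:
    ef = failing tests covering it, ep = passing tests covering it."""
    total_failed = sum(1 for ok in outcomes if not ok)
    scores = []
    for e in range(len(coverage[0])):
        ef = sum(1 for t, ok in zip(coverage, outcomes) if t[e] and not ok)
        ep = sum(1 for t, ok in zip(coverage, outcomes) if t[e] and ok)
        denom = sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores

cov = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]   # tests x statements coverage matrix
ok = [False, True, True]                  # test 0 fails
print(ochiai(cov, ok))                    # higher score = more suspicious
```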
Configuration Navigation Analysis Model for Regression Test Case Prioritization (ijsrd.com)
Regression testing has been receiving increasing attention. Numerous regression-testing strategies have been proposed; most of them take into account various metrics, such as cost and the ability to find faults quickly, thereby saving overall testing time. In this paper, a new model called the Configuration Navigation Analysis Model is proposed, which tries to consider all stakeholders and various testing aspects while prioritizing regression test cases.
In the Fifties, Arnold Schönberg introduced a model for music composition that he called "Grundgestalt", the basic shape. In this seminar I show how I interpreted this concept as a generative music model that translates the orbits of dynamic systems into musical components. I also describe a family of experiments that led me to the creation of simple and not-so-simple musical compositions, which I call "my little Grundgestalten". Excerpts from a selection of those compositions will be presented.
Models and Concepts for Socio-technical Complex Systems: Towards Fractal Soci... (Vincenzo De Florio)
We introduce fractal social organizations, a novel class of socio-technical complex systems characterized by a distributed, bio-inspired, hierarchical architecture. Based on the same building block recursively applied at different layers, these systems provide a homogeneous way to model collective behaviors of different complexity and scale. Key concepts and principles are enunciated by means of a case study and a simple formalism. As preliminary evidence of the adequacy of the assumptions underlying our systems, we define and study an algebraic model for a simple class of social organizations. We show how, despite its generic formulation, geometric representations of said model exhibit the spontaneous emergence of complex hierarchical and modular patterns characterized by structured addition of complexity and fractal nature, which closely correspond to the distinctive architectural traits of our fractal social organizations. Some reflections on the significance of these results and a view to the next steps of our research conclude this contribution.
On the Role of Perception and Apperception in Ubiquitous and Pervasive Enviro... (Vincenzo De Florio)
Building on classic work on the perception of natural systems, this paper addresses the role played by perception in environments where change is the rule rather than the exception. As in natural systems, perception in software systems takes two major forms: sensory perception and awareness (also known as apperception). For each of these forms we introduce semi-formal models that allow us to discuss and characterize perception and apperception failures in software systems evolving in environments subject to rapid and sudden changes, such as those typical of ubiquitous and pervasive computing. Our models also provide us with two partial orders with which to compare such software systems with one another, as well as with reference environments. When those environments evolve or change, or when the software systems themselves evolve after their environments, the above partial orders may be used to compute new environmental fits and different strategic fits, and to gain insight into the degree of resilience achieved through the current adaptation steps.
Service-oriented Communities: A Novel Organizational Architecture for Smarter... (Vincenzo De Florio)
The seminar I shall present at Masaryk University in Brno on May 19, 2016. A video of this presentation is available at https://www.youtube.com/edit?video_id=Fu5kv0sFWG4
On codes, machines, and environments: reflections and experiences (Vincenzo De Florio)
Code explicitly refers to a reference machine and, implicitly, to a set of conditions often called the system model and the fault model. If one wants to guarantee an agreed-upon quality of service, one needs either to make assumptions about those conditions or to adapt to them. In this lecture I present this problem and a number of solutions, both practical and theoretical, that I have devised in the course of my career. Although the main accent is on programming languages, I also provide links and references to other approaches that operate at the algorithmic and system level.
Tapping Into the Wells of Social Energy: A Case Study Based on Falls Identifi... (Vincenzo De Florio)
Are purely technological solutions the best answer we can get to the shortcomings our organizations are often experiencing today? The results we gathered in this work lead us to give a negative answer to that question. Science and technology are powerful boosters, but when they are applied to the "local, static organization of an obsolete yesterday" they fail to translate into the solutions we need for our problems. Our stance here is that those boosters should be applied to novel, distributed, and dynamic models able to let us escape the local minima our societies are currently locked in. One such model is simulated in this paper to demonstrate how it may be possible to tap into the vast basins of social energy of our human societies to realize ubiquitous-computing sociotechnical services for the identification of, and timely response to, falls.
Accompanying paper available at https://arxiv.org/abs/1508.06655
How Resilient Are Our Societies? Analyses, Models, Preliminary Results (Vincenzo De Florio)
Traditional social organizations, such as those for the management of healthcare and civil defense, are the result of designs and realizations that matched an operational context considerably different from the one we are experiencing today: a simpler world, characterized by a greater amount of resources to match fewer users producing lower peaks of requests. The new context reveals all the fragility of our societies: unmanageability is just around the corner unless we complement the "old recipes" with smarter forms of social organization. Here we analyze this problem and propose a refinement of our fractal social organizations as a model for resilient cyber-physical societies. Evidence for our claims is provided by simulating our model in terms of multi-agent systems.
This course teaches engineering students how to program in C. I gave this course for several years in the framework of the "Advanced Technology Higher Education Network" / SOCRATES program.
A framework for trustworthiness assessment based on fidelity in cyber and phy... (Vincenzo De Florio)
We introduce a method for the assessment of trust in n-open systems based on a measurement of fidelity, and present a prototypical implementation of a compliant architecture. We construct a MAPE loop that monitors the compliance between corresponding figures of interest in the cyber and physical domains, derives measures of the system's trustworthiness, and uses them to plan and execute actions aimed at guaranteeing system safety and resilience. We conclude with a view on our future work.
Presented at ANTIFRAGILE'15.
Companion paper available at http://www.sciencedirect.com/science/article/pii/S1877050915008923
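A minimal sketch of one iteration of such a MAPE loop, with fidelity taken as the normalized agreement between a cyber figure and its physical counterpart; all figures, tolerances, and plan strings below are hypothetical, not the paper's architecture:

```python
def mape_step(monitor_cyber, monitor_physical, tolerance, plan, execute):
    """Monitor corresponding figures of interest in the cyber and physical
    domains, analyze their discrepancy as a fidelity measure, and plan and
    execute an adaptation when fidelity drops below the tolerated level."""
    cyber, physical = monitor_cyber(), monitor_physical()
    fidelity = 1.0 - abs(cyber - physical) / max(abs(physical), 1e-9)
    if fidelity < 1.0 - tolerance:        # analyze
        execute(plan(cyber, physical))    # plan + execute
    return fidelity

# Hypothetical figures of interest: measured vs. required progress rate.
f = mape_step(lambda: 0.70, lambda: 0.95, 0.10,
              lambda c, p: f"rescale workload by {p - c:+.2f}",
              print)
print(f"fidelity = {f:.2f}")
```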
A behavioural model for the discussion of resilience, elasticity, and antifra... (Vincenzo De Florio)
Resilience is one of those "general systems attributes" that appear to play a central role in several disciplines, including ecology, business, psychology, industrial safety, microeconomics, computer networks, security, management science, cybernetics, control theory, and crisis and disaster management. Resilience thus seems to be "needed" everywhere; and yet, even within a single discipline, it is not easy to define it precisely and consensually. To add to the confusion, other terms such as elasticity, change tolerance, and antifragility, although clearly related to resilience, cannot be easily differentiated.
In this talk I tackle this problem by introducing a behavioural model of resilience. I interpret resilience as the property emerging from the interaction of the behaviours produced by two "players": a system and a hosting environment. The outcome of said interaction depends on both intrinsic and extrinsic factors, including the systemic "traits" of the system but also how the system's endowment matches the requirements expressed by the behaviours of the environment. I show how the behavioural approach provides a unifying framework within which it is possible to express coherent definitions of elasticity, change tolerance, and antifragility.
A Behavioral Interpretation of Resilience and Antifragility (Vincenzo De Florio)
In this presentation I discuss resilience and antifragility as behaviors resulting from the coupling of a system and its environment(s). Depending on the interactions between these two "ends" and on the quality of the individual behaviors that they may exercise, different strategies may be chosen: elasticity (change masking); entelechism (change tolerance); and antifragility (adapting to and learning from change). When the environment is very simple and only capable of so-called "random behavior", often the only effective strategy towards resilience is off-line dimensioning of redundancy as a result of a worst-case assessment of disturbances and/or threats. Much more complex and variegated is the case when both systems and environments are "intelligent", or at least able to exercise complex teleological and extrapolatory behaviors. In this case both system and ambient may choose among a variety of strategies, in what could be regarded as a complex evolutionary game-theory setting.
Community Resilience: Challenges, Requirements, and Organizational Models (Vincenzo De Florio)
An important challenge for human societies is that of mastering the complexity of community resilience, namely "the sustained ability of a community to utilize available resources to respond to, withstand, and recover from adverse situations". This concise definition puts the accent on an important requirement: a community's ability to make use, in an intelligent way, of the available resources, both institutional and spontaneous, in order to match the complex evolution of the "significant multi-hazard threats characterizing a crisis". Failing to address this requirement exposes a community to extensive failures that are known to exacerbate the consequences of natural and human-induced crises. As a consequence, we experience today an urgent need to respond to the challenges of community resilience engineering. This problem, some reflections, and preliminary prototypical contributions constitute the topics of this presentation.
A companion article is available at https://dl.dropboxusercontent.com/u/67040428/Articles/serene14.pdf
On the Behavioral Interpretation of System-Environment Fit and Auto-Resilience (Vincenzo De Florio)
Already 71 years ago, Rosenblueth, Wiener, and Bigelow introduced the concept of the "behavioristic study of natural events" and proposed a classification of systems according to the quality of the behaviors they are able to exercise. In this presentation we consider the problem of the resilience of a system deployed in a changing environment, which we tackle by considering the behaviors that the system organs and the environment mutually exercise. We then introduce a partial order and a metric space for those behaviors, and we use them to define a behavioral interpretation of the concept of system-environment fit. Moreover, we suggest that behaviors based on the extrapolation of future environmental requirements would allow systems to proactively improve their own system-environment fit and optimally evolve their resilience. Finally, we describe how we plan to express a complex optimization strategy in terms of the concepts introduced in this presentation.
The paper accompanying this presentation is available at https://dl.dropboxusercontent.com/u/67040428/Articles/DF14b_Wiener21stA.pdf
Antifragility = Elasticity + Resilience + Machine Learning. Models and Algori... (Vincenzo De Florio)
Presentation for the ANTIFRAGILE 2014 workshop, https://sites.google.com/site/resilience2antifragile/
Abstract: We introduce a model of the fidelity of open systems, fidelity being interpreted here as the compliance between corresponding figures of interest in two separate but communicating domains. A special case of fidelity is given by real-timeliness and synchrony, in which the figure of interest is physical time versus the system's notion of time. Our model covers two orthogonal aspects of fidelity, the first focusing on a system's steady state and the second capturing that system's dynamic and behavioral characteristics. We discuss how the two aspects correspond, respectively, to elasticity and resilience, and we highlight each aspect's qualities and limitations. We then sketch the elements of a new model coupling both aspects of the first model and complementing them with machine learning. Finally, a conjecture is put forward that the new model may represent a first step towards compositional criteria for antifragile systems.
Service-oriented Communities and Fractal Social Organizations - Models and co... (Vincenzo De Florio)
Presentation given by Vincenzo De Florio at the ceremony for the handing out of the 2013 Faculty Awards.
Keywords: fractal social organizations; service-oriented communities; mutual assistance communities
Seminarie Computernetwerken 2012-2013: Lecture I, 26-02-2013 (Vincenzo De Florio)
Seminarie Computernetwerken is a course given at Universiteit Antwerpen, Belgium: a series of seminars focusing on various themes, changing from year to year. This year's themes are resilience, behaviour, and evolvability, in systems, networks, and organizations.
In what follows we describe:
- the themes of the course
- a view to the seminars
- the rules of the game
A Formal Model and an Algorithm for Generating the Permutations of a Multiset (Vincenzo De Florio)
This paper may be considered a mathematical divertissement as well as a didactical tool for undergraduate students in a university course on algorithms and computation. The well-known problem of generating the permutations of a multiset of marks is considered. We define a formal model and an abstract machine (an extended Turing machine). We then write an algorithm to compute, on that machine, the successor of a given permutation in the lexicographically ordered set of permutations of a multiset. Within the model we analyze the algorithm, prove its correctness, and show that the algorithm solves the above problem. Finally, we describe a slight modification of the algorithm and analyze the cases in which it may improve execution times.
This paper, the ideas in it, and its realization are the work of the first author only.
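Outside the paper's extended-Turing-machine formalism, the successor computation itself is the familiar "next permutation" procedure, which also enumerates each distinct arrangement of repeated marks exactly once; a compact Python rendering (ours, not the paper's abstract-machine algorithm):

```python
def successor(perm):
    """Lexicographic successor of a multiset permutation, or None when
    perm is already the last permutation in the ordering."""
    a = list(perm)
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:   # find the rightmost ascent
        i -= 1
    if i < 0:
        return None                      # descending: no successor
    j = len(a) - 1
    while a[j] <= a[i]:                  # rightmost element exceeding a[i]
        j -= 1
    a[i], a[j] = a[j], a[i]
    a[i + 1:] = reversed(a[i + 1:])      # smallest possible suffix
    return a

p = [1, 1, 2]
while p is not None:                     # enumerates 112, 121, 211
    print(p)
    p = successor(p)
```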
A FAULT-TOLERANCE LINGUISTIC STRUCTURE FOR DISTRIBUTED APPLICATIONS (Vincenzo De Florio)
The structures for the expression of fault-tolerance provisions in application software are the central topic of this dissertation. Structuring techniques provide means to control complexity, the latter being a relevant factor for the introduction of design faults. This fact, and the ever-increasing complexity of today's distributed software, justify the need for simple, coherent, and effective structures for the expression of fault-tolerance in the application software. A first contribution of this dissertation is the definition of a base of structural attributes with which application-level fault-tolerance structures can be qualitatively assessed and compared with each other and with respect to the above-mentioned need. This result is then used to provide an elaborate survey of the state of the art in software fault-tolerance structures.
The key contribution of this work is a novel structuring technique for the expression of fault-tolerance design concerns in the application layer of those distributed software systems that are characterised by soft real-time requirements and a number of processing nodes known at compile time. The main thesis of this dissertation is that this new structuring technique is capable of exhibiting satisfactory values of the structural attributes in the domain of soft real-time, distributed and parallel applications. Following this novel approach, besides the conventional programming language addressing the functional design concerns, a special-purpose linguistic structure (the so-called "recovery language") is available to address error recovery and reconfiguration. This recovery language comes into play as soon as an error is detected by an underlying error-detection layer, or when some erroneous condition is signalled by the application processes. Error recovery and reconfiguration are specified as a set of guarded actions, i.e., actions that require a pre-condition to be fulfilled in order to be executed. Recovery actions deal with coarse-grained entities of the application, and pre-conditions query the current state of those entities.
An important added value of this so-called "recovery language approach" is that the executable code is structured so that the portion addressing fault-tolerance is distinct and separated from the rest of the code. This allows for a division of complexity into distinct blocks that can be tackled independently of each other.
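A minimal Python rendering of the guarded-actions idea, assuming a toy application state; the dissertation's recovery language has its own syntax and coarse-grained entities, so the rules below are purely illustrative:

```python
# Each recovery rule pairs a guard (a pre-condition over application
# entities) with a recovery action; the first rule whose guard holds fires.
recovery_rules = [
    (lambda s: s["node1"] == "faulty" and s["spare"] == "idle",
     lambda s: s.update(node1="isolated", spare="active")),
    (lambda s: s["node1"] == "faulty",
     lambda s: s.update(node1="restarting")),
]

def on_error_detected(state):
    """Invoked by the error-detection layer: run the guarded actions."""
    for guard, action in recovery_rules:
        if guard(state):
            action(state)
            break

state = {"node1": "faulty", "spare": "idle"}
on_error_detected(state)
print(state)                             # the spare takes over the faulty node
```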
TOWARDS PARSIMONIOUS RESOURCE ALLOCATION IN CONTEXT-AWARE N-VERSION PROGRAMMING
Jonas Buys, Vincenzo De Florio and Chris Blondia
University of Antwerp, Dept. of Mathematics and Computer Science, Antwerp, Belgium
Interdisciplinary Institute for Broadband Technology, Ghent-Ledeberg, Belgium
{jonas.buys, vincenzo.deflorio, chris.blondia}@ua.ac.be
Keywords: fault-tolerance, n-version programming, context-awareness, measurement techniques, dependability.
Abstract
Adopting classic redundancy-based fault-tolerant schemes in highly dynamic distributed computing systems does not necessarily result in the anticipated improvement in dependability. This primarily stems from the statically predefined redundancy configurations employed within many classic dependability strategies, which, as is well known, may negatively impact the schemes' overall effectiveness. In this paper, a novel dependability strategy is introduced encompassing advanced redundancy management, aiming to autonomously tune its internal configuration as a function of the disturbances observed. Policies for parsimonious resource allocation are presented thereafter, intent upon increasing the scheme's cost effectiveness without breaching its availability objective. Our experiments suggest that the proposed solution can achieve a substantial improvement in availability compared to traditional, static redundancy strategies, and that tuning the adopted degree of redundancy to the actually observed disturbances allows unnecessary resource expenditure to be reduced, therefore enhancing cost-effectiveness.
1. Introduction
Business- and mission-critical distributed applications are increasingly expected to exhibit highly dependable characteristics, particularly in the areas of reliability and timeliness. Redundancy-based fault-tolerant strategies have long been used as a means to avoid a disruption in the service provided by the system in spite of failures in the underlying software components. Adopting fault-tolerance strategies in dynamic distributed computing systems, in which components often suffer from long response times or temporary unavailability, does not necessarily result in the anticipated improvement in dependability. This primarily stems from statically predefined redundancy configurations employed within many classic dependability strategies, i.e. a fixed degree of redundancy and, accordingly, an immutable selection of functionally-equivalent software components, which may negatively impact the schemes' effectiveness from the following angles:
Firstly, a static, context-agnostic configuration may in time lead to a more rapid exhaustion of the available redundancy and therefore fail to properly counterbalance any disturbances affecting the operational status (context) of the components integrated within the dependability scheme. Indeed, the use of replicas of poor reliability can result in a system tolerant of faults but with poor reliability [5]. It is therefore crucial for the system to continuously monitor the operational status of the available resources and avoid the use of resources that do not significantly contribute to an increase of dependability, or that may even jeopardise the scheme's overall effectiveness.

Secondly, redundant implementations imply significantly higher development costs and increased infrastructural requirements. A predetermined degree of redundancy is therefore cost-ineffective in that it prevents economising on resource consumption in case the actual number of disturbances could be successfully overcome by a lesser amount of redundancy. Conversely, when the foreseen amount of redundancy is not enough to compensate for the currently experienced disturbances, the inclusion of additional resources may prevent further service disruption.

In this paper, a novel, context-aware dependability strategy is introduced encompassing advanced redundancy management. Designed to sustain high availability and reliability, this adaptive fault-tolerant strategy dynamically alters the degree of redundancy and the employed selection of resources.

The remainder of this paper is structured as follows: we first present the concept of n-version programming (NVP) schemata in Sect. 2 and show how software design failures in the content failure domain may challenge their effectiveness. A set of ancillary metrics is then set forth in Sect. 3, allowing knowledge of the context in which the scheme is operating to be deduced and the proximity of hazardous situations that may require an adjustment of the redundancy configuration to be detected. We then elaborate on the internals of the proposed adaptive fault-tolerant strategy in Sect. 4. Policies for parsimonious resource allocation are presented in Sect. 5, intent upon increasing the scheme's cost effectiveness without breaching its availability objective. Sect. 6 reports on the strategy's effectiveness, after which this paper is concluded.

2. n-Version Programming with Majority Voting

NVP was first introduced in 1985 as a tool to provide protection against software design faults [5]. The rationale is that deploying multiple functionally-equivalent, independently implemented software components will hopefully reduce the probability of a software fault affecting multiple implementations simultaneously, thereby keeping the system operational.
Figure 1: State transition diagram of voting round (c,ℓ) and the underlying invocations ⟨c, ℓ, i⟩ of versions vi.
An n-version composite constitutes a client-transparent
replication layer in which all n programs, called versions,
receive a copy of the user input and independently perform
their computations in parallel. Let {โx}c be a sequence of
monotonically increasing, strictly positive integer indices
โx = x in L = โ +, such that each voting round, i.e. a single
invocation of an NVP composite c, is uniquely identified. As
shown in Fig. 1, the arrival of a request message at the
composite interface will trigger the initialisation of a new
voting round (c,โ) with โ the next element in {โx}c.
Immediately after, the system is to retrieve the redundancy
configuration to be used throughout the newly initialised
voting round (c,โ), i.e. the amount of redundancy used and,
accordingly, a selection of functionally-equivalent versions
(FeV) (transition from state a to b). We define V as the set of
all FeV in the system. For a given round (c,โ), the amount of
redundancy used within the NVP scheme is denoted as
n(c,โ) โฅ 1, such that the versions employed for round (c,โ) are
contained within V(c,โ) โ V and n(c,โ) = |V(c,โ)|. Having acquired
the redundancy configuration, the request message payload is
then replicated and forwarded to each of the selected versions
vi โ V(c,โ), for i โ {1, ..., n(c,โ)}. Each such invocation of a
version vi can be uniquely identified by the tuple โนc, โ, iโบ. As
soon as a result is available for each of the versions involved,
the transition from state b to c fires and a majority voting
(MV) procedure is called so as to adjudicate the result of the
scheme, which is then returned to the client.
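To make the round lifecycle concrete, the following sketch (ours, not the authors' code; versions are modelled as plain callables and the helper names are illustrative) fans a request out to the selected versions and collects one result, or one serialised exception, per invocation ⟨c, ℓ, i⟩, corresponding to the transition from state a through b in Fig. 1:

from concurrent.futures import ThreadPoolExecutor

def run_voting_round(selected_versions, request):
    # Fan the replicated payload out to every selected version v_i
    # (state a -> b) and collect one result per invocation <c, l, i>.
    results = {}
    with ThreadPoolExecutor(max_workers=len(selected_versions)) as pool:
        futures = {pool.submit(v, request): i
                   for i, v in enumerate(selected_versions, start=1)}
        for future, i in futures.items():
            try:
                results[i] = future.result()   # syntactically valid response
            except Exception as exc:           # EVF: the exception is the payload
                results[i] = exc
    return results                             # state b -> c: ready for voting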
The essential part of any voting procedure is the construction of a partition ∂(c,ℓ) = {PF(c,ℓ), P1, …, Pk(c,ℓ)} of the set of versions V(c,ℓ). The notation of a disjoint union to represent the consensus blocks as part of this partition has been taken from [1]. This partitioning procedure is influenced by the disturbances that affected any of the requests ⟨c, ℓ, i⟩ involved during the voting round (c,ℓ). Throughout this paper, the notion of disturbance is used to denote the event of a single request ⟨c, ℓ, i⟩ being struck by some type of failure, resulting in the perturbation and, consequently, the (temporary) unavailability of the service that version vi is expected to provide.

The application of NVP schemata in contemporary distributed computing systems is assumed to exhibit the properties of a timed asynchronous system model [2, 6]. For such environments, several types of potential disturbances were defined in the failure manifestation model provided in [2]. We will now elaborate on a subset of these disturbances that may ensue from the activation of a software design fault and manifest within the content failure domain. Such disturbances exhibit a transient behaviour and emerge exclusively from the activation of a latent software design fault along the execution path followed whilst version vi is processing some request ⟨c, ℓ, i⟩, i.e. when the request is in the request processing state (see Fig. 1). One may distinguish two types of failures, each with different repercussions on the generated partition ∂(c,ℓ).

Firstly, despite sharing a common specification, functionally-equivalent versions may exhibit discrepancies resulting in response value failures (RVF) [4]. Such failures will usually not make the system appear to fail, as the content of the response returned via the service interface is syntactically correct, though not compliant with the functional specifications. During voting, a partition P1 ⊎ … ⊎ Pk(c,ℓ) is constructed for all versions vi ∈ V(c,ℓ) that returned a syntactically valid response, allotting versions that reported equivalent results to the same equivalence class Pj.

Secondly, the activation of a design fault may cause an abrupt interruption of the flow of execution due to an exception, a case of so-called erroneous value failures (EVF). Falling within the content failure domain, a syntactically invalid response message will be returned for the affected invocation, containing the serialised exception thrown. Versions affected by an EVF are classified in PF(c,ℓ). EVF failures may overrule previously activated RVF failures affecting an invocation [2].

Adhering to the discrete-event model developed in [2] for the injection of transient disturbances in NVP schemata, these two disturbance types will be used to assess the performance of the dependability strategy introduced in Sect. 4.

3. Redundancy Configuration Effectiveness

The effectiveness of a scheme such as NVP/MV is largely determined by its redundancy configuration and its ability to counterbalance the disturbances ensuing from the environment in which it operates and to which it is subject. For instance, an NVP/MV scheme can mask failures affecting the availability of up to a minority of its n(c,ℓ) versions.

Let P(c,ℓ) be the set of largest cardinality in the generated partition ∂(c,ℓ) \ {PF(c,ℓ)}. Then cmax(c,ℓ) = |P(c,ℓ)| represents the largest consent found amongst the versions in V(c,ℓ) within the scope of (c,ℓ). In order for the scheme to adjudicate a result o, there should be a consensus amongst an absolute majority of the n(c,ℓ) versions, i.e. cmax(c,ℓ) ≥ m(c,ℓ), with

m(c,ℓ) = ⌈n(c,ℓ) / 2⌉.    (1)

Put differently, m(c,ℓ) is indicative of the smallest degree of consent needed for a consensus block P(c,ℓ) to qualify for the equivalence class [o]. Consequently, a scheme is able to withstand disturbances affecting at most a minority n(c,ℓ) − m(c,ℓ) of the n(c,ℓ) versions used throughout round (c,ℓ).
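The voting procedure and the majority threshold of Eq. (1) can be summarised in a few lines. The sketch below is an illustrative reading of Sect. 2 and 3 rather than the authors' implementation, and assumes that version outputs are hashable values:

import math

def adjudicate(results):
    # Partition the versions: exceptions go to P_F, while equal replies
    # share an equivalence class P_j (cf. the partition of V(c,l) above).
    n = len(results)
    classes = {}
    for i, r in results.items():
        if not isinstance(r, Exception):
            classes.setdefault(r, set()).add(i)
    m = math.ceil(n / 2)                  # Eq. (1): m(c,l) = ceil(n(c,l) / 2)
    c_max = max((len(p) for p in classes.values()), default=0)
    if c_max >= m:                        # absolute majority reached
        outcome = max(classes, key=lambda r: len(classes[r]))
        return outcome, c_max, m
    return None, c_max, m                 # no consensus: scheme failure

For instance, adjudicate({1: 'a', 2: 'a', 3: RuntimeError()}) returns ('a', 2, 2): the round barely succeeded, and cmax(c,ℓ) − m(c,ℓ) = 0 signals that the available redundancy was fully exhausted.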
We now introduce a new metric, defined as cmax(c,ℓ) − m(c,ℓ), that was designed to provide a quantitative estimation of how closely the current amount and selection of resources matched the observed disturbances, by shortcoming or excess. More specifically, it can be used to deduce a measure of the proximity of hazardous situations that may necessitate the adjustment of the currently employed redundancy configuration. One can easily see that cmax(c,ℓ) − m(c,ℓ) ranges over [−m(c,ℓ), n(c,ℓ) − m(c,ℓ)], for cmax(c,ℓ) was defined in [0, n(c,ℓ)]. It provides an indirect estimation of the shortage or abundance of redundancy with respect to the disturbances affecting round (c,ℓ): a positive value essentially quantifies how many versions exist in excess of the mandatory m(c,ℓ) versions that collectively constitute a majority for round (c,ℓ). For negative values, the absolute value represents the lack of consent relative to P(c,ℓ) that would be required so as to constitute a majority. Such a value is interpreted as a symptom that the currently experienced disturbances cannot be successfully counterbalanced by the redundancy configuration used, and that the scheme would fail to guarantee the availability of the service it seeks to provide despite its fault-tolerant nature. A critically low value of 0 represents a situation in which the majority was attained by exactly m(c,ℓ) versions: the available redundancy was completely exhausted to counterbalance the maximal number of disturbances the scheme could tolerate, and any additional disturbance would have led to failure.

4. A Novel Adaptive Fault-Tolerant Strategy

We introduce our adaptive NVP-based fault-tolerant strategy (A-NVP) and elaborate on the advanced redundancy management it supports. Our context-aware reformulation of the classical NVP approach encompasses two complementary parameterised models that jointly determine the optimal redundancy configuration to be used throughout a newly initialised voting round (c,ℓ) in view of disturbances and how these were perceived to have affected the system context. The redundancy configuration is retrieved whilst preparing to fire the transition from state a to b, as shown in Fig. 1.

The first model is responsible for determining the appropriate degree of redundancy n(c,ℓ) to be used and will allow economising on resource expenditure whenever this can be argued to be safe. Intent upon increasing the scheme's cost effectiveness without breaching its dependability objective, policies for parsimonious resource allocation are presented in Sect. 5. Next, the replica selection model will establish an appropriate set of resources V(c,ℓ) maximising the objectives expressed by means of a set of application-agnostic properties [3].

4.1. Application-specific requirements

The optimal redundancy configuration is not only determined by the context property introduced in Sect. 3, but also by the characteristics of the application itself, or the environment in which it operates. Some applications may, for instance, operate in a resource-constrained environment. The A-NVP algorithm was conceived to take these application-specific intricacies into account, in that both models can be configured by means of a set of user-defined parameters. Firstly, a parameter nmax ≥ 1 can be used to set an upper bound on the number of replicas to be used. Contrariwise, a parameter nmin = 2e + 1 ≥ 1 can be used to set a lower bound on the degree of redundancy, such that the scheme is guaranteed to tolerate at least e disturbances. Lastly, parameter ninit sets the default degree of redundancy to be used initially, i.e. n(c,ℓ) = ninit for ℓ = 1.

4.2. Redundancy dimensioning model

Given the FeV in V, this model is responsible for autonomously adjusting the degree of redundancy employed such that it closely follows the evolution of the observed disturbances. In the absence of disturbances, the scheme should scale down its use of replicas and avoid unnecessary resource expenditure. Contrarily, when the foreseen amount of redundancy is not enough to compensate for the currently experienced disturbances, it will dynamically revise that amount and enrol additional resources if available. The model determines n(c,ℓ) upon initialisation of round (c,ℓ), such that nmin ≤ n(c,ℓ) ≤ min(nmax, |V|). Ideally, the redundancy configuration would involve a number n(c,ℓ) of versions consistently greater than or equal but close to cr(e(c,ℓ)), i.e. the minimal amount of redundancy required during the operational time interval of a voting round (c,ℓ) in order to stay dependable and have the decision algorithm return the correct result in spite of being challenged by a number of disturbances e(c,ℓ); this can be expressed as the function cr : ℕ0 → ℕ+ such that cr(e(c,ℓ)) = 2e(c,ℓ) + 1 [2].

4.2.1. Window of Context Information

Let {yz}c be a monotonically increasing sequence of strictly positive integer indices yz = z in Y = ℕ+ such that each consecutive completion of some voting round (c,ℓ) originating from an invocation of c is uniquely identified by the next element y in {yz}c. Observe how z denotes the number of completed voting rounds appertaining to c. The bijection bc : Y → L defines the correlation between the terms in either of the sequences {yz}c and {ℓx}c. The order in which rounds complete, i.e. the sequence {yz}c, is not necessarily the order in which these rounds were initialised, represented by the sequence {ℓx}c. Indeed, in large-scale distributed computing environments, one may expect a significant amount of variability in the response time of an invocation on the NVP composite, due e.g. to the dynamically changing redundancy configuration used for different voting rounds. More information on timeliness issues can be found in [2].

Figure 2: View on window-based data structure before n(c,ℓ) is set for use within a newly initialised round (c,ℓ).

In order to make an informed decision on the amount of redundancy n(c,ℓ) to be used, the model will consider the course of the amount of redundancy used for previously completed voting rounds and whether or not the selected redundancy proved to be sufficient to guarantee the scheme's availability. As such, the context manager within the NVP composite will maintain a window-based data structure to store this contextual information for each y in the subsequence {ymax(1, z−rd+1), …, yz} of {yz}c. More specifically, for each completed voting round y, the context properties of interest are: the corresponding identifier ℓx = bc(y), the amount of redundancy n(c,ℓx) that was employed throughout its execution, and the extent to which this redundancy was found to be capable of masking disturbances (cf. Sect. 3).
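A minimal rendering of this window (our sketch; the record layout is an assumption consistent with Fig. 2) keeps, for each of the at most rd most recently completed rounds, the round identifier, the redundancy used, and the consent margin cmax(c,ℓ) − m(c,ℓ) from Sect. 3:

from collections import deque

class ContextWindow:
    def __init__(self, r_d):
        # Bounded window: once full, the oldest column is discarded,
        # mirroring the eviction on the left-hand side of Fig. 2.
        self.records = deque(maxlen=r_d)

    def on_round_completed(self, round_id, n_used, consent_margin):
        # consent_margin = c_max - m for the completed round (cf. Sect. 3)
        self.records.append((round_id, n_used, consent_margin))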
Let rd, ru and rf be numbers in ℕ+ such that 1 ≤ rf ≤ ru ≤ rd. The number rd represents the minimum number of consecutive successful voting round completions before contemplating scaling down the current level of redundancy. In line with Sect. 2, a given voting round (c,ℓ) is observed to have completed successfully if a sufficiently large degree of consent could be found between the responses acquired from the subordinate invocations ⟨c, ℓ, i⟩ of the involved versions vi ∈ V(c,ℓ) such that a majority could be found, i.e. cmax(c,ℓ) − m(c,ℓ) ≥ csm, with a discretionary safety margin csm defined as an integer in [0, n(c,ℓ) − m(c,ℓ)] (SC). As shown in Fig. 2, rd imposes an upper bound on the maximum window capacity maintained by the data structure. It is assumed that shorter window lengths may result in incautious downscaling of the redundancy, which in itself might lead to failure of the voting scheme in subsequent voting rounds. Contrariwise, one may reasonably expect that the redundancy scheme is less likely to fail due to the downscaling of the employed degree of redundancy for larger values of rd, at the expense of postponing the relinquishment of excess redundancy [3].

The number ru expresses the maximum number of successive voting round completions that failed to meet the criterion of success defined hereabove, before responding by considering the use of additional redundancy. Such scenarios include (1) voting rounds for which a result o could be adjudicated, yet the cardinality cmax(c,ℓ) of the corresponding consensus block [o] was insufficient to match the agreed safety margin csm, and (2) rounds for which the decision algorithm failed to adjudicate a result, a case the model will endure at most rf times within the observation window in spite of undershooting the required amount of redundancy. At the risk of prolonging the scheme's unavailability, temporarily refraining from increasing the employed degree of redundancy after observing a potentially hazardous situation might allow the replica selection model to regain the scheme's intended dependability by substituting poorly performing versions with more reliable, idling versions. Smaller values of ru enable a more rapid detection of potentially hazardous situations, resulting in system resources being allocated rather lavishly.

The data structure shown in Fig. 2 is updated upon each subsequent completion y of a voting round, as represented by the corresponding element in {yz}c. Each column refers to a single voting round ℓ = bc(y) and holds information regarding the redundancy n(c,ℓ) employed and its effectiveness in counterbalancing the disturbances to which it was perceived to be subject, information that can be deduced from the generated partition ∂(c,ℓ). Information pertaining to round bc(yz−rd) is discarded for values z > rd (see left in Fig. 2).

4.2.2. Window Semantics

We will now elaborate on the procedure used to determine the amount of redundancy to be used throughout a newly initialised voting round (c,ℓ). In doing so, we use the abstract notion of window semantics Sc to epitomise the specific conditionalities and correlational techniques that enable the redundancy dimensioning model to deduce the optimum degree of redundancy matching the scheme's operational context from the stored information. In this capacity, Sc defines two ancillary functions describing a relation L → ℕ+. More specifically, the upscaling function fu(c,ℓ) is responsible for determining if and to what extent the current level of redundancy n(c,ℓ−1) should increase, whereas the downscaling function fd(c,ℓ) quantifies the extent by which n(c,ℓ−1) should be lowered. The final degree of redundancy n(c,ℓ) to be used for the continuation of (c,ℓ) is then resolved as follows:

n(c,ℓ) = min(min(nmax, |V|), n(c,ℓ−1) + fu(c,ℓ))   if fu(c,ℓ) > 0,   (2a)
n(c,ℓ) = max(nmin, n(c,ℓ−1) − fd(c,ℓ))   if fd(c,ℓ) > 0,   (2b)
n(c,ℓ) = n(c,ℓ−1)   otherwise.   (2c)

The above Eq. (2a) and (2b) formalise the upscaling, resp. downscaling procedure for ℓ > 1. Observe how the adjustment of the redundancy is constrained by the application-specific parameters nmin and nmax, as well as by the amount |V| of resources available. Any specific Sc should implement these two functions such that a value is returned only if the window contains information abiding by the success criterion (SC) and the definitions given for rd, ru and rf hereabove.

The optional safety margin csm expresses the amount of consent supplementary to the mandatory m(c,ℓ) required for the successful adjudication of a result. It serves as a parameter to the redundancy dimensioning model, primarily aiming to reduce the likelihood that the downscaling procedure itself would result in failure of the scheme in the first few subsequent voting rounds. Moreover, such a safety margin could anticipate a shortfall in redundancy when the effectiveness of the employed redundancy is observed to exhibit a decreasing trend, and proactively trigger the upscaling procedure. Either way, the underlying rationale for maintaining a slightly higher degree of redundancy stems from the assumption that the environment behaves unpredictably, and the number of disturbances e(c,ℓ) it brings about affecting ongoing voting rounds ℓ may therefore vary considerably. In contrast, the dimensioning model was designed to gradually adjust the degree of redundancy downwards, targeting cr(e(c,ℓ)), in line with the trend perceived from the data held within the observation window. The safety margin can therefore intuitively be seen as the maximum aberration in terms of additional disturbances that the scheme can tolerate as compared with the observed trend.
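Resolving n(c,ℓ) from Eq. (2) is mechanical once fu and fd are given. The following sketch (ours, illustrative; fu and fd are assumed to be supplied by some window semantics Sc) applies the clamping of Eq. (2a)-(2c):

def resolve_redundancy(n_prev, f_u, f_d, n_min, n_max, pool_size):
    if f_u > 0:                                        # Eq. (2a): upscale
        return min(min(n_max, pool_size), n_prev + f_u)
    if f_d > 0:                                        # Eq. (2b): downscale
        return max(n_min, n_prev - f_d)
    return n_prev                                      # Eq. (2c): unchanged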
4.3. Replica Selection Model

Having established the degree of redundancy n(c,ℓ) to be employed throughout round (c,ℓ), the replica selection model will then determine an adequate selection of versions V(c,ℓ) to be used by the redundancy scheme. The proposed model has been designed so as to achieve an optimal trade-off between dependability and performance-related objectives such as load balancing and timeliness. To do so, a score is computed for all versions v ∈ V depending on the contextual information accrued during previously completed voting rounds. For more information on this procedure, we refer to
its initial announcement in [3], which also introduces a
valuable metric named normalised dissent for approximating
the reliability of versions v โ V. Moreover, it aims to mitigate
the adverse effects of employing inapt resources that
consistently perform poorly and that, consequently, may
threaten the effectiveness of the overall redundancy scheme.
If the degree of redundancy n(c,โ) to be utilised follows a
constant or decreasing trend, then, depending on the availability of eligible versions that can be used as a substitute, the
model will be successful in excluding such inapt replicas.
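Although the scoring itself is defined in [3] and not repeated here, the selection step reduces to ranking. The sketch below is ours: scores is assumed to map each version to its current score (e.g. one derived from normalised dissent), and the n(c,ℓ) best-scoring versions are retained for V(c,ℓ).

def select_versions(scores, n):
    # Rank all functionally-equivalent versions by their current score
    # and retain the n best ones; poorly performing replicas drop out.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])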
5. Parsimonious Resource Allocation
Building on the extensible and abstract concept of window
semantics, we will now briefly discuss two specific
implementations that will be used to validate the effectiveness
of the redundancy dimensioning model in Sect. 6.
A first strategy was originally published in [7]. Even though it was applied to redundant data structures, it can readily be reused for dynamically determining the redundancy level. It assumes nmax = 9, and will only report an odd degree of redundancy. Moreover, the redundancy scheme is initialised such that it is capable of tolerating up to one failure, hence ninit = nmin = 3. If the voting scheme failed to find consensus amongst a majority of the replicas involved during the last completed voting round, the model will increase the number of redundant replicas to be used in the next voting round, to the extent that fu(c,ℓ) = 2. Conversely, when the scheme was able to produce an outcome with a given amount of redundancy for a certain number rd of consecutive voting round completions, a lower degree of redundancy shall be used for the next voting round, with fd(c,ℓ) = 2. Note how this strategy restricts the success criterion (SC) by requiring the same amount of redundancy to be used in the observation interval.
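Read operationally, and under the assumption that the window stores the consent margin cmax(c,ℓ) − m(c,ℓ) per completed round (a negative margin meaning the round failed), this first policy might be sketched as follows; the helper below is ours, not the code of [7]:

def strategy_one(records, n_prev, r_d):
    # records: iterable of (round_id, n_used, consent_margin), oldest first.
    history = list(records)
    if history and history[-1][2] < 0:        # last round failed: f_u = 2
        return min(9, n_prev + 2)             # n_max = 9, odd degrees only
    recent = history[-r_d:]
    if (len(recent) == r_d and
            all(n == n_prev and margin >= 0 for _, n, margin in recent)):
        return max(3, n_prev - 2)             # r_d successes at same n: f_d = 2
    return n_prev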
A second strategy is a plain implementation of the mechanism defined in Sect. 4.2.1. When the degree of redundancy n(c,ℓ−1) is found to be overabundant, as can be perceived from the success criterion (SC), the downscaling function fd(c,ℓ) will try to adjust downwards by an amount of (cmax(c,ℓx) − m(c,ℓx)) − csm, with ℓx = bc(yz) the last completed round observed. Undershooting the safety margin will cause fu(c,ℓ) to return csm − |cmax(c,ℓx) − m(c,ℓx)|, with ℓx the round in the observation window delimited by ru exhibiting the largest deviation from the imposed safety margin. The upscaling function fu(c,ℓ) will try to inflate n(c,ℓ−1) by |cmax(c,ℓx) − m(c,ℓx)| + csm in case of redundancy undershooting, with ℓx the eligible round with the smallest degree of consent.
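Under the same record layout as before, the second policy can be paraphrased as below. This is our sketch only: the handling of ru-delimited eligibility is simplified to scanning the whole window.

def strategy_two(records, csm):
    # Returns (f_u, f_d) for Eq. (2); zero means no adjustment proposed.
    history = list(records)
    if not history:
        return 0, 0
    last_margin = history[-1][2]               # c_max - m of the last round
    if last_margin < 0:                        # redundancy undershooting
        return abs(last_margin) + csm, 0
    if last_margin < csm:                      # safety margin undershot
        worst = min(margin for _, _, margin in history)
        return csm - worst, 0                  # largest observed deviation
    return 0, last_margin - csm                # overabundance: scale down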
6. Effectiveness Analysis

Given the availability of spare system resources, our redundancy dimensioning model is indubitably capable of scaling up the employed degree of redundancy, either in response to a failure of the scheme, or as a precautionary measure if the effectiveness of the employed redundancy is observed to deteriorate (cf. Sect. 4.2.2). In this section, we will analyse whether the proposed model can effectively and safely reduce the employed redundancy. In doing so, we have used the discrete-event simulation toolbox and its failure injection mechanisms that were developed in [2]. Even though the toolbox provides a multitude of failure injection mechanisms, a simplistic, trend-based injection mechanism was chosen for visualisation purposes, as illustrated in the upper graph in Fig. 3. The total number of randomly chosen versions v ∈ V for which the system will inject failures in the course of the corresponding voting round (c,ℓ) is represented by the blue marks, whereas the green marks indicate how many of those affected versions were selected for participation in V(c,ℓ) ⊆ V. Injected failures are set to materialise as EVF and RVF content failures with a 20%, respectively 80%, probability. The discrepant response values returned for invocations affected by RVF failures are sampled from a uniform distribution [3]. Replicas are selected using the model referred to in Sect. 4.3, which has been configured to target sustained availability.

Figure 3: Number of injected failures (above); evolution of degree of redundancy and its effectiveness (below).

We now compare the effectiveness of four redundancy dimensioning strategies. Each of these strategies is assumed to be deployed within an A-NVP/MV scheme operating in an environment in which the same set V of 20 versions has been deployed and that exhibits identical failure behaviour as modelled by the blue trend line in Fig. 3. At any time, no more than four failures are assumed to affect the versions in V(c,ℓ), a scenario that could successfully be overcome by a static redundancy configuration with n = 9. Apart from such a classic strategy, we will evaluate the two strategies for parsimonious resource allocation introduced in Sect. 5: the former will be analysed without any safety margin applied, the latter considering two distinct values of csm:

strategy     csm  rd   #ℓF  ttf  ∑cr(c,ℓ)  ∑n(c,ℓ)  ratio
A: Classic   –    –    0    ∞    724       1800     2.486
B: First     0    20   5    47   614       1358     2.211
C: Second    1    20   4    47   664       1482     2.232
D: Second    2    20   0    ∞    716       1699     2.373

Table 1: Overall resource consumption and ineffectiveness; #ℓF denotes the number of voting rounds with scheme failure.
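As a quick sanity check on Table 1, the ratio column is indeed the total cumulative redundancy divided by the cumulative contextual redundancy: 1800 / 724 ≈ 2.486 (A), 1358 / 614 ≈ 2.211 (B), 1482 / 664 ≈ 2.232 (C) and 1699 / 716 ≈ 2.373 (D).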
So as to make a fair comparison, the application-specific parameters are configured with ninit = nmax = 9 and nmin = 3. For the sake of simplicity, it is assumed that the order in which voting rounds are initialised is the order in which they complete, i.e. z = x for all ℓx = bc(yz), and that all context information is available by the time the next round is initialised. This assumption improves comprehension and allows the reader to grasp the intuition behind the harvested contextual information cmax(c,ℓ) − m(c,ℓ) (represented by the circles in the lower graph in Fig. 3) driving the adjustment of the employed redundancy. The value rd = 20 has been chosen for demonstration purposes so as to enable the model's effectiveness to be concisely captured (e.g. strategy C in Fig. 3). At the expense of an aggressive allocation strategy, we set ru = 1 such that the availability of the scheme can swiftly be regained in case of redundancy undershooting.
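For reference, the experimental set-up described above can be condensed as follows (an illustrative recap of the stated values, not the authors' configuration format):

EXPERIMENT = {
    "versions": 20,                  # |V|
    "rounds": 200,                   # simulated voting rounds
    "n_init": 9, "n_max": 9, "n_min": 3,
    "r_d": 20, "r_u": 1,
    "failure_mix": {"EVF": 0.20, "RVF": 0.80},
}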
For each voting round ℓ in the sequence {ℓx}c, we will consider e(c,ℓ), that is, the number of versions vi ∈ V(c,ℓ) that are affected by some type of content failure. Recall that the green marks in the upper graph of Fig. 3 essentially quantify e(c,ℓ). We can then determine the contextual redundancy cr(e(c,ℓ)) and compare this value with n(c,ℓ) to observe whether the amount of redundancy n(c,ℓ) actually used was overabundant or insufficient (cf. Sect. 4.2). As shown in Table 1, the cumulative (contextual) redundancy during the operational life span of the scheme provides vital information regarding the effectiveness of a redundancy dimensioning model in estimating the extent to which it can economise on resource expenditure. One cannot merely judge a dependability strategy in terms of cumulative contextual redundancy: though sharing identical failure behaviour (blue trend), the actual number of injected disturbances (green marks) for any specific voting round is eventually determined by the amount of redundancy that is actually used, hence the variation in the values reported. Instead, one had better consider the ratio of total cumulative redundancy over cumulative contextual redundancy, in which the actual number of disturbances is indirectly accounted for. Though smaller positive values are indicative of an increased level of parsimony, one should consider whether this reduction in resource allocation did not violate the dependability objective. In this respect, strategy B clearly performs worse than A, C and D despite the smallest degree of resource consumption, as the scheme experienced a failure in 5 of the 200 simulated voting rounds. This behaviour could have been anticipated, as strategy B responds to actual failures rather than proactively adjusting the redundancy upwards were the agreed safety margin violated. Observe the significant improvement with respect to the number of scheme failures resulting from a modest increase of the imposed safety margin (cf. strategies C and D).
In general, one may expect that a properly chosen value of csm, matching the variability of the environment and the ensuing disturbances, aids in intercepting the trend and may lead to a reduction of scheme failures due to more efficient proactive upscaling. As can be seen from Fig. 3, a modest value csm = 1 enables the model to adjust the employed degree of redundancy such that it follows the trend of failures; cf. the blue trend in the upper graph with the employed redundancy represented by the black marks in the lower graph.

7. Service-Oriented Prototype

The use of stateless web services and the confinement of any communication to take place via explicitly defined service interfaces appear to suggest that web services are an adequate technology for implementing fault containment units. A prototypical service-oriented implementation of the replica selection algorithm showed how WS-* specifications can be leveraged to sustain adaptive redundancy management, broadening the applicability of NVP schemata through increased interoperability [3]. This A-NVP/MV prototype was extended to include the proposed redundancy dimensioning model and policies defined in Sect. 4 and 5.

8. Conclusion

In this paper, we have enhanced the redundancy management facilities of our A-NVP/MV dependability strategy, which had initially appeared in [3]. The proposed redundancy dimensioning model aims to autonomously tune the employed degree of redundancy in view of encountered disturbances, and is fitted with accompanying policies intent upon increasing the scheme's cost effectiveness by parsimoniously allocating system resources. Based on the discrete-event simulation models presented in [2], with special attention paid to the impact of disturbances that manifest as transient failures in the content domain, these policies have proven to be effective in identifying situations necessitating an adjustment of the redundancy level. It is apparent from our experimentation that the suggested solution can effectively achieve a substantial improvement in dependability compared to traditional, static redundancy strategies. Furthermore, tuning the adopted degree of redundancy to the actually observed disturbances allows unnecessary resource expenditure to be reduced, therefore enhancing cost-effectiveness.

As future work, we intend to analyse the impact of the defined parameters and investigate whether the system could tolerate greater unpredictability of its deployment environment by dynamically altering the parameters rd, ru and rf, and how this may affect the effectiveness of the presented policies. Moreover, we will renounce the simplification of in-order voting round completions and analyse the effects of variations in the individual response times of versions.
References

[1] M. Aigner. “Combinatorial Theory”, Springer (1997).
[2] J. Buys et al. “A Discrete-Event Model for Failure Injection in Distributed NVP”, technical report (2012).
[3] J. Buys et al. “Towards Context-Aware Adaptive Fault Tolerance in SOA Applications”, Proceedings of the 5th ACM International Conference on Distributed Event-Based Systems, pp. 63–74 (2011).
[4] F. Cristian. “Understanding Fault-Tolerant Distributed Systems”, Communications of the ACM, 34(2), pp. 56–78 (1991).
[5] B. W. Johnson. “Design and Analysis of Fault Tolerant Digital Systems”, Addison-Wesley (1989).
[6] V. De Florio. “Application-layer Fault-tolerance”, IGI Global (2009).
[7] V. De Florio. “Software Assumptions Failure Tolerance: Role, Strategies and Visions”, Architecting Dependable Systems, LNCS 6420, pp. 249–272 (2011).