This document discusses simulating a pervasive software system to predict its reliability. It begins by introducing pervasive computing and its importance. It then describes the reference architecture model, which includes a smart environment conceptual model, smart object model, and pervasive system architecture model. The document outlines the simulation approach, including scenarios, specifications of simulation entities, assumptions, and extreme assumptions. The simulation aims to predict reliability and availability under different scenarios and control variable values. Key metrics like MTBF and MTTR will be measured to calculate reliability and availability.
Simulation predicts reliability of pervasive software
1. A SIMULATION APPROACH TO PREDICT
THE RELIABILITY OF A PERVASIVE
SOFTWARE SYSTEM
By
Osama Mabrouk Khaled, PhD
2. What is Pervasive Computing?
• 3rd Computing Paradigm
• Shaped by Distributed Computing, Mobile Computing, Embedded Systems, Autonomic Computing, and HCI
• Known by
  • Sensors
  • Actuators
  • Mobility
  • Context-Awareness
  • Adaptability
Introduction
3. - By 2020, the number of objects connected to the Internet will be about 50 billion
- Cisco expects the economy to reach $14.4 billion by 2025
- 40% of 450 surveyed IT and business leaders expect pervasive computing to boost sales and cut costs within 3 years
- Huge, accurate data gathering
- Software-enabled devices are easier to manage
Refs: Cisco, Gartner
Introduction
Importance of Pervasive Computing
4. What is a Reference Architecture?
Introduction
[Diagram] A Reference Architecture draws on: Best Practices, Guidelines, Models, Documentation, Practical Experience, and Patterns.
5. Evaluation
Approach
Will it be a good technical architecture as expected during the runtime trials?
A simulation prototype was implemented to realize the modules
and predict the reliability of the model at runtime.
Simulation vs Prototype
6. Evaluation Approach
Our Approach from a software engineering perspective
[Process diagram spanning three phases: Business Analysis (B), Design (D), Evaluation (E)]
- Business Analysis (B): Business Requirements Model, categorized per quality feature and per business domain.
- Design (D): Architectural Requirements Model categorized per quality feature; Smart Object Essential Handlers; Smart Environment Conceptual Model; Pervasive System Abstraction.
- Evaluation (E):
  - Trace the architecture baseline model to requirements -> Traceability Matrix.
  - Generate metrics measurements -> Measurements for architecture static quality features.
  - Run a survey to collect feedback about the quality of the business and architecture reference architectures -> Measurements for quantitative quality features.
  - Benchmark the architecture with experts' models -> Comparison Results.
  - Build a simulation project -> Prediction for runtime reliability and availability.
  - Recommend enhancements for the reference architecture.
7. Related Work – State of the Art (5)
Related Work
• Subjective, quantitative, and
traceability evaluation methods.
• Mixed usage of the different
evaluation methods is rare.
Evaluation Methods
8. Technical Model – Baseline Architectural Model
Technical Reference Architecture
The model provides the essential details about:
The Smart Environment: a conceptual view of the SE and
classification of the objects.
The Smart Object: an abstracted view of the SO and the
essential handlers that it should include to interact with the SE.
The Pervasive System: The essential modules that should exist
in a PS with high level linkage among them.
The System Optimization: a reference for the basic optimization
parameters in the system.
The Architecture Variability: the essential configurations of the
PervCompRA-SE to generate different architectural models based
on the changing rules.
The System Deployment: The essential deployment strategies
that could be implemented for a PS in order to increase its
reliability.
10. Smart Object
[Diagram: essential handlers] Safety, Processing, Power Status, Community, Statistics, Programming, Permissions, Process Hosting, Volatility Status, Security & Privacy.
Technical Model - Smart Object
Technical Reference Architecture
The SO can run in different
modes:
1. Runtime: where all
handlers run with full
capacity and with minimal
overhead.
2. Diagnostics: the SO adds
extra overhead to its
handlers, like logging,
memory dump, etc.
3. Maintenance: the SO is in
maintenance mode, which
means that some of its
functions may not be
available. For example, its
network interface may be
disabled, or its disabled
handlers will notify callers
that the SO is in
maintenance mode.
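The three modes can be sketched as a simple state toggle; the following is an illustrative Python sketch (the class and handler names are our own, not part of the model):

```python
from enum import Enum

class SOMode(Enum):
    RUNTIME = 1      # all handlers at full capacity, minimal overhead
    DIAGNOSTICS = 2  # extra overhead per call: logging, memory dumps, etc.
    MAINTENANCE = 3  # some functions unavailable; callers are notified

class SmartObject:
    def __init__(self):
        self.mode = SOMode.RUNTIME

    def call_handler(self, name):
        """Dispatch a handler call, honoring the current mode."""
        if self.mode is SOMode.MAINTENANCE:
            # A disabled handler notifies the caller instead of serving it.
            return f"{name}: in maintenance mode"
        if self.mode is SOMode.DIAGNOSTICS:
            print(f"[diag] invoking {name}")  # diagnostic overhead
        return f"{name}: ok"
```

Switching `mode` to `MAINTENANCE` makes every handler answer with a maintenance notice instead of its normal result.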
11. Technical Model – Pervasive System Architecture Model
Technical Reference Architecture
Static Model – Baseline Architectural Model
[Architecture diagram]
- Synthesizer; Interested Community (Feed-in / Feedback); Analytics Manager
- Application: Solution 1, Solution 2, ..., Solution n
- Intelligence and Reasoning: Event Handler, Interpretation Manager, Decision Manager
- Environment Care: Profile Manager, Risk Handler
- System Organization: Optimization Manager, Device Manager, Resource Manager, Service Manager
- Common Infrastructure: Repository Manager, Logger, Fault Handler, Policy Manager
- Input Device: Implicit / Explicit, Physical / Virtual
- Output Device: Visible / Invisible
13. [Scenario diagram] A bus travels from point A to point B; along the route the trip may be Normal, or end in an accident or a breakdown. Trips run in the Early Morning, Midday, and Night, with Visitors joining along the way. Output channels: Hospital Alarm Board, Police Alarm Board, SMS Engine.
Baseline Architecture
Simulation - Story
Evaluation
Main Objective
1. Predict reliability and availability.
2. Isolate the internal details of the technical model from external factors like network, hardware, and programming language through controlled assumptions.
3. Gain insights about additional design decisions.
4. Gain insights about risk factors.
5. Predict the system's reliability under the best, average, and worst values of the variables.
6. Understand the entities that satisfy the fault tolerance quality feature.
7. Guide the architect on how to generate statistics about the reliability and the availability of the system modules.
8. Finally, it is one of the standard methods in our research whereby the reference architecture must have a prototype implementation.
14. Simulation - Conceptual Model
Evaluation
[State machine: Module Phase] Initial -> Inactive; transitions among Inactive, Active, Failed, and Resumed via [start], [stop], [fail], [fix], and a [1 sigma] timer.

Phase: state in {Active, Inactive, Failed, Resumed}.

Accumulated Inputs (AI): the number of received input requests over all p input ports, AI = count(sum_{i=0}^{p} input_i).

Accumulated Outputs (AO): the number of submitted outputs over all p output ports, AO = count(sum_{i=0}^{p} output_i).

Lifetime (L): the lifetime indicator of the entity, ranging from 0 to 100; 100 indicates that it is healthy and fully powered, and 0 indicates that it is dead. It is an optional state attribute for part objects.

Failures (F): the counter of non-accumulated failures. It is capped by a maximum threshold; the counter resets to 0 after reaching the threshold. It is an optional state attribute for active objects.

Mode (M): the mode of the system, where M in {Runtime, Assertion, Out of Service, Security Threat}.
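These state attributes map naturally onto a small record type; the following is a minimal sketch (the defaults and the reset behavior follow our reading of the slide, not a confirmed implementation):

```python
from dataclasses import dataclass
from enum import Enum

class Phase(Enum):
    INACTIVE = "Inactive"
    ACTIVE = "Active"
    FAILED = "Failed"
    RESUMED = "Resumed"

class Mode(Enum):
    RUNTIME = "Runtime"
    ASSERTION = "Assertion"
    OUT_OF_SERVICE = "Out of Service"
    SECURITY_THREAT = "Security Threat"

@dataclass
class ModuleState:
    phase: Phase = Phase.INACTIVE
    ai: int = 0               # Accumulated Inputs over all input ports
    ao: int = 0               # Accumulated Outputs over all output ports
    lifetime: float = 100.0   # 100 = healthy and fully powered, 0 = dead
    failures: int = 0         # non-accumulated failure counter
    mode: Mode = Mode.RUNTIME

    def record_failure(self, threshold: int) -> None:
        """Count a failure; the counter resets to 0 once it reaches the threshold."""
        self.failures += 1
        if self.failures >= threshold:
            self.failures = 0
```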
16. Simulation - Specifications
Evaluation
1. Simulation Starter: It starts, stops, and changes the executing simulation, and dumps statistics about the simulation runs.
2. Speed Sensor: Sends a random number about the speed of the bus during the trip.
3. Location Sensor: Sends the location of the bus during the trip.
4. Location Sensor Synthesizer: It receives the input from the Location Sensor and generates a synthesized
value based on the input value and the error standard deviation.
5. Crash Sensor Synthesizer: This entity receives the input from the Speed Sensor and generates a
synthesized value based on the input value and the error deviation.
Equation 6-5 (Simulation Synthesizer formula):
Synthesized data = data + Gaussian random number × error deviation
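Equation 6-5 is straightforward to implement; a sketch (the function name and signature are ours, not from the prototype):

```python
import random

def synthesize(value: float, error_deviation: float, rng=random) -> float:
    """Equation 6-5: synthesized data = data + Gaussian random number * error deviation."""
    return value + rng.gauss(0.0, 1.0) * error_deviation
```

With a zero error deviation the sensor reading passes through unchanged; a larger deviation widens the spread of the synthesized values.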
6. Repository Manager: The main responsibility of the Repository Manager is to record data coming from
the synthesizers and save them in a 3-tuple format.
7. Event Handler: The Event Handler is responsible for fetching the 3-tuple raw data, converting it into a
readable 3-tuple context, and sending it to the Interpretation Manager.
8. Interpretation Manager: The Interpretation Manager is responsible for converting the 3-tuple context into
a meaningful interpretation.
9. Decision Manager: the Decision Manager is responsible for making a rational understanding of the
interpretation in order to take the right decision.
10. SMS Engine: The SMS Engine is responsible for delivering SMS messages for an individual cell phone.
11. Hospital Alarm Board: It is a virtual digital screen that shows alarm messages in case of accidents.
12. Police Alarm Board: It is a virtual digital screen that shows alarm messages to the Police department in
case of bus breakdown or accidents.
13. Profile Manager: The Profile Manager is responsible for fetching the user profiles from the Repository
Manager and sending them to the SMS Engine.
17. Simulation - Specifications
Evaluation
14. Fault Handler: the Fault Handler is responsible for handling faults that cause part objects to be out of
service. It is important to note that the probability of part object failure increases with its complexity:
15. Optimization Manager: The Optimization Manager is responsible for monitoring some health
performance indicators for the sensors, actuators, and the part objects and takes decisions to recover their
performance.
16. Resource Manager: The Resource Manager receives a request from the Optimization Manager to
allocate a resource for a nominated part object, or sensor.
17. Service Manager: The Service Manager is responsible for handling requests from the smart objects to get
some services from the system.
18. Device Manager: The Device Manager is responsible for handling the smart objectβs join request.
19. Risk Handler: The Risk Handler is responsible for studying requests from smart objects to join the
system and assigning them the proper status (visiting, trusted, prohibited, or rejected), as well as handling
certificate requests.
20. Policy Manager: The Policy Manager is responsible for enforcing the system policy according to the
mode of the system.
21. Analytics Manager: The Analytics Manager periodically sends details of the 3-tuple context events to the
Interested Community.
22. Logger: The Logger logs part objects' log statements.
23. Interested Community: The Interested Community is a representation of a cloud or external system
where the Analytics Manager sends it statistics about the system.
24. Smart Object: The Smart Object module generates random visits/disjoins/service requests/certificate
requests to the system.
Module complexity weight formula:
weight = round( (r × d) / Σ_{i=0}^{15} (r_i × d_i) × 100 )
where
• r = the number of satisfied requirements by the part object.
• d = the number of input and output dependency relationships for the part object.
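Under our reading of the weight formula (the sum runs over the 16 part objects, and the weight is a percentage share of total complexity), a hypothetical sketch with made-up part-object values:

```python
def complexity_weight(r: int, d: int, all_parts) -> int:
    """weight = round((r * d) / sum(r_i * d_i) * 100)

    r: number of requirements satisfied by the part object
    d: number of input/output dependency relationships
    all_parts: (r_i, d_i) pairs for every part object in the system
    """
    total = sum(ri * di for ri, di in all_parts)
    return round(r * d / total * 100)

# Hypothetical part objects; each weight is a percentage share of total complexity.
parts = [(4, 2), (3, 4), (2, 2), (6, 4)]
weights = [complexity_weight(r, d, parts) for r, d in parts]
```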
18. Simulation - Assumptions
Evaluation
| Notation | Control Variable | Average (A) | Best (B) | Worse (W) | Comment |
| SSfr | Speed Sensor signal failure rate | 9.167E-09 | 1.67E-10 | 1.67E-09 | This is a very negligible error |
| SSbl | Speed Sensor battery lifetime degradation (per minute) | 0.001 | 0.0009 | 0.002 | |
| LSsa | Location Sensor signal accuracy (per meter) | 2.25 | 2 | 2.5 | Estimate based on the horizontal position accuracy of the sensor |
| LSbl | Location Sensor battery lifetime degradation (per minute) | 0.003 | 0.0009 | 0.002 | |
| BRthr | Battery Recharge Threshold (%) | 0.4 | 0.5 | 0.2 | Used by the Optimization Manager |
| POthr | Part Object Failure Optimization Threshold | 2 | 1 | 3 | Used by the Optimization Manager to minimize failures of part objects |
| SMSfr | SMS Engine Failure rate | 0.05 | 0.1 | 0.16 | |
| SMSrr | SMS Engine Repair rate | 0.95 | 0.9 | 0.84 | Complement of the SMS Engine failure rate |
| HABfr | Hospital Alarm Board Failure rate | 0.05 | 0.025 | 0.075 | |
| HABrr | Hospital Alarm Board Repair rate | 0.95 | 0.975 | 0.925 | Complement of the Hospital Alarm Board failure rate |
| PABfr | Police Alarm Board Failure rate | 0.05 | 0.025 | 0.075 | |
| PABrr | Police Alarm Board Repair rate | 0.95 | 0.975 | 0.925 | Complement of the Police Alarm Board failure rate |
| RMr | Runtime Mode Rate | 0.67 | 0.7 | 0.64 | |
| POfr | Part Object Failure Rate | 0.275 | 0.05 | 0.5 | |
| POrr | Part Object Repair Rate | 0.725 | 0.95 | 0.5 | |
| ACr | Accident Rate | 0.004 | 0 | 0.03 | Estimated from the fatality rates from 2012 to 2015 |
19. Simulation - Assumptions
Evaluation
Visits peak in the morning.
Disjoins peak in the evening.
Service requests peak at midday.
0 (accident) ↔ 21 (Normal)
The system is in runtime mode 64.2% of the time.
20. Simulation - Extreme Assumptions
Evaluation
| Notation | Control Variable | Ext. Best (EB) | Ext. Worse (EW) |
| SSfr | Speed Sensor signal failure rate | 0 | 4.10684E-09 |
| SSbl | Speed Sensor battery lifetime degradation (per minute) | 0 | 0.004 |
| LSsa | Location Sensor signal accuracy (per meter) | 1.2 | 3.3 |
| LSbl | Location Sensor battery lifetime degradation (per minute) | 0 | 0.004 |
| BRthr | Battery Recharge Threshold (%) | 0.98 | 0 |
| POthr | Part Object Failure Optimization Threshold | 1 | 6 |
| SMSfr | SMS Engine Failure rate | 0 | 0.403 |
| SMSrr | SMS Engine Repair rate | 0.997 | 0.743 |
| HABfr | Hospital Alarm Board Failure rate | 0 | 0.156 |
| HABrr | Hospital Alarm Board Repair rate | 1 | 0.844 |
| PABfr | Police Alarm Board Failure rate | 0 | 0.156 |
| PABrr | Police Alarm Board Repair rate | 1 | 0.844 |
| RMr | Runtime Mode Rate | 0.78 | 0.543 |
| POfr | Part Object Failure Rate | 0.001 | 0.999 |
| POrr | Part Object Repair Rate | 0.999 | 0.001 |
| ACr | Accident Rate | 0 | 0.079 |
We stretched the values of the control variables around the average of the (best, average, and worse) values:
• We calculate the minimum value, whether it is best or worst, as min = average − 3σ.
• We calculate the maximum value, whether it is best or worst, as max = average + 3σ.
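The stretch rule can be written down directly; in this sketch σ is assumed to be the population standard deviation of the three (best, average, worse) values, since the slide does not define it, and the lower bound is clamped at zero because rates cannot be negative:

```python
from statistics import mean, pstdev

def extreme_bounds(best: float, average: float, worse: float,
                   floor: float = 0.0):
    """Stretch a control variable to extremes: average +/- 3 * sigma.

    floor clamps the lower bound (rates cannot go below 0).
    """
    values = [best, average, worse]
    mu, sigma = mean(values), pstdev(values)
    lo = max(mu - 3 * sigma, floor)
    hi = mu + 3 * sigma
    return lo, hi

# POfr-style values: best 0.05, average 0.275, worse 0.5
lo, hi = extreme_bounds(0.05, 0.275, 0.5)
```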
22. Simulation - Scenarios
Evaluation
| ID | Name | Hybrid mode | Control variable group | Resources | Optimization | Faults | Runs | Ticks |
| 1 | Perfect | False | Perfect | N/A | False | False | 3 | 1500 |
| 2 | Normal (runtime mode only) | False | Average | 12 | True | True | 3 | 1500 |
| 3 | Normal (runtime mode only) | False | Best | 12 | True | True | 3 | 1500 |
| 4 | Normal (runtime mode only) | False | Worst | 12 | True | True | 3 | 1500 |
| 5 | Normal (Hybrid modes) | True | Average | 12 | True | True | 3 | 1500 |
| 6 | Normal (Hybrid modes) | True | Best | 12 | True | True | 3 | 1500 |
| 7 | Normal (Hybrid modes) | True | Worst | 12 | True | True | 3 | 1500 |
| 8 | Normal (No optimization) | True | Average | N/A | False | True | 3 | 1500 |
| 9 | Normal (No optimization) | True | Best | N/A | False | True | 3 | 1500 |
| 10 | Normal (No optimization) | True | Worst | N/A | False | True | 3 | 1500 |
| 11 | Normal (resource optimized) | True | Average | 4 | True | True | 3 | 1500 |
| 12 | Normal (resource optimized) | True | Average | 8 | True | True | 3 | 1500 |
| 13 | Normal (resource optimized) | True | Average | 12 | True | True | 3 | 1500 |
| 14 | Extreme | True | Extreme best | 12 | True | True | 3 | 1500 |
| 15 | Extreme | True | Extreme worst | 12 | True | True | 3 | 1500 |
23. Simulation - Results
Reliability and Availability
Evaluation
Equation 6-9 (System Reliability and Availability Calculations):
Reliability = MTBF / (MTBF + 1)
Availability = MTBF / (MTBF + MTTR)
| Scenario | MTBF | MTTR | Availability | Reliability |
| Perfect | 0 | 0 | 1 | 1 |
| Extreme - ext_best - res 12 | 969.00 | 2.00 | 99.79% | 99.90% |
| Normal (Hybrid modes) - Best - res 12 | 251.13 | 2.10 | 99.17% | 99.60% |
| Normal (No optimization) - Best - res 12 | 230.12 | 2.05 | 99.12% | 99.57% |
| Normal (runtime mode only) - Best - res 12 | 226.56 | 2.07 | 99.09% | 99.56% |
| Normal (resource optimized) - Average - res 4 | 69.37 | 2.44 | 96.60% | 98.58% |
| Normal (resource optimized) - Average - res 8 | 68.16 | 2.36 | 96.65% | 98.55% |
| Normal (runtime mode only) - Average - res 12 | 62.05 | 2.54 | 96.06% | 98.41% |
| Normal (Hybrid modes) - Average - res 12 | 60.87 | 2.46 | 96.11% | 98.38% |
| Normal (resource optimized) - Average - res 12 | 59.69 | 2.39 | 96.16% | 98.35% |
| Normal (No optimization) - Average - res 12 | 51.81 | 2.45 | 95.49% | 98.11% |
| Normal (runtime mode only) - Worse - res 12 | 36.75 | 3.26 | 91.84% | 97.35% |
| Normal (Hybrid modes) - Worse - res 12 | 36.41 | 3.14 | 92.06% | 97.33% |
| Normal (No optimization) - Worse - res 12 | 30.85 | 3.09 | 90.89% | 96.86% |
| Extreme - ext_worse - res 12 | 12.55 | 26.89 | 31.83% | 92.62% |
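As a sanity check, Equation 6-9 reproduces the table rows from the reported MTBF and MTTR; a small Python sketch:

```python
def reliability(mtbf: float) -> float:
    """Equation 6-9: Reliability = MTBF / (MTBF + 1)."""
    return mtbf / (mtbf + 1)

def availability(mtbf: float, mttr: float) -> float:
    """Equation 6-9: Availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

# Extreme - ext_best - res 12: MTBF = 969.00, MTTR = 2.00
print(f"{availability(969.0, 2.0):.2%}")  # prints 99.79%
print(f"{reliability(969.0):.2%}")        # prints 99.90%
```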
24. Simulation - Results
Processing Time
Evaluation
| Scenario | ΔT |
| Perfect - Perfect | 0.00% |
| Normal (runtime mode only) - Best - res 12 | 0.03% |
| Normal (runtime mode only) - Average - res 12 | 0.16% |
| Normal (runtime mode only) - Worse - res 12 | 0.63% |
| Extreme - ext_best - res 12 | 0.99% |
| Normal (No optimization) - Best - res 12 | 1.88% |
| Normal (Hybrid modes) - Best - res 12 | 2.30% |
| Normal (Hybrid modes) - Average - res 12 | 2.36% |
| Normal (Hybrid modes) - Worse - res 12 | 2.36% |
| Normal (resource optimized) - Average - res 12 | 2.45% |
| Normal (No optimization) - Average - res 12 | 2.52% |
| Normal (resource optimized) - Average - res 4 | 2.60% |
| Normal (resource optimized) - Average - res 8 | 3.02% |
| Normal (No optimization) - Worse - res 12 | 3.55% |
| Extreme - ext_worse - res 12 | N/A |
| Grand Average | 14.00% |
• We predict an average of 2% additional time needed from the last sensor input, calculated against the perfect scenario.
• The results show that the resource optimization technique that we adopted is working reasonably well.
• The scenarios show that the processing time increases as the working conditions get worse.
• The extreme worst scenario shows no results because repetitive failures prevented it from completing the whole journey.
25. Simulation - Results
Fault Tolerance
Evaluation
| Scenario | # Failures | # Immunity |
| Extreme - ext_best - res 12 | 144.00 | 0.00 |
| Extreme - ext_worse - res 12 | N/A | N/A |
| Normal (resource optimized) - Average - res 12 | 897.67 | 31.00 |
| Normal (resource optimized) - Average - res 4 | 679.00 | 20.00 |
| Normal (resource optimized) - Average - res 8 | 820.33 | 31.33 |
| Normal (Hybrid modes) - Average - res 12 | 857.00 | 24.00 |
| Normal (Hybrid modes) - Best - res 12 | 740.33 | 2.33 |
| Normal (Hybrid modes) - Worse - res 12 | 1115.33 | 57.33 |
| Normal (No optimization) - Average - res 12 | N/A | N/A |
| Normal (No optimization) - Best - res 12 | N/A | N/A |
| Normal (No optimization) - Worse - res 12 | N/A | N/A |
| Normal (runtime mode only) - Average - res 12 | 880.00 | 33.00 |
| Normal (runtime mode only) - Best - res 12 | 758.67 | 2.67 |
| Normal (runtime mode only) - Worse - res 12 | 1158.00 | 61.00 |
| Perfect - Perfect - res 4 | N/A | N/A |
The experiments show an average of 3.09% immunity from failures across
all the scenarios. As the resources allocated increase, the immunity
provided to the system increases as well. The scenarios that have N/A did
not apply the optimization technique.
26. 1. The experiments predict the reliability of the architecture in the worst case as 96.86% and the availability as 90.89%.
2. In the extreme worst case, both reliability and availability decrease noticeably: reliability becomes 92.62% and availability deteriorates to 31.83%.
3. On average, the system availability is 95.77% and reliability is 98.08% if we exclude the Perfect and extreme cases.
4. In the best cases, the system availability is 99.79% and reliability is 99.9%.
Reflection on Architecture Decisions
The Resource Optimization technique prevented 3.1% of failures across all the scenarios. The percentage increases as the allocated resources increase.
Findings
27. 1. Simulation gives an initial prediction, which can change at runtime.
2. Build a simulation package for the TRA with dynamic configurations for the control variables.
It would be great if the software engineering community could provide anonymized statistics to use for simulation experiments.
Conclusion
Future Work