This document discusses simulating a pervasive software system to predict its reliability. It begins by introducing pervasive computing and its importance. It then describes the reference architecture model, which includes a smart environment conceptual model, smart object model, and pervasive system architecture model. The document outlines the simulation approach, including scenarios, specifications of simulation entities, assumptions, and extreme assumptions. The simulation aims to predict reliability and availability under different scenarios and control variable values. Key metrics like MTBF and MTTR will be measured to calculate reliability and availability.
Simulation predicts reliability of pervasive software
1. A SIMULATION APPROACH TO PREDICT
THE RELIABILITY OF A PERVASIVE
SOFTWARE SYSTEM
By
Osama Mabrouk Khaled, PhD
2. What is Pervasive Computing?
• 3rd Computing Paradigm
• Shaped by Distributed Computing, Mobile Computing, Embedded Systems, Autonomic Computing, and HCI
• Known by
  • Sensors
  • Actuators
  • Mobility
  • Context-Awareness
  • Adaptability
Introduction
3. - By 2020, the number of objects connected to the Internet will be about 50 billion
- Cisco expects the economy to reach $14.4 billion by 2025
- 40% of 450 surveyed IT and business leaders expect pervasive computing to boost sales and cut costs within 3 years
- Huge, accurate data gathering
- Software-enabled devices are easier to manage
Refs: Cisco, Gartner
Introduction
Importance of Pervasive Computing
4. What is a Reference Architecture?
Introduction
[Diagram] A Reference Architecture draws on: Best Practices, Guidelines, Models, Documentation, Practical Experience, and Patterns.
5. Evaluation
Approach
Will it be a good technical architecture as expected during the runtime trials?
A simulation prototype was implemented to realize the modules
and predict the reliability of the model at runtime.
Simulation vs Prototype
6. Evaluation Approach
Our Approach from a software engineering perspective
[Process diagram spanning three phases: Business Analysis (B), Design (D), Evaluation (E)]
- Business Analysis (B): Business Requirements Model, categorized per quality feature and per business domain.
- Design (D): Architectural Requirements Model categorized per quality feature; Smart Object Essential Handlers; Smart Environment Conceptual Model; Pervasive System Abstraction.
- Evaluation (E):
  - Trace the architecture baseline model to requirements -> Traceability Matrix.
  - Generate metrics measurements -> Measurements for architecture static quality features.
  - Run a survey to collect feedback about the quality of the business and architecture reference architectures -> Measurements for quantitative quality features.
  - Benchmark the architecture with experts' models -> Comparison Results.
  - Build a simulation project -> Prediction for runtime reliability and availability.
  - Recommend enhancements for the reference architecture.
7. Related Work – State of the Art (5)
Related Work
• Subjective, quantitative, and
traceability evaluation methods.
• Mixed usage of the different
evaluation methods is rare.
Evaluation Methods
8. Technical Model – Baseline Architectural Model
Technical Reference Architecture
The model provides the essential details about:
The Smart Environment: a conceptual view of the SE and
classification of the objects.
The Smart Object: an abstracted view of the SO and the
essential handlers that it should include to interact with the SE.
The Pervasive System: The essential modules that should exist
in a PS with high level linkage among them.
The System Optimization: a reference for the basic optimization
parameters in the system.
The Architecture Variability: the essential configurations of the
PervCompRA-SE to generate different architectural models based
on the changing rules.
The System Deployment: The essential deployment strategies
that could be implemented for a PS in order to increase its
reliability.
10. Smart Object
[Diagram: essential handlers] Safety, Processing, Power Status, Community, Statistics, Programming, Permissions, Process Hosting, Volatility Status, Security & Privacy.
Technical Model - Smart Object
Technical Reference Architecture
The SO can run in different
modes:
1. Runtime: where all
handlers run with full
capacity and with minimal
overhead.
2. Diagnostics: the SO adds
extra overhead to its
handlers, like logging,
memory dump, etc.
3. Maintenance: the SO is in
maintenance mode, which
means that some of its
functions may not be
available. For example, its
network interface may be
disabled, or its disabled
handlers will notify callers
that the SO is in
maintenance mode.
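The three modes can be sketched as a simple state toggle; the following is an illustrative Python sketch (the class and handler names are our own, not part of the model):

```python
from enum import Enum

class SOMode(Enum):
    RUNTIME = 1      # all handlers at full capacity, minimal overhead
    DIAGNOSTICS = 2  # extra overhead per call: logging, memory dumps, etc.
    MAINTENANCE = 3  # some functions unavailable; callers are notified

class SmartObject:
    def __init__(self):
        self.mode = SOMode.RUNTIME

    def call_handler(self, name):
        """Dispatch a handler call, honoring the current mode."""
        if self.mode is SOMode.MAINTENANCE:
            # A disabled handler notifies the caller instead of serving it.
            return f"{name}: in maintenance mode"
        if self.mode is SOMode.DIAGNOSTICS:
            print(f"[diag] invoking {name}")  # diagnostic overhead
        return f"{name}: ok"
```

Switching `mode` to `MAINTENANCE` makes every handler answer with a maintenance notice instead of its normal result.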
11. Technical Model – Pervasive System Architecture Model
Technical Reference Architecture
Static Model – Baseline Architectural Model
[Architecture diagram]
- Synthesizer; Interested Community (Feed-in / Feedback); Analytics Manager
- Application: Solution 1, Solution 2, ..., Solution n
- Intelligence and Reasoning: Event Handler, Interpretation Manager, Decision Manager
- Environment Care: Profile Manager, Risk Handler
- System Organization: Optimization Manager, Device Manager, Resource Manager, Service Manager
- Common Infrastructure: Repository Manager, Logger, Fault Handler, Policy Manager
- Input Device: Implicit / Explicit, Physical / Virtual
- Output Device: Visible / Invisible
13. [Scenario diagram] A bus travels from point A to point B; along the route the trip may be Normal, or end in an accident or a breakdown. Trips run in the Early Morning, Midday, and Night, with Visitors joining along the way. Output channels: Hospital Alarm Board, Police Alarm Board, SMS Engine.
Baseline Architecture
Simulation - Story
Evaluation
Main Objective
1. Predict reliability and availability.
2. Isolate the internal details of the technical model from external factors like network, hardware, and programming language through controlled assumptions.
3. Gain insights about additional design decisions.
4. Gain insights about risk factors.
5. Predict the system's reliability under the best, average, and worst values of the variables.
6. Understand the entities that satisfy the fault tolerance quality feature.
7. Guide the architect on how to generate statistics about the reliability and the availability of the system modules.
8. Finally, it is one of the standard methods in our research whereby the reference architecture must have a prototype implementation.
14. Simulation - Conceptual Model
Evaluation
[State machine: Module Phase] Initial -> Inactive; transitions among Inactive, Active, Failed, and Resumed via [start], [stop], [fail], [fix], and a [1 sigma] timer.

Phase: state in {Active, Inactive, Failed, Resumed}.

Accumulated Inputs (AI): the number of received input requests over all p input ports, AI = count(sum_{i=0}^{p} input_i).

Accumulated Outputs (AO): the number of submitted outputs over all p output ports, AO = count(sum_{i=0}^{p} output_i).

Lifetime (L): the lifetime indicator of the entity, ranging from 0 to 100; 100 indicates that it is healthy and fully powered, and 0 indicates that it is dead. It is an optional state attribute for part objects.

Failures (F): the counter of non-accumulated failures. It is capped by a maximum threshold; the counter resets to 0 after reaching the threshold. It is an optional state attribute for active objects.

Mode (M): the mode of the system, where M in {Runtime, Assertion, Out of Service, Security Threat}.
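These state attributes map naturally onto a small record type; the following is a minimal sketch (the defaults and the reset behavior follow our reading of the slide, not a confirmed implementation):

```python
from dataclasses import dataclass
from enum import Enum

class Phase(Enum):
    INACTIVE = "Inactive"
    ACTIVE = "Active"
    FAILED = "Failed"
    RESUMED = "Resumed"

class Mode(Enum):
    RUNTIME = "Runtime"
    ASSERTION = "Assertion"
    OUT_OF_SERVICE = "Out of Service"
    SECURITY_THREAT = "Security Threat"

@dataclass
class ModuleState:
    phase: Phase = Phase.INACTIVE
    ai: int = 0               # Accumulated Inputs over all input ports
    ao: int = 0               # Accumulated Outputs over all output ports
    lifetime: float = 100.0   # 100 = healthy and fully powered, 0 = dead
    failures: int = 0         # non-accumulated failure counter
    mode: Mode = Mode.RUNTIME

    def record_failure(self, threshold: int) -> None:
        """Count a failure; the counter resets to 0 once it reaches the threshold."""
        self.failures += 1
        if self.failures >= threshold:
            self.failures = 0
```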
16. Simulation - Specifications
Evaluation
1. Simulation Starter: It starts, stops, and changes the executing simulation, and dumps statistics about the simulation runs.
2. Speed Sensor: Sends a random number about the speed of the bus during the trip.
3. Location Sensor: Sends the location of the bus during the trip.
4. Location Sensor Synthesizer: It receives the input from the Location Sensor and generates a synthesized
value based on the input value and the error standard deviation.
5. Crash Sensor Synthesizer: This entity receives the input from the Speed Sensor and generates a
synthesized value based on the input value and the error deviation.
Equation 6-5 (Simulation Synthesizer formula):
Synthesized data = data + Gaussian random number × error deviation
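Equation 6-5 is straightforward to implement; a sketch (the function name and signature are ours, not from the prototype):

```python
import random

def synthesize(value: float, error_deviation: float, rng=random) -> float:
    """Equation 6-5: synthesized data = data + Gaussian random number * error deviation."""
    return value + rng.gauss(0.0, 1.0) * error_deviation
```

With a zero error deviation the sensor reading passes through unchanged; a larger deviation widens the spread of the synthesized values.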
6. Repository Manager: The main responsibility of the Repository Manager is to record data coming from
the synthesizers and save them in a 3-tuple format.
7. Event Handler: The Event Handler is responsible for fetching the 3-tuple raw data, converting it into a
readable 3-tuple context, and sending it to the Interpretation Manager.
8. Interpretation Manager: The Interpretation Manager is responsible for converting the 3-tuple context into
a meaningful interpretation.
9. Decision Manager: the Decision Manager is responsible for making a rational understanding of the
interpretation in order to take the right decision.
10. SMS Engine: The SMS Engine is responsible for delivering SMS messages for an individual cell phone.
11. Hospital Alarm Board: It is a virtual digital screen that shows alarm messages in case of accidents.
12. Police Alarm Board: It is a virtual digital screen that shows alarm messages to the Police department in
case of bus breakdown or accidents.
13. Profile Manager: The Profile Manager is responsible for fetching the user profiles from the Repository
Manager and sending them to the SMS Engine.
17. Simulation - Specifications
Evaluation
14. Fault Handler: the Fault Handler is responsible for handling faults that cause part objects to be out of
service. It is important to note that the probability of part object failure increases with its complexity:
15. Optimization Manager: The Optimization Manager is responsible for monitoring some health
performance indicators for the sensors, actuators, and the part objects and takes decisions to recover their
performance.
16. Resource Manager: The Resource Manager receives a request from the Optimization Manager to
allocate a resource for a nominated part object, or sensor.
17. Service Manager: The Service Manager is responsible for handling requests from the smart objects to get
some services from the system.
18. Device Manager: The Device Manager is responsible for handling the smart objectβs join request.
19. Risk Handler: The Risk Handler is responsible for studying requests from smart objects to join the
system and assigning them the proper status (visiting, trusted, prohibited, or rejected), as well as handling
certificate requests.
20. Policy Manager: The Policy Manager is responsible for enforcing the system policy according to the
mode of the system.
21. Analytics Manager: The Analytics Manager periodically sends details of the 3-tuple context events to the
Interested Community.
22. Logger: The Logger logs part objects' log statements.
23. Interested Community: The Interested Community is a representation of a cloud or external system
where the Analytics Manager sends it statistics about the system.
24. Smart Object: The Smart Object module generates random visits/disjoins/service requests/certificate
requests to the system.
Module complexity weight formula:
weight = round( (r × d) / Σ_{i=0}^{15} (r_i × d_i) × 100 )
where
• r = the number of satisfied requirements by the part object.
• d = the number of input and output dependency relationships for the part object.
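Under our reading of the weight formula (the sum runs over the 16 part objects, and the weight is a percentage share of total complexity), a hypothetical sketch with made-up part-object values:

```python
def complexity_weight(r: int, d: int, all_parts) -> int:
    """weight = round((r * d) / sum(r_i * d_i) * 100)

    r: number of requirements satisfied by the part object
    d: number of input/output dependency relationships
    all_parts: (r_i, d_i) pairs for every part object in the system
    """
    total = sum(ri * di for ri, di in all_parts)
    return round(r * d / total * 100)

# Hypothetical part objects; each weight is a percentage share of total complexity.
parts = [(4, 2), (3, 4), (2, 2), (6, 4)]
weights = [complexity_weight(r, d, parts) for r, d in parts]
```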
18. Simulation - Assumptions
Evaluation
| Notation | Control Variable | Average (A) | Best (B) | Worse (W) | Comment |
| SSfr | Speed Sensor signal failure rate | 9.167E-09 | 1.67E-10 | 1.67E-09 | This is a very negligible error |
| SSbl | Speed Sensor battery lifetime degradation (per minute) | 0.001 | 0.0009 | 0.002 | |
| LSsa | Location Sensor signal accuracy (per meter) | 2.25 | 2 | 2.5 | Estimate based on the horizontal position accuracy of the sensor |
| LSbl | Location Sensor battery lifetime degradation (per minute) | 0.003 | 0.0009 | 0.002 | |
| BRthr | Battery Recharge Threshold (%) | 0.4 | 0.5 | 0.2 | Used by the Optimization Manager |
| POthr | Part Object Failure Optimization Threshold | 2 | 1 | 3 | Used by the Optimization Manager to minimize failures of part objects |
| SMSfr | SMS Engine Failure rate | 0.05 | 0.1 | 0.16 | |
| SMSrr | SMS Engine Repair rate | 0.95 | 0.9 | 0.84 | Complement of the SMS Engine failure rate |
| HABfr | Hospital Alarm Board Failure rate | 0.05 | 0.025 | 0.075 | |
| HABrr | Hospital Alarm Board Repair rate | 0.95 | 0.975 | 0.925 | Complement of the Hospital Alarm Board failure rate |
| PABfr | Police Alarm Board Failure rate | 0.05 | 0.025 | 0.075 | |
| PABrr | Police Alarm Board Repair rate | 0.95 | 0.975 | 0.925 | Complement of the Police Alarm Board failure rate |
| RMr | Runtime Mode Rate | 0.67 | 0.7 | 0.64 | |
| POfr | Part Object Failure Rate | 0.275 | 0.05 | 0.5 | |
| POrr | Part Object Repair Rate | 0.725 | 0.95 | 0.5 | |
| ACr | Accident Rate | 0.004 | 0 | 0.03 | Estimated from the fatality rates from 2012 to 2015 |
19. Simulation - Assumptions
Evaluation
Visits peak in the morning.
Disjoins peak in the evening.
Service requests peak at midday.
0 (accident) ↔ 21 (Normal)
The system is in runtime mode 64.2% of the time.
20. Simulation - Extreme Assumptions
Evaluation
| Notation | Control Variable | Ext. Best (EB) | Ext. Worse (EW) |
| SSfr | Speed Sensor signal failure rate | 0 | 4.10684E-09 |
| SSbl | Speed Sensor battery lifetime degradation (per minute) | 0 | 0.004 |
| LSsa | Location Sensor signal accuracy (per meter) | 1.2 | 3.3 |
| LSbl | Location Sensor battery lifetime degradation (per minute) | 0 | 0.004 |
| BRthr | Battery Recharge Threshold (%) | 0.98 | 0 |
| POthr | Part Object Failure Optimization Threshold | 1 | 6 |
| SMSfr | SMS Engine Failure rate | 0 | 0.403 |
| SMSrr | SMS Engine Repair rate | 0.997 | 0.743 |
| HABfr | Hospital Alarm Board Failure rate | 0 | 0.156 |
| HABrr | Hospital Alarm Board Repair rate | 1 | 0.844 |
| PABfr | Police Alarm Board Failure rate | 0 | 0.156 |
| PABrr | Police Alarm Board Repair rate | 1 | 0.844 |
| RMr | Runtime Mode Rate | 0.78 | 0.543 |
| POfr | Part Object Failure Rate | 0.001 | 0.999 |
| POrr | Part Object Repair Rate | 0.999 | 0.001 |
| ACr | Accident Rate | 0 | 0.079 |
We stretched the values of the control variables around the average of the (best, average, and worse) values:
• We calculate the minimum value, whether it is best or worst, as min = average − 3σ.
• We calculate the maximum value, whether it is best or worst, as max = average + 3σ.
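The stretch rule can be written down directly; in this sketch σ is assumed to be the population standard deviation of the three (best, average, worse) values, since the slide does not define it, and the lower bound is clamped at zero because rates cannot be negative:

```python
from statistics import mean, pstdev

def extreme_bounds(best: float, average: float, worse: float,
                   floor: float = 0.0):
    """Stretch a control variable to extremes: average +/- 3 * sigma.

    floor clamps the lower bound (rates cannot go below 0).
    """
    values = [best, average, worse]
    mu, sigma = mean(values), pstdev(values)
    lo = max(mu - 3 * sigma, floor)
    hi = mu + 3 * sigma
    return lo, hi

# POfr-style values: best 0.05, average 0.275, worse 0.5
lo, hi = extreme_bounds(0.05, 0.275, 0.5)
```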
22. Simulation - Scenarios
Evaluation
| ID | Name | Hybrid mode | Control variable group | Resources | Optimization | Faults | Runs | Ticks |
| 1 | Perfect | False | Perfect | N/A | False | False | 3 | 1500 |
| 2 | Normal (runtime mode only) | False | Average | 12 | True | True | 3 | 1500 |
| 3 | Normal (runtime mode only) | False | Best | 12 | True | True | 3 | 1500 |
| 4 | Normal (runtime mode only) | False | Worst | 12 | True | True | 3 | 1500 |
| 5 | Normal (Hybrid modes) | True | Average | 12 | True | True | 3 | 1500 |
| 6 | Normal (Hybrid modes) | True | Best | 12 | True | True | 3 | 1500 |
| 7 | Normal (Hybrid modes) | True | Worst | 12 | True | True | 3 | 1500 |
| 8 | Normal (No optimization) | True | Average | N/A | False | True | 3 | 1500 |
| 9 | Normal (No optimization) | True | Best | N/A | False | True | 3 | 1500 |
| 10 | Normal (No optimization) | True | Worst | N/A | False | True | 3 | 1500 |
| 11 | Normal (resource optimized) | True | Average | 4 | True | True | 3 | 1500 |
| 12 | Normal (resource optimized) | True | Average | 8 | True | True | 3 | 1500 |
| 13 | Normal (resource optimized) | True | Average | 12 | True | True | 3 | 1500 |
| 14 | Extreme | True | Extreme best | 12 | True | True | 3 | 1500 |
| 15 | Extreme | True | Extreme worst | 12 | True | True | 3 | 1500 |
23. Simulation - Results
Reliability and Availability
Evaluation
Equation 6-9 (System Reliability and Availability Calculations):
Reliability = MTBF / (MTBF + 1)
Availability = MTBF / (MTBF + MTTR)
| Scenario | MTBF | MTTR | Availability | Reliability |
| Perfect | 0 | 0 | 1 | 1 |
| Extreme - ext_best - res 12 | 969.00 | 2.00 | 99.79% | 99.90% |
| Normal (Hybrid modes) - Best - res 12 | 251.13 | 2.10 | 99.17% | 99.60% |
| Normal (No optimization) - Best - res 12 | 230.12 | 2.05 | 99.12% | 99.57% |
| Normal (runtime mode only) - Best - res 12 | 226.56 | 2.07 | 99.09% | 99.56% |
| Normal (resource optimized) - Average - res 4 | 69.37 | 2.44 | 96.60% | 98.58% |
| Normal (resource optimized) - Average - res 8 | 68.16 | 2.36 | 96.65% | 98.55% |
| Normal (runtime mode only) - Average - res 12 | 62.05 | 2.54 | 96.06% | 98.41% |
| Normal (Hybrid modes) - Average - res 12 | 60.87 | 2.46 | 96.11% | 98.38% |
| Normal (resource optimized) - Average - res 12 | 59.69 | 2.39 | 96.16% | 98.35% |
| Normal (No optimization) - Average - res 12 | 51.81 | 2.45 | 95.49% | 98.11% |
| Normal (runtime mode only) - Worse - res 12 | 36.75 | 3.26 | 91.84% | 97.35% |
| Normal (Hybrid modes) - Worse - res 12 | 36.41 | 3.14 | 92.06% | 97.33% |
| Normal (No optimization) - Worse - res 12 | 30.85 | 3.09 | 90.89% | 96.86% |
| Extreme - ext_worse - res 12 | 12.55 | 26.89 | 31.83% | 92.62% |
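As a sanity check, Equation 6-9 reproduces the table rows from the reported MTBF and MTTR; a small Python sketch:

```python
def reliability(mtbf: float) -> float:
    """Equation 6-9: Reliability = MTBF / (MTBF + 1)."""
    return mtbf / (mtbf + 1)

def availability(mtbf: float, mttr: float) -> float:
    """Equation 6-9: Availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

# Extreme - ext_best - res 12: MTBF = 969.00, MTTR = 2.00
print(f"{availability(969.0, 2.0):.2%}")  # prints 99.79%
print(f"{reliability(969.0):.2%}")        # prints 99.90%
```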
24. Simulation - Results
Processing Time
Evaluation
| Scenario | ΔT |
| Perfect - Perfect | 0.00% |
| Normal (runtime mode only) - Best - res 12 | 0.03% |
| Normal (runtime mode only) - Average - res 12 | 0.16% |
| Normal (runtime mode only) - Worse - res 12 | 0.63% |
| Extreme - ext_best - res 12 | 0.99% |
| Normal (No optimization) - Best - res 12 | 1.88% |
| Normal (Hybrid modes) - Best - res 12 | 2.30% |
| Normal (Hybrid modes) - Average - res 12 | 2.36% |
| Normal (Hybrid modes) - Worse - res 12 | 2.36% |
| Normal (resource optimized) - Average - res 12 | 2.45% |
| Normal (No optimization) - Average - res 12 | 2.52% |
| Normal (resource optimized) - Average - res 4 | 2.60% |
| Normal (resource optimized) - Average - res 8 | 3.02% |
| Normal (No optimization) - Worse - res 12 | 3.55% |
| Extreme - ext_worse - res 12 | N/A |
| Grand Average | 14.00% |
• We predict an average of 2% additional time needed from the last sensor input, calculated against the perfect scenario.
• The results show that the resource optimization technique that we adopted is working reasonably well.
• The scenarios show that the processing time increases as the working conditions get worse.
• The extreme worst scenario shows no results because repetitive failures prevented it from completing the whole journey.
25. Simulation - Results
Fault Tolerance
Evaluation
| Scenario | # Failures | # Immunity |
| Extreme - ext_best - res 12 | 144.00 | 0.00 |
| Extreme - ext_worse - res 12 | N/A | N/A |
| Normal (resource optimized) - Average - res 12 | 897.67 | 31.00 |
| Normal (resource optimized) - Average - res 4 | 679.00 | 20.00 |
| Normal (resource optimized) - Average - res 8 | 820.33 | 31.33 |
| Normal (Hybrid modes) - Average - res 12 | 857.00 | 24.00 |
| Normal (Hybrid modes) - Best - res 12 | 740.33 | 2.33 |
| Normal (Hybrid modes) - Worse - res 12 | 1115.33 | 57.33 |
| Normal (No optimization) - Average - res 12 | N/A | N/A |
| Normal (No optimization) - Best - res 12 | N/A | N/A |
| Normal (No optimization) - Worse - res 12 | N/A | N/A |
| Normal (runtime mode only) - Average - res 12 | 880.00 | 33.00 |
| Normal (runtime mode only) - Best - res 12 | 758.67 | 2.67 |
| Normal (runtime mode only) - Worse - res 12 | 1158.00 | 61.00 |
| Perfect - Perfect - res 4 | N/A | N/A |
The experiments show an average of 3.09% immunity from failures across
all the scenarios. As the resources allocated increase, the immunity
provided to the system increases as well. The scenarios that have N/A did
not apply the optimization technique.
26. 1. The experiments predict the reliability of the architecture in the worst case as 96.86% and the availability as 90.89%.
2. In the extreme worst case, both reliability and availability decrease noticeably: reliability becomes 92.62% and availability deteriorates to 31.83%.
3. On average, the system availability is 95.77% and reliability is 98.08% if we exclude the Perfect and extreme cases.
4. In the best cases, the system availability is 99.79% and reliability is 99.9%.
Reflection on Architecture Decisions
The Resource Optimization technique prevented 3.1% of failures across all the scenarios. The percentage increases as the allocated resources increase.
Findings
27. 1. Simulation gives an initial prediction, which can change at runtime.
2. Build a simulation package for the TRA with dynamic configurations for the control variables.
It would be great if the software engineering community could provide anonymized statistics to use for simulation experiments.
Conclusion
Future Work