The document discusses safety and security challenges in mission-critical Internet of Things (IoT) systems. It notes that as more critical infrastructure comes to rely on software-controlled "things", ensuring trustworthy decision making becomes essential. A case study describes a proposed "Driller's Buddy" system to better support human operators in drilling operations by providing recommendations, awareness of uncertainties, and optimized actions. Developing such systems raises issues such as common mode failures, malware risks, and balancing interests across disciplines. Architecture-centric systems engineering, using standards and evidence-based practices, can help address these challenges.
5. Human brain - the planet's most sophisticated
and vulnerable decision maker
the weakest point
• Emotions trump facts (irrationality)
• Limited processing capacity
• Need to rest, easily bored
• Inconsistency across exemplars
• Creative, easily distracted
• Values (ethics and morals)
• Mental illness
How to compensate?
10. critical things
Things or networks of things where
failure could lead to an accident
- Pressure vessels
- Oil & Gas wells
- Boilers
- Industrial Instrumentation & Control
- Emergency shutdown
- Fire and gas leak detection
- Life support devices
- Pacemakers
- Infusion pumps
form critical systems
11. system criticality
Non-Critical: Useful system
- Low dependability
- System does not need to be trusted
Business-Critical: High Availability
- Focus on costs of failure caused by system downtime, cost of spares, repair equipment and personnel, and warranty claims
Mission-Critical: High Reliability
- Increase the probability of failure-free system operation over a specified time in a given environment for a given purpose
Safety-Critical: High Safety & Integrity Level
- High reliability
- High availability
- High security
- Focus is not on cost, but on preserving life and nature
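The availability/reliability distinction above can be made concrete with two standard formulas (not from the deck): steady-state availability from MTBF and MTTR, and mission reliability under an assumed constant failure rate. All numbers are illustrative.

```python
import math

def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the share of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def reliability(mtbf_hours, mission_hours):
    """Probability of failure-free operation over a mission time,
    assuming a constant failure rate (exponential model)."""
    return math.exp(-mission_hours / mtbf_hours)

print(round(availability(990.0, 10.0), 3))   # 0.99
print(round(reliability(1000.0, 100.0), 3))  # ~0.905
```

Business-critical systems optimize the first number; mission-critical systems optimize the second.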
13. drilling
A manually controlled process: the Drilling Control System acts on the drill string through
- Weight on Bit
- Rotation
- Mud Circulation
Manual Control
- Interpret data
- Perform tasks
14. • I have to make frequent decisions and many of
them depend upon readings from sensors that
can be correct, noisy, random, unavailable, or
in some other state.
• The decisions I have to make often have safety
consequences, they certainly have economic
consequences, and some are irreversible.
• At any point in time there may be three or four
actions I could take, based on my sense of
what's happening on the rig.
• I would like better support to determine how
trustworthy my readings are, what the possible
situations are and the consequences of each
action.
What is the best action
to take?
enhance human decision making
15. systems of action
• Can sense or observe a phenomenon, process or machine
• Process observations and search for anomalies, undesired state
changes and other deviations that must be dealt with.
• Plan and execute (or recommend execution of) actions to bring the observed
phenomenon, process or machine back to its desired operational state.
• Monitor effects of actions and re-plan if an action did not have the intended effect
on process state
Computer systems that support
making better decisions under stress and uncertainty
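The four bullets above describe a sense-detect-plan-monitor loop. A minimal sketch, with invented thresholds and names (nothing here is from the actual system):

```python
from dataclasses import dataclass

@dataclass
class Observation:
    sensor: str
    value: float

def detect_anomaly(obs, low=0.0, high=100.0):
    """Flag readings outside the desired operating band (illustrative)."""
    return not (low <= obs.value <= high)

def plan_action(obs, high=100.0):
    """Recommend an action that drives the reading back into the band."""
    return "reduce" if obs.value > high else "increase"

def system_of_action(readings):
    """Sense -> detect deviations -> recommend actions, as on the slide.
    In a real system this loop runs continuously and re-plans."""
    return [(obs.sensor, plan_action(obs))
            for obs in readings if detect_anomaly(obs)]

print(system_of_action([Observation("pressure", 120.0),
                        Observation("temperature", 60.0)]))
# only the out-of-band pressure reading triggers a recommendation
```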
16. "Driller's Buddy"
add active computer support
Real-time data from the Drilling Control System
Recommend actions in context of process state
Manual Control retained over the drill string (Weight on Bit, Rotation, Mud Circulation)
17. Driller's Buddy
technical building blocks
Drilling Simulator (produces State & Events)
• Hydraulic model
• Mechanical model
• Temperature model
Drilling Advisor (proposes Actions)
• Uncertainty model
• Causality model
• Reasoning
• Planning model
Drilling Control System (supplies Real-Time Data; Historical Data also feeds the models)
Actions to be executed by a human, but the concept opens up for more computer control in the future, i.e. the Drilling Advisor can be turned into a "synthetic driller".
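Under stated assumptions about the data flow (real-time data in, a state estimate to the advisor, a recommendation out to the human), the building blocks might be wired like this. The model, thresholds, and field names are placeholders, not the real system's:

```python
class DrillingSimulator:
    """Stand-in for the physics models (hydraulic, mechanical,
    temperature) that turn raw real-time data into a process state."""
    def estimate_state(self, realtime_data):
        # placeholder: a real simulator solves model equations here
        return {"downhole_pressure": realtime_data["surface_pressure"] * 1.8}

class DrillingAdvisor:
    """Stand-in for the uncertainty, causality and planning models
    that turn a state estimate into an action recommendation."""
    def recommend(self, state):
        if state["downhole_pressure"] > 500:
            return "reduce mud circulation rate"
        return "continue current program"

def drillers_buddy(realtime_data, simulator, advisor):
    """Wire the building blocks together; the human executes the action."""
    state = simulator.estimate_state(realtime_data)
    return advisor.recommend(state)

print(drillers_buddy({"surface_pressure": 300.0},
                     DrillingSimulator(), DrillingAdvisor()))
```

Swapping the human out for an actuator interface is exactly the "synthetic driller" step the slide mentions.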
18. expressed in capabilities
Increasingly actionable information, bottom to top:
Physical System Sensing
- What are we measuring directly, with what accuracy?
Physical System Behavior
- What can we infer about performance and changes in the physical system?
Uncertainty and Validation
- What do we know for certain and what are we estimating?
Situational Awareness
- What is the process state and where is it heading?
Local Action Optimization
- What is the best action to take for control or safety?
Global Action Optimization
- What is the best action to take for the business?
19. more sophisticated technology
The same capability stack (Physical System Sensing up to Global Action Optimization), now with enabling technologies:
- Sensors
- Machine learning (Bayesian) + Physics (cybernetics)
- Decision / game theory
- Automated planning and scheduling
Rational agent
• has goals
• models uncertainty
• chooses the action with optimal expected outcome for itself
• Examples:
− human (on a good day)
− intelligent software agent
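The rational-agent definition above (goals, uncertainty, action with optimal expected outcome) is decision theory in miniature. A sketch with invented payoff numbers, illustrating why a rational agent may prefer a safe low-payoff action over a risky high-payoff one:

```python
def expected_utility(action, outcomes):
    """Sum of probability-weighted utilities for one action."""
    return sum(p * u for p, u in outcomes[action])

def choose_action(outcomes):
    """A rational agent picks the action with maximal expected utility."""
    return max(outcomes, key=lambda a: expected_utility(a, outcomes))

# Illustrative numbers only: two candidate rig actions under uncertainty.
# "continue drilling" usually pays off but carries a small catastrophic risk.
outcomes = {
    "continue drilling": [(0.9, 100.0), (0.1, -1000.0)],
    "stop and circulate": [(1.0, 20.0)],
}
print(choose_action(outcomes))  # "stop and circulate"
```

Game theory extends this by letting the outcome probabilities depend on other agents' choices, as noted in the editor's notes.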
20. solution creates new challenges
What parts are safety critical?
What parts are only business critical?
How to assess and protect against cyber threats?
How does failure in a non-safety part influence safety and security?
What dependencies do we have?
Industry becomes software dependent
How to design software that tackles mechanical failures?
23. before software
Tangible control logic
• Design level
• Implementation level
• Verification & test level
No cyber threats
• Intrusion
• Viruses
• Theft
• Identity
24. two unique properties
Inspection & Test
• Software can't be inspected and
tested the way analogous physical components can
CPU – the single point of failure
• All signals are threaded through the
one single element.
• Execution sequence is unknown
• The same defect is replicated systematically across
multiple instances
Impacts how we must manage software for critical systems
26. common mode failure
“results from an event which
because of dependencies
causes a coincidence of failure
states of components in two or
more separate channels of a
redundancy system, leading to
the defined systems failing to
perform its intended function”.
Ariane 5 test launch, 1996
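A common mode failure can be illustrated in a few lines: identical software in every redundant channel means a single out-of-range input defeats the redundancy. The sketch is loosely inspired by Ariane 5's out-of-range conversion; the code and numbers are illustrative, not the actual flight software:

```python
def convert(value):
    """The same conversion defect, deployed in every redundant channel:
    values outside the 16-bit signed range raise instead of saturating."""
    if not -32768 <= value <= 32767:
        raise OverflowError("value out of 16-bit range")
    return int(value)

def redundant_system(value, channels=2):
    """Identical software in each channel: one bad input causes a
    coincidence of failure states - a common mode failure."""
    results = []
    for _ in range(channels):
        try:
            results.append(convert(value))
        except OverflowError:
            results.append(None)  # this channel has failed
    return results

print(redundant_system(40000.0))  # both channels fail: [None, None]
```

Hardware redundancy protects against independent random failures, not against a systematic defect copied into every channel.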
27. malware, viruses and hacking
Motivated by financial, political, criminal or idealistic interests
Software created to cause harm
• Change of system behaviour
• Steal / destroy data or machines
Exploits weaknesses in
• Human character
• Technical designs
Horror stories:
• Stuxnet and the Iranian centrifuges (Siemens control system)
• Saudi Aramco hack of 35000 computers (Windows back office)
28. human factors
How to minimize the effects of human error?
Mistakes occur everywhere
• Specification
• Design
• Implementation
• Deployment
• Operations
Humans make mistakes
• By commission
• By omission
• By carelessness
29. blurred boundaries
Conflicting interests and divergent
situational understanding across
disciplines and roles.
Architects think and design in terms of hierarchy and layering
Programmers think and design in terms of threads of execution
Users need systems that work and solve real-world problems
Operations needs to get the job done
32. architecture
Separation and protection of critical functions across the capability stack:
- Physical System Sensing
- Physical System Behavior
- Uncertainty and Validation
- Situational Awareness
- Local Action Optimization
- Global Action Optimization
33. standards
IEC 61508 Functional safety of electrical/electronic/programmable electronic safety-related systems
IEC 61511 Functional safety: safety instrumented systems for the process industry sector
DO-178C Software considerations in airborne systems and equipment certification
The good thing about standards is that there are so many to choose from
Andrew S. Tanenbaum
Not sufficient on their own
Represent insights
Must be tailored to be useful
36. summary
Things run on software
Critical things form critical / high-integrity systems
Cognitive functions make software inherently complicated
Holistic, architecture-centric Systems Engineering
Software is used to offload and support human operators
2nd and 3rd order failure effects must be addressed upfront
Forging design thinking with high-integrity systems practices
37. Safety and security in mission critical IoT
systems
Einar Landre
Lead Analyst
E-mail einla@statoil.com
Tel: +4741470537
www.statoil.com
Thank you
Editor's Notes
Macondo:
A difficult well & reservoir
The latest and greatest technology
Human operators did not understand system messages and alarms
Focus on making things work
No trust in the IT systems
50 minutes from first anomaly to blow-out
False positives are probably one of the most important threats to humans' trust in technical systems. In a system with a high frequency of false-positive alarms, the real alarms will not be detected. Cancelling out false positives before they reach the human operator is one of the most vital HSE measures in complex systems.
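This point can be quantified with Bayes' rule: when real events are rare, even a fairly accurate detector produces mostly false alarms. The probabilities below are invented for illustration:

```python
def p_real_given_alarm(p_event, p_detect, p_false_alarm):
    """Bayes' rule: probability that an alarm reflects a real event."""
    p_alarm = p_detect * p_event + p_false_alarm * (1 - p_event)
    return p_detect * p_event / p_alarm

# Illustrative numbers: a rare event (0.1% of the time), a detector that
# catches 99% of real events but false-alarms 5% of the time.
print(p_real_given_alarm(p_event=0.001, p_detect=0.99, p_false_alarm=0.05))
# ~0.019: only about 2% of alarms are real, so operators learn to ignore them
```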
Historically, information technology has been used to implement what we call "systems of record". These are systems whose primary function is the capture and storage of data, be it operational events, engineering decisions or sensor readings.
Today information technology has reached a technological readiness level where it has become cost efficient to create what we have chosen to call “systems of action”. These are systems that can analyse data in context of a process and either recommend or execute the best possible action. These systems enable automation of tasks across all phases of the well construction process.
Thinking of a human driving a car or any other machine the reasoning defined by the five lowest layers takes place all the time. When things get too complicated due to process or mechanical failure, situational awareness is easily lost with the effect that local action optimization collapses and the catastrophe is on its way.
Leading edge information technology enables us to automate at all levels in the stack, but since few of us really like a world with machines doing things on their own, such automation needs to be done on the terms of the human operators. That implies the human is in control and understands what goes on, with respect to the controlled process and in the machine itself.
Planning is searching through possible sequences of action for a path that reaches the goal while respecting constraints.
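That definition of planning as search can be sketched directly. The toy state space, actions, and constraint below are invented; a breadth-first search finds a shortest action sequence that reaches the goal without violating the constraint:

```python
from collections import deque

def plan(start, goal, successors):
    """Breadth-first search through action sequences: returns a shortest
    list of actions from start to goal, or None if the goal is unreachable.
    `successors` maps a state to (action, next_state) pairs."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None

# Toy state space: integers, actions "+1" and "*2",
# with the constraint that no intermediate state may exceed 20.
def successors(s):
    return [(a, n) for a, n in (("+1", s + 1), ("*2", s * 2)) if n <= 20]

print(plan(1, 10, successors))  # a shortest plan, e.g. ['+1', '*2', '+1', '*2']
```

Real planners use the same idea with heuristics and far richer state models, but the search-for-a-constrained-path structure is identical.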
Decision theory is finding the optimal action. Game theory is «interactive decision theory», meaning that other agents will respond to your action.
"Rational agent" is really a term from economics, but it is also used in AI and other fields.
For those who have seen Apollo 13: that is an exercise in how to program an analog computer, bringing electronic circuits alive with switches.