Dh Esra 07.0411 English

What do we know about the ICT systems
on Deepwater Horizon?
ESRA-seminar 07 April, 2011, Stavanger
Jon Espen Skogdalen
Tel: 99024171
jon.espen.skogdalen@gmail.com

Deepwater Horizon Study Group
– http://ccrm.berkeley.edu/deepwaterhorizonstudygroup/index.shtml
• Final Report, 1 March:
– http://ccrm.berkeley.edu/pdfs_papers/bea_pdfs/DHSGFinalReport-March2011-tag.pdf
• Working paper:
– Looking Forward - Reliability of
Safety Critical Control Systems on
Offshore Drilling Vessels
– http://ccrm.berkeley.edu/pdfs_papers/DHSGWorkingPapersFeb16-2011/Reliability-of-SafetyCriticalControlSystemsOffshoreDrillingVessels-JES_OS_DHSG-Jan2011.pdf


Control systems
• Monitoring, recording and logging of plant status and
process parameters;
• Provision of operator information regarding the plant
status and process parameters;
• Provision of operator controls to effect changes to the
plant status;
• Automatic process control and batch/sequence control
during start-up, normal operation, shutdown, and
disturbance, i.e. control within normal operating
limits;
• Detection of onset of hazard and automatic hazard
termination (i.e. control within safe operating limits),
or mitigation;
• Prevention of automatic or manual control actions
which might initiate a hazard.

Source: HSE UK
http://www.hse.gov.uk/comah/sragtech/techmeascontsyst.htm

Background
• The drilling industry is characterized by rapid,
cutting-edge technology development to conquer greater
ocean and drilling depths.
• The level of automation on offshore drilling vessels has
been steadily increasing over several decades, growing
from manually operated sledge-hammer technology to
space-age computer-based integrated systems.
• Automation systems are essential for the safety,
reliability, and performance of the vessels:
– Dynamic Positioning (DP) computer systems
– Power Management Systems
– Drilling Control Systems
– BOP Control System
– Ballast Systems
– Fire and Gas systems
– …


Characteristics of deepwater
drilling in the Gulf of Mexico (GoM)

• High costs
• Integrated operations (ICT)
• Use of cutting-edge technology (software-based)
• Complex casing programs
• Narrow drilling margins
• High pressure and high temperature (HPHT)
• Tight sandstone reservoirs and fluids with extreme
flow assurance challenges
• Subsea operations
• Problematic formations
• Uncertain seismic data
• Lack of experienced personnel


Potential consequences of failures in the
DP control system

• Drive-off, where the vessel drives off position by
use of its thrusters and propellers, typically due
to an error in the position reference and sensor
systems, or fail-to-full of a thruster or main
propeller.
• Drift-off, where the vessel drifts off
position/heading due to insufficient available
thrust, typically due to some single failure
combined with errors in specialized software
functions like consequence analysis or thrust
allocation.
• Unnecessary loss of DP class, forcing the
ongoing drilling operation to be aborted.


Potential consequences of failures in the
Power Management System (PMS)

• Complete blackout, causing a drift-off and loss
of power to all drilling operations.
• Partial blackout, forcing ongoing drilling
operations to be aborted and causing loss of DP class.
• Failure of PMS blackout recovery after a
complete or partial blackout, leading to a
sustained blackout and possible loss of the ability
to perform an emergency disconnect (EDC) from
the subsea BOP.
• Loss of position due to incorrect load reduction
of the thrusters and the resulting lack of thrust
capacity.


Potential consequences of failures in a
Drilling Control System

• Dropping of Marine Riser segments or
tubulars (pipes) on the drill floor, causing
equipment damage and possibly serious injury to
personnel.
• Collision between two drilling machines,
causing equipment damage and possible serious
injury to personnel.
• Machine malfunction, stopping or slowing
down the drilling operation and possibly
causing equipment damage.
• Damage to the wellbore, with possibility of
follow-on environmental damage.


The Deepwater Horizon
accident


Deepwater Horizon accident
• From the Deepwater Horizon Incident Joint
Investigation it has been revealed that:
– The BOP did not close as intended
– General alarms were inhibited, and not understood
– The Emergency Disconnect System did not
disconnect
– The engine control systems did not work as
intended
– The emergency generators did not work as
intended
– ……………….


Were the systems working?
• Transocean Chief Electronics Technician:
– “the A-chair is located in the dog house. That is the
main operating point for the driller to control all
drilling functions. It controls everything from mud
pumps to top drive, hydraulics. It controls
everything. For three to four months we've had
problems with this computer simply locking up. I
forget what we -- We even coined a term, the
blue screen of death, because it would just
turn to a blue screen. You would have no data
coming through.”

Examination of Michael K. Williams, Chief
Electronics Technician, Transocean, Friday,
July 23, 2010. The transcript of the Joint
United States Coast Guard/Bureau of
Ocean Energy Management, Regulation and
Enforcement

Are failures common?
• “they could not get the bugs worked out of the new
operating system. They couldn't get the old
software to run correctly on the new operating
system.”
• “Now, you said there was something called the
blue screen of death. Is that a phrase you used
or was that a phrase of common knowledge
within the crew?”
• “Common knowledge.”


• “Okay. And what did the blue screen of death refer to?”
• “The complete lack of video to the chair.”
• “So the driller sitting in the chair has got a screen in front of
him. Right?”
• “He has two screens in front of him.”
• “Okay. Fair enough. He's got screens in front of him, and we've
heard previously that the problem was, at least in the A-chair,
the screens would lock up or freeze. Are you familiar with that?”
• “Yes.”
• “Okay. Did that also happen in the B Chair?”
• “Occasionally.”
• “Okay. And when they froze, was that what you were referring
to as the blue screen of death, the driller wasn't getting the
necessary information?”
• “Yes. It would do either/or. Sometimes it would get a
blue screen of death, sometimes it would just lock up and
no data would change.”

• “Did you ever complain to anyone about the blue
screen of death?”
• “All the time.”
• “Who did you complain to on board the vessel?”
• “Electrical supervisor.”
• “Okay. Did you ever complain to Mr. Harrell
(OIM)?”
• “He complained to me.”
• “Mr. Harrell complained to you about it?”
• “He wanted them fixed.”
• “Okay. So he wanted you to fix them?”
• “Everybody did.”

Software causes precursor incidents?
• “Now, you said that -- not on this well, not on
the MACONDO 252 but on a prior well prior to
the DEEPWATER HORIZON arriving on site at the
MACONDO well there had been a problem with
the drilling chairs and that led to a kick.”
• “Do you recall that testimony?”
• “Yes, I do.”


Software – causes precursor
incidents – part 2
• “When the chair went down, it was brought back up, and
there's a software program that runs inside the other program
called a tag replicator. The tag replicator is -- All three chairs
are connected via servers, and in order to get that chair back
fully functioning, the tag replicator must go to the other two
chairs and verify the data it's receiving so that it will display to
the driller the correct values for everything on the screen from
mud pump pressure to how many strokes a minute to all the
different tags. There's several hundred tags that the software is
looking at all the time. Upon the reboot of the chair, getting it
back up, the tag replicator did not function, and the driller was
looking at data that was erroneous.”
• “And as a result of the driller looking at data that was
erroneous after the screen and the computer returned to
its functionality, did a kick happen?”
• “We took a kick in -- During that process a kick was
discovered.”
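
The cross-check described in this testimony (a rebooted chair verifying its tags against the other two chairs before showing values to the driller) can be illustrated with a minimal sketch. It is not the rig's actual tag replicator: the function name `verify_tags`, the `TagStatus` type, the tag names and the 1 % tolerance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TagStatus:
    value: Optional[float]  # value to display, or None if it must be blanked
    verified: bool          # True only if both peer chairs agree with it


def verify_tags(
    local: Dict[str, float],
    peer_a: Dict[str, float],
    peer_b: Dict[str, float],
    rel_tol: float = 0.01,  # hypothetical tolerance, not a real rig setting
) -> Dict[str, TagStatus]:
    """Cross-check every local tag against two peer stations.

    A tag counts as verified only if both peers report it and both
    values lie within the relative tolerance of the local value.
    Unverified tags are blanked instead of being displayed.
    """
    result: Dict[str, TagStatus] = {}
    for name, value in local.items():
        peers = [peer_a.get(name), peer_b.get(name)]
        ok = all(
            p is not None
            and abs(p - value) <= rel_tol * max(abs(p), abs(value), 1.0)
            for p in peers
        )
        result[name] = TagStatus(value if ok else None, ok)
    return result
```

With a local chair reporting a mud pump pressure of 0.0 after reboot while both peers report roughly 3200, the tag comes back unverified and blanked, so the driller sees a gap rather than an erroneous value.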


Technology is not understood? –
quick fixes

• “Okay. And the BOP panel being dead, was that in the
driller shack?”
• “Yes, sir.”
• “Okay. So that if the driller was sitting in the driller
shack and he had a well control situation and wanted
to activate the BOP and the panel was dead, he
couldn't do anything about it, is that what you're
telling us?”
• “Not at that time he couldn't.”
• ………
• “Is that a good maintenance practice to use a bypass
when the panel is dead rather than fixing it?”
• “Not in my opinion.”

Examination of Michael K. Williams, Chief Electronics Technician, Transocean.

Not managed by the safety management
system?
• “When I started in the ET shop officially in April 2009,
the fire and gas system was a wreck. There were
several detectors that were faulted, overridden, and
completely ignored out of the system due to lack of
maintenance. I took it upon myself, and my assistant,
Stenson Roark, to rectify that, and we got the fire and
gas system back up to snuff, and I made it a point
every hitch, when I got out there the first day, the
first thing I did was go to the SIMRAD station and go
to the fire and gas page and see how many detectors
were inhibited, how many sensors were inhibited, how
many were overridden, how many were faulted,
because that was my primary concern when I got
to the rig is my own safety.”
• “Throughout that or prior -- During that time
period, there was no tracking of the fire and gas
system, to my knowledge.”

Examination of Michael K. Williams, Chief Electronics Technician, Transocean.
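
The manual check described above (counting how many detectors are inhibited, overridden or faulted at the start of each hitch) is the kind of tracking that could be automated. The following is a minimal sketch only: the state names follow the testimony, while the `DetectorState` type, the function names and the detector identifiers are hypothetical and do not reflect the SIMRAD system's actual interface.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List


class DetectorState(Enum):
    NORMAL = "normal"
    INHIBITED = "inhibited"
    OVERRIDDEN = "overridden"
    FAULTED = "faulted"


def degraded_summary(states: Dict[str, DetectorState]) -> Counter:
    """Count detectors per non-normal state."""
    return Counter(s for s in states.values() if s is not DetectorState.NORMAL)


def degraded_detectors(states: Dict[str, DetectorState]) -> List[str]:
    """Return the sorted IDs of all detectors that are not in normal state."""
    return sorted(
        name for name, s in states.items() if s is not DetectorState.NORMAL
    )
```

Logging the output of `degraded_summary` at every crew change would give a simple trend of how many detectors are out of service over time.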

Systems are run in a way they were not
intended to be?

• “Thank you, sir. So if the Sperry flowout sensor was
being bypassed there would be no way for the
mud logger to monitor the returns, would there?
Would you agree with that?”
• “If the sensor was bypassed? No, there wouldn't
be a way. For the mud logger to monitor?”
• “Right.”

Examination of Stephen Ray Bertone, Chief
Engineer, Monday, July 19, 2010. The
transcript of the Joint United States Coast
Guard/Bureau of Ocean Energy Management,
Regulation and Enforcement

Errors are known, but when it comes to
software it is not known how to fix them?

• “Okay. Were the audits in part to help
Transocean identify maintenance or equipment
issues that needed attending or fixing?”
• “We didn't need them identified. We knew
what they were.”

Examination of Stephen Ray Bertone, Chief
Engineer, Monday, July 19, 2010. The
transcript of the Joint United States Coast
Guard/Bureau of Ocean Energy Management,
Regulation and Enforcement

Do we understand the errors?
• “You're saying that the explosion -- what you're
thinking is, the explosion did something to the
logic in the control system so that it was giving
you all kinds of weird signals?”
• “Yeah. I would think so.”

Examination of Jimmy Wayne Harrell, OIM,
Thursday, May 27, 2010. The transcript
of the Joint United States Coast Guard/
Bureau of Ocean Energy Management,
Regulation and Enforcement

Adequate testing?
• About the ESD and ESD panel:
– “We never tested the automatic feature, to my
knowledge. I never tested the automatic function
of it. We did not go introduce gas somewhere to
see what it would do. It was just understood that it
would work.”


Findings – Close interaction
“It is important to realize that there are very few
limits to how software may be designed. An
apparently small fix to one part of the software
may cause unexpected behavior in another part
of the software, potentially causing a complete
failure to comply with the designed system
functionality.”

Skogdalen, J.E. and Ø.N. Smogeli, White Paper:
Looking forward - Reliability of safety critical
control systems on offshore drilling vessels.
DHSG, Editor. 2010, p. 18.

No fail safe
“Often, technological systems are made to be so
called “fail safe”. Fail safe describes a device or
feature which, in the event of failure, responds in
a way that will cause no harm, or at least a
minimum of harm, to other devices or danger to
personnel. This “fail safe” terminology is often
misapplied and misused, and for most of the
safety critical systems there are no truly “fail
safe” conditions. Either the system works as
intended and maintains safety, or it does not and
may cause or fail to prevent an incident or
accident.”


Common cause failures
“Safety critical systems are usually engineered
according to the principles of barriers and
independent systems to ensure redundancy. In a
control system, many of these barriers will exist
only in software. Failures in software can
therefore act as common cause failures, and
significantly reduce the reliability of the system.”
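
A small worked calculation illustrates why a shared software fault undermines redundancy. This is a simplified sketch with purely illustrative numbers, not data from the rig or from the working paper. It assumes two redundant channels that each fail on demand with probability `p_channel`, plus a hypothetical common software fault with probability `p_common` that disables both channels at once.

```python
def both_channels_fail(p_channel: float, p_common: float = 0.0) -> float:
    """Probability that two redundant channels both fail on the same demand.

    Without a common cause the channels fail independently, so both
    fail with probability p_channel ** 2. A common-cause fault takes
    out both channels at once with probability p_common.
    """
    independent = p_channel ** 2
    return p_common + (1.0 - p_common) * independent


# Illustrative values only (assumptions, not measured data):
independent_only = both_channels_fail(0.01)            # about 1e-4
with_shared_software = both_channels_fail(0.01, 0.001)  # about 1.1e-3
```

Under these assumed numbers, a common software fault that is ten times less likely than a single channel failure still makes loss of both channels roughly eleven times more likely than in the fully independent case.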


Finding – precursor incidents are often
not reported (?)

“Malfunctioning software may be totally hidden from the
user until it fails, but in several cases the user does get
precursor incidents in the form of e.g. “blue screens”
and non-responding systems. The precursor
incidents may last only a short time (1-3
seconds), and it is the author's experience and
view that many of these precursor incidents do
not get reported because the systems
do work again and the user does not
understand what happened. We intentionally use
the term “precursor incident” because
these incidents might be warnings about serious
failures in the software.”

Designing the systems
“We are designing systems with potential
interactions among the components that cannot
be thoroughly planned, understood, anticipated,
or guarded against. The operations of some
systems are so complex that it defies the
understanding of all but a few experts, and
sometimes even they have incomplete
information about its potential behavior.”

Leveson N. A new accident model for
engineering safer systems. Safety Science.
2004;42:237-70.

The Commission’s Report and
the Chief Counsel’s Report


Chief Counsel’s Report: Displays,
sensors and instrumentation

The seven ultra-deepwater semis are a $3 billion-plus commitment. Four have
been delivered, and three are under construction. In the last five years we’ve
built four jackups and spent $550 million enhancing our existing fleet. With the
semis, all seven are the same design and being built with Keppel FELS,
probably the best shipyard in the world. Our rigs have been on time and on
budget. A lot of the equipment is software-driven, and that was probably the
biggest challenge. I think that’s what most people are finding with these new
rigs – getting the bugs out of the software is the biggest issue.

Summary


Looking forward
• Incidents related to software bugs must be
reported:
– Training must be given to operators (what can be
expected of the system?).
– Training in “bug-reporting”.
• Data related to malfunctioning software must be
collected across installations and companies.
• Safety indicators related to the status of safety
critical systems must be worked out.


Looking forward
• Independent verification and validation of safety-
critical control system software/hardware:
– Class standards related to verification of safety
critical systems (software/hardware) should be
introduced and become common practice (DNV, ABS, …),
e.g. DNV Enhanced System Verification.
– Hardware-In-the-Loop (HIL) testing
• Procedures related to go/stop/start-rules for
malfunctioning software/hardware in safety
critical systems must be worked out
• Safety audits focusing on safety critical systems
depending on software/hardware


Looking forward
• The requirements related to safety critical
systems should be in accordance with the safety
barrier principles and requirements on the Norwegian
and UK continental shelves.
