Blackout Task Force Highlights SCADA System Woes

ARC INSIGHTS
Thought Leaders for Manufacturing & Supply Chain
By Harry Forbes
INSIGHT# 2003-51MP | December 10, 2003

"The Joint US-Canadian Task Force named three major causes of the US Blackout. These were 'situational unawareness', poor tree trimming, and lack of effective diagnostic support for critical operations."

Keywords: Blackout, Critical Condition Monitoring (CCM), SCADA

Summary

The interim report on the US Blackout of 2003 points to failures of SCADA systems and critical software applications as the chief culprits. Companies that operate SCADA systems and advanced online applications should note carefully the role of these systems in the Blackout events.

Analysis

The joint US-Canadian Task Force charged with investigating the causes of the 2003 US Blackout issued its interim report in November 2003. The report adds much to the publicly available information concerning the events of August 14, 2003. Press reports have emphasized that the Task Force blamed Ohio utility FirstEnergy (FE) for the outage. A careful reading of the document, however, shows that the blame is fixed both on FirstEnergy and on the Midwest Independent System Operator (MISO).

[Figure: Number of abnormal events versus minutes before 4:11 PM EDT on August 14, marking the periods when FirstEnergy's SCADA alarms and MISO's State Estimator were inoperable]
The new information shows that personnel in both organizations were effectively "flying blind" during the critical 1-2 hours before the outage began to cascade. Their key tools for detecting and managing abnormal operation had failed, and for most of the time they were unaware of that fact.

Background

Electric system operators use a "State Estimator" (SE) application to gauge the current state of a power system. This is essentially a data reconciliation application that contains a math model of the power system. It estimates a single consistent set of process data based on the various real-time measurements available. The output of the SE is used by a Real-Time Contingency Analysis (RTCA) application, which calculates the effect of additional faults on the power system. NERC reliability regulations require that a power system be able to withstand any single contingency without moving beyond equipment operating limits.

State Estimator Maladies

The report concludes: "The MISO state estimator and real time contingency analysis tools were effectively out of service between 12:15 EDT and 16:04 EDT. This prevented MISO from promptly performing pre-contingency 'early warning' assessments of power system reliability over the afternoon of August 14."

The reason for this loss of function was that some data inputs to the SE were manual rather than automatic. Twice during the day, with the opening of two different transmission lines, the SE failed to converge to an acceptable solution. The first occurrence was fixed by a technician at 13:00 EDT, but the technician forgot to reschedule the state estimator to run every 5 minutes. By the time this second error was discovered, another transmission line had opened, so the SE again failed to converge. The second abnormal line status was then entered manually, and the application was fully restored only at 16:04 EDT. By that time the situation was nearly hopeless; the cascade was less than 6 minutes away.

The report states that "MISO considers its SE and RTCA tools to be still under development and not fully mature." That is a memorable example of understatement.
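To make the data-reconciliation idea concrete, the sketch below builds a toy linear (DC) state estimator in Python. The 3-bus network, susceptances, measurement set, and noise levels are all invented for illustration; a production SE solves a nonlinear AC model iteratively and, as happened to MISO on August 14, can fail to converge when an input such as a line status is wrong.

```python
# Toy linear "state estimator": weighted least-squares data reconciliation
# on a 3-bus DC power-flow model. All numbers are invented for illustration.
import numpy as np

# Line susceptances (per unit); bus 1 is the angle reference (theta1 = 0).
b12, b13, b23 = 10.0, 10.0, 10.0

# State vector x = [theta2, theta3] (bus voltage angles, radians).
# Measurement model z = H @ x + noise, with four redundant measurements:
#   line flows P12, P13, P23 and the injection at bus 1 (P1 = P12 + P13).
H = np.array([
    [-b12,   0.0],   # P12 = b12 * (theta1 - theta2)
    [ 0.0,  -b13],   # P13 = b13 * (theta1 - theta3)
    [ b23,  -b23],   # P23 = b23 * (theta2 - theta3)
    [-b12,  -b13],   # P1  = P12 + P13
])

# Telemetered values: the "true" values plus telemetry noise.
x_true = np.array([-0.05, -0.10])
rng = np.random.default_rng(seed=1)
z = H @ x_true + rng.normal(scale=0.02, size=4)

# Weight each measurement by the inverse variance of its telemetry.
W = np.diag(1.0 / np.full(4, 0.02) ** 2)

# Weighted least-squares estimate: the single consistent state that best
# explains all of the (mutually inconsistent) raw measurements.
x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ z)

print("estimated angles :", x_hat)
print("reconciled flows :", H @ x_hat)   # one consistent set of process data
print("raw telemetry    :", z)           # noisy, mutually inconsistent
```

With four measurements constraining two unknowns, the estimator smooths out telemetry noise into one consistent picture; that same redundancy is what lets a real SE, and the RTCA fed by it, warn operators about conditions they cannot observe directly.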
Another interesting note from the report tells the status of FirstEnergy's own RTCA software:

"FirstEnergy (FE) had and ran a state estimator every 30 minutes...FE indicated that it has experienced problems with the automatic contingency analysis operation since the system was installed in 1995. As a result, FE operators or engineers ran contingency analysis manually rather than automatically and were expected to do so when there were questions about the state of the system. Investigation team interviews of FE personnel indicate that the contingency analysis model was likely running but not consulted at any point in the afternoon of August 14."

Loss of SCADA Alarms

FirstEnergy's SCADA system operators were not alerted concerning events in the two hours previous to the blackout because their system stopped processing alarm messages at 14:14 EDT. A couple of quotes from the report bring out the key points:

"FE's computer SCADA alarm and logging software failed shortly after 14:14 EDT (the last time a valid alarm came in). After that time, the FE control room consoles did not receive any further alarms nor were there any alarms being printed or posted on the EMS's [Energy Management System] alarm logging facilities."

"At 14:41 EDT the primary server hosting the EMS alarm processing application failed…Following preprogrammed instructions, the alarm system application and all other EMS software running on the first server automatically transferred onto the backup server. However, because the alarm application moved intact onto the backup while still stalled and ineffective, the backup server failed 13 minutes later, at 14:54 EDT. Accordingly, all of the EMS applications on these two servers stopped running."

The report also notes that the SCADA system was not running the latest version of its software and was slated for replacement by FE. The alarm function was not restored until FE performed a complete "cold" reboot of the SCADA system the following day. During the crisis period the option of a cold reboot was considered but rejected, due to the need for SCADA information during such a perilous operating period. It is interesting to note that in this case a "hot standby" configuration did not add reliability, because a faulted software application was transferred onto it.
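The failover lesson lends itself to a short sketch. The Python below is a hypothetical illustration (the AlarmNode class, thresholds, and scenario values are invented, not FE's actual EMS logic): it checks the health of the application itself, not just the host server, before trusting a failover, and escalates to operators when the fault travels with the software.

```python
# Sketch: application-level health check for a hot-standby pair.
# The server-level failover in the FE EMS moved a stalled alarm processor
# intact onto the backup server; this sketch adds an application health test
# so a software fault is detected and escalated rather than silently migrated.
from dataclasses import dataclass

STALL_LIMIT_S = 300        # no alarm processed for 5 minutes => stalled
QUEUE_LIMIT = 10_000       # unbounded alarm backlog          => stalled

@dataclass
class AlarmNode:
    name: str
    server_up: bool
    seconds_since_last_alarm: float
    queue_depth: int

    def application_healthy(self) -> bool:
        """Healthy means the alarm application is actually doing work,
        not merely that the host server is running."""
        return (self.server_up
                and self.seconds_since_last_alarm < STALL_LIMIT_S
                and self.queue_depth < QUEUE_LIMIT)

def failover_decision(primary: AlarmNode, backup: AlarmNode) -> str:
    if primary.application_healthy():
        return f"stay on {primary.name}"
    if backup.application_healthy():
        return f"fail over to {backup.name}"
    # The fault travelled with the software: hardware redundancy cannot help,
    # so escalate instead of pretending the standby fixed the problem.
    return "ALERT operators: alarm processing down on primary AND backup"

# Mimicking the August 14 situation: the application is stalled no matter
# which server hosts it, so the right answer is an alert, not a failover.
primary = AlarmNode("EMS-primary", server_up=False,
                    seconds_since_last_alarm=1600, queue_depth=50_000)
backup = AlarmNode("EMS-backup", server_up=True,
                   seconds_since_last_alarm=1600, queue_depth=50_000)
print(failover_decision(primary, backup))
```

The point is not this particular check but the design question it forces: redundancy at the server level only protects against faults that stay on one server.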
Model Mismatch

A third cause highlighted by the Task Force was excessive tree growth under transmission lines in FE's rights-of-way. This caused the loss of three 345 kV lines during the sequence, as heavily loaded transmission lines sagged and came into contact with trees. While high branches are certainly the root cause, the effect of this neglected maintenance was that power system operators were working with overly optimistic models of these lines. The faults occurred "at conditions well within specified operating parameters." So at the time of the Blackout these transmission lines had far less actual capacity than was credited to them in the online models used to calculate grid reliability.
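One way to act on this finding (and on the model-validation recommendation below) is to compare what the online model credits to each line with what telemetry actually shows. The sketch below uses fabricated line names, ratings, and loadings purely for illustration: it flags lines that tripped at loadings well below their modeled capability, which is the signature of an overly optimistic rating.

```python
# Sketch: flag transmission lines whose trips contradict the online model.
# A line that faults "well within specified operating parameters" is evidence
# that its modeled capacity is optimistic, e.g. because of tree contact under
# a sagging conductor. All names and numbers below are made up.

MODEL_RATINGS_MVA = {          # capacity credited in the reliability model
    "Line A 345kV": 1200.0,
    "Line B 345kV": 1300.0,
    "Line C 345kV": 1250.0,
}

TRIP_LOADINGS_MVA = {          # telemetered loading at the moment of trip
    "Line A 345kV": 530.0,
    "Line B 345kV": 1150.0,
    "Line C 345kV": 780.0,
}

SUSPICION_THRESHOLD = 0.90     # tripping below 90% of rating is suspicious

def suspect_ratings(ratings, trip_loadings, threshold=SUSPICION_THRESHOLD):
    """Return lines that tripped while loaded well below their modeled rating."""
    suspects = []
    for line, rating in ratings.items():
        loading = trip_loadings.get(line)
        if loading is not None and loading < threshold * rating:
            suspects.append((line, loading / rating))
    return suspects

for line, fraction in suspect_ratings(MODEL_RATINGS_MVA, TRIP_LOADINGS_MVA):
    print(f"{line}: tripped at {fraction:.0%} of modeled rating - "
          f"review the rating and the right-of-way maintenance")
```

This kind of routine cross-check is one concrete form of the ongoing model validation recommended below; the specific threshold and data sources would of course be chosen by the operating company.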

×