Ep 37 2003 NE Blackout
Engineering News – LEGO is Running On 100% Renewable Energy (2:10)
This week's engineering failure is the 2003 Northeast Blackout (8:10). What started as a series of small events over the course of a couple of hours (11:00) cascaded into a full-blown blackout on a hot summer day. The impact was huge (17:10) and there were a number of causes (27:30). The investigation (32:00) shared its findings and recommendations to prevent this from happening again.
2003 NE Blackout
Hi and welcome to Failurology; a podcast about engineering failures. I’m your host, Nicole
And I’m Brian. And we’re both from Calgary, AB.
Thank you again to our Patreon subscribers! www.patreon.com/failurology
This week in engineering news, LEGO is running off of 100% renewable energy.
4 years and a 900 million USD investment in two offshore wind farms
The total power generated from the renewable sources exceeds the energy consumed at all LEGO factories, stores and offices globally
360 gigawatt-hours of energy used to produce 75 billion LEGO bricks
20,000 solar panels were installed on the roof of the LEGO factory in Jiaxing, China, producing 6 gigawatt-hours of electricity per year
Installed 3,500 solar panels on their Czech Republic factory - reducing CO2 emissions by 500 tonnes annually.
Installed an innovative new cooling system at their Danish factory that uses outside air to cool the LEGO brick moulding process. This replaced a refrigerant-based system, reducing energy use by 538,000 kWh, equivalent to a savings of 111 tonnes of CO2 emissions
Changed the entire lighting system in the LEGO factory in Mexico to 19,000 high-efficiency LED bulbs across the production floor, reducing CO2 emissions by 1,300 tonnes annually.
LEGO's parent company holds stakes in the Borkum Riffgrund 1 offshore wind farm in Germany and the Burbo Bank Extension offshore wind farm near Liverpool, UK, for a total generating capacity of 570 megawatts
LEGO is working with its supply chain to reduce carbon emissions, water use and forestry impacts
LEGO campus in Billund, Denmark which opened in 2019, used strong plasterboard to save 22,000 kg of steel and 353,000 kg of CO2 emissions. Over 4,000 solar panels were installed on the roof producing 1 million kWh of energy annually.
Now on to this week’s engineering failure; the 2003 NE Blackout.
August 14, 2003 beginning at 4:10 pm eastern
Impacted Ontario, Ohio, Michigan, Pennsylvania, New York, Vermont, Massachusetts, Connecticut and New Jersey - 55 million people were impacted – 508 generating units at 265 power plants shut down during the outage
Parts of Ontario suffered rolling blackouts for more than a week before power was restored
The cost was between 4 and 10 billion USD, with a net loss of 18.9 million work hours
At the time, the world’s second most widespread blackout in history after the 1999 Southern Brazil blackout
Sequence of events
The following sequence of events took place on August 14, 2003:
12:15 p.m. Incorrect telemetry data renders the state estimator, a power flow monitoring tool operated by the Indiana-based Midwest Independent Transmission System Operator (MISO), inoperative. An operator corrects the telemetry problem but forgets to restart the monitoring tool.
1:31 p.m. The Eastlake, Ohio generating plant shuts down. The plant is owned by FirstEnergy, an Akron, Ohio-based company.
2:02 p.m. The first of several 345 kV overhead transmission lines in northeast Ohio fails due to contact with a tree in Walton Hills, Ohio (41°21′22″N 81°34′10″W).
2:14 p.m. An alarm system fails at FirstEnergy's control room and is not repaired.
3:05 p.m. A 345 kV transmission line known as the Chamberlin-Harding line sags into a tree and trips in Parma, south of Cleveland.
3:17 p.m. Voltage dips temporarily on the Ohio portion of the grid. Controllers take no action.
3:32 p.m. Power shifted by the first failure onto another 345 kV power line, the Hanna-Juniper interconnection, causes it to sag into a tree, bringing it offline as well. While MISO and FirstEnergy controllers concentrate on understanding the failures, they fail to inform system controllers in nearby states.
3:39 p.m. A FirstEnergy 138 kV line trips in northern Ohio.
3:41 p.m. A circuit breaker connecting FirstEnergy's grid with that of American Electric Power is tripped as a 345 kV power line (Star-South Canton interconnection) and fifteen 138 kV lines fail in rapid succession in northern Ohio.
3:46 p.m. A fifth 345 kV line, the Tidd-Canton Central line, trips offline.
4:05:57 p.m. The Sammis-Star 345 kV line trips when under-voltage and over-current conditions are interpreted as a short circuit. (Later analysis suggests that the blackout could have been averted before this failure by cutting 1.5 GW of load in the Cleveland–Akron area.)
4:06–4:08 p.m. A sustained power surge north toward Cleveland overloads three 138 kV lines.
4:09:02 p.m. Voltage sags deeply as Ohio draws 2 GW of power from Michigan, creating simultaneous under voltage and over current conditions as power attempts to flow in such a way as to rebalance the system's voltage.
4:10:34 p.m. Many transmission lines trip out, first in Michigan and then in Ohio, blocking the eastward flow of power around the south shore of Lake Erie from Toledo, Ohio, east through Erie, Pennsylvania, and into southern Erie county, but not most of the Buffalo, New York, metropolitan area. Suddenly bereft of demand, generating stations go offline, creating a huge power deficit. In seconds, power surges in from the east, overloading east-coast power plants whose generators go offline as a protective measure, and the blackout is on.
4:10:37 p.m. The eastern and western Michigan power grids disconnect from each other. Two 345 kV lines in Michigan trip. A line that runs from Grand Ledge to Ann Arbor known as the Oneida-Majestic interconnection trips. A short time later, a line running from Bay City south to Flint in Consumers Energy's system known as the Hampton-Thetford line also trips.
4:10:38 p.m. Cleveland separates from the Pennsylvania grid.
4:10:39 p.m. 3.7 GW power flows from the east along the north shore of Lake Erie, through Ontario to southern Michigan and northern Ohio, a flow more than ten times greater than the condition 30 seconds earlier, causing a voltage drop across the system.
4:10:40 p.m. Flow flips to 2 GW eastward from Michigan through Ontario (a net reversal of 5.7 GW of power), then reverses back westward again within a half-second.
4:10:43 p.m. International connections between the United States and Canada start to fail.
4:10:45 p.m. Northwestern Ontario separates from the east when the Wawa-Marathon 230 kV line north of Lake Superior disconnects. The first Ontario power plants go offline in response to the unstable voltage and current demand on the system.
4:10:46 p.m. New York separates from the New England grid.
4:10:50 p.m. Ontario separates from the western New York grid.
4:11:57 p.m. The Keith-Waterman, Bunce Creek-Scott 230 kV lines and St. Clair–Lambton #1 230 kV line and #2 345 kV line between Michigan and Ontario fail.
4:12:03 p.m. Windsor, Ontario, and surrounding areas drop off the grid.
4:12:58 p.m. Northern New Jersey separates its power grids from New York and the Philadelphia area, causing a cascade of failing secondary generator plants along the New Jersey coast and throughout the inland regions to the west.
4:13 p.m. End of cascading failure. 256 power plants are off-line, 85% of which went offline after the grid separations occurred, most due to the action of automatic protective controls.
Essential services remained in operation
Some backup generation systems failed
Telephone networks generally remained operational, but increased demand left many circuits overloaded
Water systems in several cities lost pressure, leading to boil water advisories
If you haven’t checked out our Flint episode, it’s number 12.
The accidental release of 140 kg of vinyl chloride from a Sarnia, ON chemical plant was revealed 5 days later
New York, Newark, New Jersey and Kingston, amongst other places, had sewage spill into waterways, requiring beaches to be closed
Several gas stations were unable to pump fuel, leaving cars and transport trucks stranded
Cell service was interrupted as the increased volume of calls overloaded the networks.
Some TV and radio stations stayed on air, with the help of generators
Stars and orbiting satellites became visible to the naked eye in metropolitan areas because of the lack of light pollution
Amtrak, which relies on electricity for signalling and crossing systems, shut down. Via Rail continued to operate. All airports in the blackout areas shut down and flights were diverted to airports with power
New York City subway resumed limited service around 8 pm
I grew up outside of Windsor Ontario, it was a while ago, but if memory serves, we lost power for the entire next day and it came back some time on the 16th. I was also grounded at that time so it was an extra boring summer.
To prevent damage from an overload event, nuclear power plants went offline until they could be slowly taken out of safe mode. Bruce Nuclear Generating Station was able to throttle back its output without a complete shutdown, then reconnect to the grid 5 hours later
Available hydro-electric, coal and oil-fired plants came online and provided power to the areas immediately surrounding the plants
Some other pockets were able to avoid power outages by disconnecting from the larger grid
There was up to a 7-hour wait at one point for trucks crossing the Ambassador Bridge between Detroit and Windsor. Following 9/11, the wait got to be 3-4 hours. Nowadays it's usually 30-60 min to cross if you time it right.
Street lights were out, leading to a lot of traffic backups; in some cases members of the public started directing traffic until they were relieved by police
About 140 miners were marooned underground in the Falconbridge mine in Sudbury when the power went out. A refinery scrubber at another site in Sarnia lost power and released above-normal levels of pollution
There were also a lot of 911 calls for fire, mostly from candles. NYC specifically had more than double the average calls for help
There were 12 deaths reported, some from carbon monoxide poisoning and some from people being hit by cars
It was 31C on the day of the blackout, leading to increased energy demand as people turned on fans and air conditioning, causing power lines to sag as higher currents heated the lines
The proximate cause (the one the law recognizes as the official cause) was a software bug in the alarm system at FirstEnergy's control room in Akron, Ohio
But human error, equipment failures and software issues were all to blame
The software bug left operators unaware of the need to redistribute the load after overloaded transmission lines dropped into the foliage
This should have been manageable but turned into the collapse of the Northeast regional grid
A 3,500-megawatt power surge affected the transmission grid at 4:10 pm eastern
Right before the blackout, the system was carrying 28,700 megawatts of load; at the height of the outage the load dropped to 5,716 megawatts, a loss of 80%
Because overloading can cause costly damage, automatic protective relays disconnect a line from the network when an overload is detected, transferring its load to other lines. If the other lines don't have enough spare capacity to accommodate the extra current, they also overload, causing a cascading failure.
After a failure affecting a grid system occurs, operators must "shed load" or obtain more power from generators until they can rule out a system collapse. In an emergency scenario, they are supposed to immediately shed load to bring the system into balance
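As a toy illustration of the two points above (relay trips redistributing load, and timely shedding restoring balance), here is a minimal sketch. The even-split redistribution and every number in it are simplifying assumptions for illustration; real grids redistribute flow according to network physics, not an even split:

```python
# Toy cascading-failure model: parallel lines with a shared rating.
# When a line trips, its flow is split evenly among survivors
# (a simplifying assumption, not real power-flow behaviour).

def cascade(loads, capacity, shed_mw=0.0):
    """Trip overloaded lines one at a time, redistributing their flow.

    loads: per-line flow in MW; capacity: per-line rating in MW;
    shed_mw: total load dropped when the first overload appears.
    Returns the indices of the lines that survive.
    """
    live = {i: mw for i, mw in enumerate(loads)}
    shed_done = False
    while True:
        overloaded = [i for i, mw in live.items() if mw > capacity]
        if not overloaded:
            return sorted(live)
        if shed_mw and not shed_done:
            # Operators shed load to pull flows back under the ratings.
            cut = shed_mw / len(live)
            live = {i: mw - cut for i, mw in live.items()}
            shed_done = True
            continue
        worst = max(overloaded, key=lambda i: live[i])
        extra = live.pop(worst)            # protective relay trips the line
        if not live:
            return []                      # total collapse
        for i in live:                     # survivors pick up the flow
            live[i] += extra / len(live)

# Three parallel lines rated 100 MW, carrying 90/95/105 MW:
assert cascade([90, 95, 105], capacity=100) == []                     # cascade to collapse
assert cascade([90, 95, 105], capacity=100, shed_mw=30) == [0, 1, 2]  # shedding saves it
```

The point of the sketch is the asymmetry: without shedding, one marginal overload takes down all three lines; dropping 30 MW early leaves the whole system intact.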
To help operators detect overloading, there are computer systems that issue alarms when there are faults in the system
Power flow modelling tools let operators analyze the state of their network, predict overloading and predict failure points; allowing them to change the distribution of generation and reconfigure the system to prevent a failure
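As a rough sketch of the kind of calculation such modelling tools perform, here is a toy DC power-flow solve on a hypothetical three-bus network. Real tools use far larger models and AC formulations; every number below is made up for illustration:

```python
# Toy DC power flow: 3 buses, bus 0 is the slack, lines 0-1, 0-2, 1-2,
# each with susceptance b = 10 p.u. Net injections (p.u.): bus 1 draws
# 1.0 of load, bus 2 generates 0.5; the slack makes up the difference.
b = 10.0
p1, p2 = -1.0, 0.5

# Reduced susceptance matrix B' for buses 1 and 2 (slack removed):
#   [ b01+b12   -b12  ] [theta1]   [p1]
#   [  -b12   b02+b12 ] [theta2] = [p2]
a11, a12 = 2 * b, -b
a21, a22 = -b, 2 * b
det = a11 * a22 - a12 * a21
theta1 = (a22 * p1 - a12 * p2) / det   # Cramer's rule, fine for 2x2
theta2 = (a11 * p2 - a21 * p1) / det

# Line flows follow from bus angle differences: f_ij = b * (theta_i - theta_j)
f01 = b * (0.0 - theta1)
f02 = b * (0.0 - theta2)
f12 = b * (theta1 - theta2)

# An operator would compare each flow against the line's rating to
# predict which line overloads next if another element trips.
assert abs(f01 - 0.5) < 1e-9   # half the bus-1 load arrives via line 0-1
assert abs(f02) < 1e-9         # line 0-2 carries nothing in this state
assert abs(f12 + 0.5) < 1e-9   # the rest flows from bus 2 over line 1-2
```

Scaled up to thousands of buses, this is how a state estimator or contingency-analysis tool predicts overloads before they happen, which is exactly the visibility FirstEnergy's operators lost.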
If these power systems and backups fail, the operators have to monitor the grid manually. If they can’t interpret the state of the grid, they follow a contingency plan, contacting other plant and grid operators by phone if necessary. They also have to notify nearby areas which may be affected, so they can predict effects on their systems
A joint federal task force was formed by the governments of Canada and the US, led by Canadian Natural Resources Minister Herb Dhaliwal and US Energy Secretary Spencer Abraham
Its mandate was to determine the initial cause and examine the failure of the safeguards designed to prevent a blackout
The US did not seek federal-level punishment of FirstEnergy Corp because US law didn't require electric reliability standards at the time. This changed with the Energy Policy Act of 2005, although I am not sure how robust the new standards are. It's hard to know how many catastrophic failures were prevented because we only really hear about the ones that happen, not the ones that almost were.
FirstEnergy and its reliability council “failed to assess and understand the inadequacies of FE’s system, particularly with respect to voltage instability and the vulnerability of the Cleveland-Akron area, and FE did not operate with appropriate voltage criteria”
FirstEnergy “did not recognize or understand the deteriorating conditions of its system”
FirstEnergy “failed to manage adequately tree growth in its transmission rights-of-ways” – the interesting thing I read is that there were very clear-cut (pun intended) rules for clearances to several obstructions, but clearances to vegetation varied by state or province and by individual utility
Finally, the “failure of the interconnected grid’s reliability organizations to provide effective real-time diagnostic support”
A generating plant in Eastlake, Ohio (a suburb of Cleveland) went offline during high electrical demand, putting a strain on high voltage power lines, which later went out of service when they came in contact with “overgrown trees”. Those lines transferred their load to other lines, which couldn’t handle the load, tripping their breakers. With multiple trips occurring, several generators lost parts of their loads, accelerated out of phase with the grid at different rates and tripped out to prevent damage. This led to a forced shutdown of 265 power plants
A software bug known as a race condition existed in the energy management system. This bug stalled alarms for over an hour. The system operators were unaware of the malfunction.
Unprocessed alarms queued up and the primary server failed within 30 minutes. The backup server failed at 2:54 p.m. By 3:42 p.m. the control room itself lost power and operators informed technical support, who were already troubleshooting the issue.
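The actual defect was a race condition in the energy management system's alarm code; the sketch below is a generic illustration of the pattern and its standard fix (serializing access with a lock), not the real code, and all names in it are made up:

```python
import threading

class AlarmQueue:
    """Toy alarm queue; illustrative only, not the real EMS code.

    Without the lock, two writers hitting the shared list at the same
    moment can corrupt its state and stall processing silently, which
    is the general shape of the 2003 alarm-system bug.
    """
    def __init__(self):
        self._events = []
        self._lock = threading.Lock()   # serializes all access

    def push(self, event):
        with self._lock:
            self._events.append(event)

    def drain(self):
        with self._lock:
            drained, self._events = self._events, []
        return drained

# Four concurrent writers, 1000 alarms each:
queue = AlarmQueue()
threads = [
    threading.Thread(target=lambda n=n: [queue.push((n, i)) for i in range(1000)])
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert len(queue.drain()) == 4000   # no events lost when access is serialized
```

The insidious part, mirrored in the blackout, is that an unsynchronized queue usually works; the failure only shows up under an unlucky interleaving, so the bug can sit dormant for years of testing and operation.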
Operators have adopted different interpretations of the functions, responsibilities, authorities and capabilities needed to operate a reliable system. There was also no effective protocol for sharing information within the control room or with others outside the control room.
Models used to view and forecast loads were inaccurate due to a lack of verification through benchmarking against actual data and field testing. They also weren't peer-reviewed or shared among operators
There was no effective load reduction plan or adequate load reduction capability to relieve the overloaded lines
Factors such as poor vegetation management, operator training practices, and a lack of adequate tools allowing operators to visualize system conditions also played a role in prior large-scale blackouts. History always repeats itself
Long Term Effects
The US included reliability provisions in the Energy Policy Act of 2005; apparently, Texas missed that memo, more on that in a future episode.
Not only was there an infrastructure problem, but they also deemed it a homeland security problem as most of the systems that were used to detect unauthorized border crossings or port landings failed without power.
This became a big political issue for Ontario, whose conservative Premier Ernie Eves had not wanted to expand the province's power generating capacity, leaving it reliant on the US for power. He also took longer than other mayors and governors to speak publicly about the event.
Recommendations to Prevent Another Blackout
There are 46 recommendations; we're not going to read them all, just some that we thought were important and/or interesting
Reliability standards are to be mandatory and enforceable, with penalties for noncompliance.
These standards should be developed by a third party with fair stakeholder representation in the selection of the directors and committee
Allow investments in bulk system reliability to be recoverable through transmission rates
Protect operators who shed load as per approved guidelines from liability or retaliation
Standardize vegetation clearances and enforce them
Improve research into reliability, monitoring, and modelling methods and technology
Establish clear authority for physical and cyber security
Install backup generation at nuclear power plants
So there you have it, the 2003 NE blackout. A really hot day with lots of air conditioning units running, some overgrown trees and a software bug knocked out power to a large portion of the NE US and Ontario.
For photos, sources and an episode summary from this week's episode, head to Failurology.ca. If you're enjoying what you're hearing, please rate, review and subscribe to Failurology so more people can find it. If you want to chat with us, our Twitter handle is @failurology, you can email us at email@example.com, or you can connect with us on LinkedIn. Check out the show notes for links to all of these. Thanks everyone for listening. And tune in to the next episode where we will tell you all about the Charles de Gaulle airport collapse in Paris, France. Bye everyone, talk soon!