Saturday, December 10, 2011

Disaster book club: What you need to read to understand the crash of Air France 447



Right now, I'm reading a book about why catastrophic technological failures happen and what, if anything, we can actually do about them. It's called Normal Accidents by Charles Perrow, a Yale sociologist.



I've not finished this book yet, but I've gotten far enough into it that I think I get Perrow's basic thesis. (People with more Perrow-reading experience, feel free to correct me here.) Essentially, it's this: When there is inherent risk in using a technology, we try to build systems that take into account obvious, single-point failures and prevent them. The more single-point failures we try to prevent through system design, however, the more complex the systems become. Eventually, you have a system where the interactions between different fail-safes can, ironically, cause bigger failures that are harder to predict and harder to spot as they're happening. Because of this, we have to make our decisions about technology from the position that we can never, truly, make technology risk-free.



I couldn't help but think of Charles Perrow this morning while reading Popular Mechanics' gripping account of what really happened on Air France 447, the jetliner that plunged into the Atlantic Ocean in the summer of 2009.



As writer Jeff Wise works his way through the transcript of the doomed plane's cockpit voice recorder, what we see, on the surface, looks like human error. Dumb pilots. But there's more going on than that. That's one of the other things I'm picking up from Perrow. What we call human error is often a mixture of simple mistakes and the confusion inherent in working with complex systems.





Let me excerpt a couple of key parts of the Popular Mechanics piece. You really need to read the full thing, though. Be prepared to feel tense. This story will get your heart rate up, even though (and possibly because) you know the conclusion.



We now understand that, indeed, AF447 passed into clouds associated with a large system of thunderstorms, its speed sensors became iced over, and the autopilot disengaged. In the ensuing confusion, the pilots lost control of the airplane because they reacted incorrectly to the loss of instrumentation and then seemed unable to comprehend the nature of the problems they had caused. Neither weather nor malfunction doomed AF447, nor a complex chain of error, but a simple but persistent mistake on the part of one of the pilots.



Human judgments, of course, are never made in a vacuum. Pilots are part of a complex system that can either increase or reduce the probability that they will make a mistake. After this accident, the million-dollar question is whether training, instrumentation, and cockpit procedures can be modified all around the world so that no one will ever make this mistake again—or whether the inclusion of the human element will always entail the possibility of a catastrophic outcome. After all, the men who crashed AF447 were three highly trained pilots flying for one of the most prestigious fleets in the world. If they could fly a perfectly good plane into the ocean, then what airline could plausibly say, "Our pilots would never do that"?




One of the pilots seems to have kept the nose of the plane up throughout the growing disaster, making this choice over and over, even though it was the worst possible thing he could have done. At the same time, everyone in the cockpit seems to have completely ignored an alarm system that was, explicitly, telling them that the plane was stalling.



Why would they do that? As Wise points out, this is the kind of mistake highly trained pilots shouldn't make. But they did it. And they seem to have done it because of what they knew, and thought they knew, about the plane's complex safety systems. Take that stall alarm, for instance. Turns out, there's a surprisingly logical reason why someone might ignore that alarm.



Still, the pilots continue to ignore it, and the reason may be that they believe it is impossible for them to stall the airplane. It's not an entirely unreasonable idea: The Airbus is a fly-by-wire plane; the control inputs are not fed directly to the control surfaces, but to a computer, which then in turn commands actuators that move the ailerons, rudder, elevator, and flaps. The vast majority of the time, the computer operates within what's known as normal law, which means that the computer will not enact any control movements that would cause the plane to leave its flight envelope. "You can't stall the airplane in normal law," says Godfrey Camilleri, a flight instructor who teaches Airbus 330 systems to US Airways pilots.



But once the computer lost its airspeed data, it disconnected the autopilot and switched from normal law to "alternate law," a regime with far fewer restrictions on what a pilot can do. "Once you're in alternate law, you can stall the airplane," Camilleri says. It's quite possible that Bonin had never flown an airplane in alternate law, or understood its lack of restrictions. According to Camilleri, not one of US Airways' 17 Airbus 330s has ever been in alternate law. Therefore, Bonin may have assumed that the stall warning was spurious because he didn't realize that the plane could remove its own restrictions against stalling and, indeed, had done so.
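
Just to make that distinction concrete, here's a toy sketch, in Python, of the idea Camilleri is describing. To be clear: this is not the real Airbus software. The class, the mode names, and the angle-of-attack limit are all made up for illustration. The only point is that the same stick input gets clamped in one mode and passed straight through in the other.

```python
# Toy illustration of flight-control "laws" -- a deliberately simplified
# sketch, NOT the actual Airbus logic. Names and numbers are invented.

MAX_SAFE_AOA_DEG = 12.0   # hypothetical angle-of-attack limit for this toy model


class FlightControlComputer:
    def __init__(self):
        self.law = "normal"

    def update_airspeed(self, airspeed_valid: bool):
        # Losing reliable airspeed data drops the system into alternate law,
        # which removes the envelope protection applied below.
        if not airspeed_valid:
            self.law = "alternate"

    def commanded_aoa(self, pilot_requested_aoa_deg: float) -> float:
        if self.law == "normal":
            # Envelope protection: the computer refuses to command an
            # angle of attack beyond the limit, so the pilot "can't stall it."
            return min(pilot_requested_aoa_deg, MAX_SAFE_AOA_DEG)
        # Alternate law: the pilot's input goes through, stall or not.
        return pilot_requested_aoa_deg


fcc = FlightControlComputer()
print(fcc.commanded_aoa(18.0))              # normal law: clamped to 12.0
fcc.update_airspeed(airspeed_valid=False)   # pitot icing: airspeed data lost
print(fcc.commanded_aoa(18.0))              # alternate law: 18.0 passes through
```

If, as Camilleri suggests, Bonin had only ever flown the plane in the first mode, then once the computer quietly switched to the second, the same inputs suddenly meant something very different.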



That, I think, is where Charles Perrow and Air France 447 cross paths. It tracks closely with a concept Perrow calls "incomprehensibility." Basically, the people involved in an accident like this often can't figure out fast enough what is happening. That's because, in high-stress situations, the brain reverts to well-trod mental models that help you understand your world. You think about the stuff you've practiced 1,000 times. You think about what you've been told will happen if x happens.



But what happens if what's actually going on doesn't mesh with your training? Then the brain finds ways to make it mesh. Those rationalizations might make a whole lot of sense to you in the moment. But they will lead you to make mistakes that exacerbate an already growing problem.



This is not comforting stuff.



Perrow doesn't tell us that we can figure out how to design a system that never becomes incomprehensible. There is no happy ending. We can design better systems, systems that take the way the brain works into account. We can make systems safer, to a point. But we cannot make a safe system. There is no such thing as a plane that will never crash. There is no such thing as a pilot who will always know the right thing to do.



Instead, Perrow's book is more about how we make decisions regarding risky technologies. Which high-risk technologies are we comfortable using and in what contexts? How do we decide whether the benefit outweighs the risk?

We must have these conversations. We cannot have these conversations if we're clinging to the position that anything less than 100% safety is unacceptable. We cannot have these conversations if we're clinging to the position that good governance and good engineering can create a risk-free world, where accidents only happen to idiots.



I used to believe both those myths. I want to believe them still. Increasingly, I can't. Looking at technological safety in terms of absolutes is a child's view of the world. What Perrow is really saying is that it's time for us to grow up.





Images:

• Landing gear of Air France 447, Bureau d'Enquêtes et d'Analyses (BEA), France's air accident investigation agency.

• Memorial to victims of Air France 447 in Rio de Janeiro, Brazil, REUTERS/Ana Carolina Fernandes.




