« Things I Have Learned About Foam From the Columbia Accident Investigation Board Report | Three Men and a Container Full of Bath Toys » |
Physics 2, Business Administration 0
"When a program agrees to spend less money or accelerate a schedule beyond what the engineers and program managers think is reasonable, a small amount of overall risk is added. These little pieces of risk add up until managers are no longer aware of the total program risk, and are, in fact, gambling."
Columbia Accident Investigation Report, p. 139

One of the most sobering conclusions of the Shuttle accident report is that the Columbia was an exact replay of the Challenger - the same false confidence, the same scheduling and funding pressure, the same lack of attention to an intermittent problem whose causes were never understood. There's even the same badly-designed briefing slide, failing to convey the urgency the engineering team feels, and the same old Edward Tufte on hand to point it out, once the investigation gets into full swing.

NASA has no excuses this time. Not only is the organizational behavior identical to Challenger (the normalization of deviance, and an assumption that things are safe unless proven otherwise), but even the mechanics of the accident were familiar territory.

It turns out the doomed STS-107 mission had a Doppelgänger. In 1988, the Space Shuttle Atlantis was hit by debris dislodged from the top of a solid rocket booster, at almost exactly the same time in the launch sequence as Columbia. The debris crashed against the right side of the Orbiter's belly, gouging out over seven hundred gashes, three hundred of them over an inch long. The impact dislodged one tile completely. By sheer luck, the tile that fell off happened to sit on top of an unusually thick aluminum plate, part of an antenna housing. Atlantis made it back safely, suffering structural damage, but with its hull intact.

The difference in response between the 1988 and 2003 incidents tells you everything you need to know about NASA:
"After the discovery of the debris strike on Flight Day Two of STS-27R [Atlantis], the crew was immediately directed to inspect the vehicle. More severe thermal damage - perhaps even a burn-through - may have occurred were it not for the aluminum plate at the site of the tile loss. Fourteen years later, when a debris strike was discovered on Flight Day Two of STS-107 [Columbia], Shuttle Program management declined to have the crew inspect the Orbiter for damage, declined to request on-orbit imaging, and ultimately discounted the possibility of a burn-through."

The reasons for this maddening complacency will be familiar to anyone who has worked in an organization where the suits face off against the geeks. Every computer programmer learns early on that it's counterproductive to show a software demo to managers - if the demo fails, the managers will be displeased, and if the demo is wonderful, they'll probably decide to ship it as-is. "Looks great - let's go with it!" Just try to explain that there's no error handling, or an intermittent bug in the code, or that clicking the 'help' button crashes the GUI. Engineers are trained to ask "what could possibly go wrong?". Managers are, too, but they use the phrase with a completely different intonation.

The debris strike on Atlantis was unprecedented, and came hard on the heels of the Challenger explosion. It was easy for managers to see eye-to-eye with the engineers, and recognize the gravity of the situation. But fourteen years and dozens of successful launches later, the memory of Challenger had receded. In its place was a hugely overambitious launch schedule, and all kinds of political pressure to get the International Space Station 'core complete' by an arbitrary 2004 deadline. And there was also the experience of Atlantis and other debris strikes, which by perverse MBA logic had become arguments for the harmlessness of foam impacts. It had happened so many times before, why start worrying about it now?
In an organization like NASA, there have to be safeguards to make sure managers can't overrule engineering decisions, or people will die. The board report cites the US Navy's Naval Reactors program and the Air Force's Aerospace program as good role models in this respect - both have managed to take the teeth out of hideously risky operations (shipboard nuclear reactors and military satellite launches, respectively) by rigorously separating oversight from management.

They've done this by turning their organization into a kind of geek sandwich. There is a lower level of engineers to do the design and construction, a middle layer of management to handle budgets and administration, and a top level of oversight geeks with veto power, overseeing the whole enterprise. This setup is terrifying to managers - after all, the oversight geeks get a separate budget, and it's impossible (by design) for the managers to exert any pressure on the engineers. Naturally, it's frightfully expensive (unless you factor in the costs of a major disaster every few years), and a serious blow to the ego of your average suit, who believes that God made managers to have ultimate authority. Jack Welch would not approve.

It will be interesting to see if NASA can get the funding and muster the self-discipline to pull such a transformation off. If they do, it will be a nice irony, since the Shuttle is the very embodiment of managers promising something engineers can't deliver.
Your Host
Maciej Cegłowski
maciej @ ceglowski.com
Threat
Please ask permission before reprinting full-text posts or I will crush you.