Reliability Concerns

Early on, in-flight repair was NASA policy for Apollo overall. If the agency was to undertake ambitious space missions in the future, Apollo seemed a good time to start making the craft "self-healing." Still, the policy did not lead to elegant solutions, especially for the IL, a laboratory that prided itself on the autonomy of its instruments. In fact, for the IL it highlighted remaining uncertainties around integrated circuits and open questions about design. Reliability would dominate debates about the computer as the program moved from a prototype in a laboratory to a real computer on the way to the moon.

The state of the art for inertial guidance systems on military aircraft at the time was something like fifteen hours mean time between failures (MTBF); that is, the system averaged fifteen hours of use between failures. Aircraft generally didn't fly much longer than that, and missiles were rarely in the air more than a few minutes. Yet to have a reasonable chance of succeeding on a two-week mission, Apollo would require 1,500 hours MTBF, an improvement of two orders of magnitude.16 And that was just for the flight; it did not include the many hours of testing, alignment, and checkout before launch, or the intervals between scrubbed launches.
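The source does not give a failure model, but a minimal sketch under the standard constant-failure-rate assumption, where reliability over a mission is R(t) = e^(-t/MTBF), shows why the jump was necessary: fifteen-hour hardware has essentially no chance of surviving a two-week mission, while 1,500-hour hardware reaches roughly 80 percent.

```python
import math

def mission_reliability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability of running failure-free for mission_hours,
    assuming a constant failure rate (exponential model)."""
    return math.exp(-mission_hours / mtbf_hours)

two_weeks = 14 * 24  # 336 hours

# Aircraft-grade hardware (15-hour MTBF) over a two-week mission:
print(f"{mission_reliability(15, two_weeks):.1e}")    # ~1.9e-10: hopeless

# Apollo's target (1,500-hour MTBF) over the same mission:
print(f"{mission_reliability(1500, two_weeks):.2f}")  # ~0.80
```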

Much of the technology designed into Apollo's guidance had been developed for ballistic missiles, and hence what reliability data existed related to those types of missions. The short-lived missiles, however, had comparatively low reliability. The air force's and navy's solution to the problem was simply to build many more missiles than they needed and assume that some proportion of them would fail in the event of war. For a manned program, the country could not afford (politically or financially) to mount several extra missions to ensure that one would get to the moon.

For the guidance and navigation system, reliability depended on two major elements: the mechanical systems that ran the inertial platform (the gimbals and gyroscopes) and the electronics. Gimbal reliability and the difficulties it generated were discussed in the previous chapter. It is difficult today to picture the problems with electronics during its adolescent decades (when was the last time your computer suffered an actual electronic failure, as opposed to a problem with connections or software?). Vacuum tubes, still fairly common, were notoriously failure-prone and had given all of electronics a bad name; transistors, which promised better reliability in principle, were still in their infancy and had not yet proven themselves.

IL engineers pointed out to NASA that their Polaris Mark II guidance unit, then in the final stages of development, was meeting its reliability goals. NASA replied that the Polaris system needed to work for only about three minutes at a time, when a missile was fired, whereas the Apollo system would require at least fifteen hours. Even that estimate assumed the computer would be turned off when not in use; if it ran continuously (as it eventually did), it would need to work for nearly two hundred hours.17

Some of these problems stemmed from a technique becoming increasingly fashionable for analyzing large systems: statistical failure analysis. With a new, complex system containing so many untried components and techniques, it was not difficult to "prove" statistically that it would not work. Meeting NASA's stipulations of .99 for mission success and .999 for safety meant taking the reliability figures of the individual components, numbers like .9994, and combining all the probabilities in "fault trees," models of how particular failures might compromise the mission. Statistical approaches, with their complex equations and multiple decimal places, had an air of precision that lent them credibility in technical settings, and even more so in nontechnical ones.
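A minimal sketch of that fault-tree arithmetic, using the .9994 figure from the text and illustrative (not historical) component counts: in a simple series model the system works only if every component does, so the reliabilities multiply, and even excellent parts quickly "prove" a complex machine statistically unworkable.

```python
# Series ("weakest link") model: the system survives only if every
# component survives, so per-component reliabilities multiply.
# Component counts below are assumptions for illustration, not
# Apollo's actual parts totals.

component_reliability = 0.9994  # per-component figure cited in the text

for n_components in (100, 1000, 2000, 5000):
    system_reliability = component_reliability ** n_components
    print(f"{n_components:5d} components -> {system_reliability:.3f}")
# Output:
#   100 components -> 0.942
#  1000 components -> 0.549
#  2000 components -> 0.301
#  5000 components -> 0.050
```

Against a .99 mission-success requirement, a few thousand series components at .9994 each already fail the test on paper.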

While useful for identifying critical components, statistical failure analysis has the drawback that its predictions rest on accumulating numerous low-probability events. It also assumes those numbers are well known, and it accounts only for individual component failures, not for failures arising from interactions between components, or "system failures." Indeed, Apollo would experience numerous system failures, ranging from merely annoying to dangerous and fatal, that the component-failure techniques had not predicted.18 Debates over statistical versus systems approaches to reliability pervaded the Apollo program, reaching into the lunar module, the command module, and numerous other areas. Reliability also had direct implications for the human role: if humans in space were to be the ultimate backup system, then who the astronaut was, and what his duties were, depended critically on how engineers approached the question of reliability.
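A brief extension of the same sketch shows how fragile those predictions were. The per-component reliabilities below are hypothetical and differ only in the fourth decimal place, yet the predicted system reliability swings by more than a factor of three; and the model contains no term at all for interaction failures.

```python
# Hypothetical per-component reliabilities differing only in the
# fourth decimal place; the series prediction for 2,000 components
# swings widely, and interaction ("system") failures never appear
# in the model at all.
n_components = 2000
for r in (0.9996, 0.9994, 0.9990):
    print(f"assumed r = {r:.4f} -> predicted system reliability {r ** n_components:.3f}")
# assumed r = 0.9996 -> predicted system reliability 0.449
# assumed r = 0.9994 -> predicted system reliability 0.301
# assumed r = 0.9990 -> predicted system reliability 0.135
```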

The reliability question was not an academic one for the IL: the entire program was at stake, and the IL's managers spent much of their time on the defensive. Members of Congress even wrote to Webb demanding explanations of the system's reliability.19 They questioned the performance of the IL, a group of academics in a game otherwise reserved for major defense contractors, working from a noncompeted sole-source contract. Indeed, IL management was haphazard in the early years: the ICs really did have some reliability problems, and the IL was perennially late releasing drawings to the manufacturers, in ways that threatened the flight schedule (at one point the lab began issuing blank pieces of paper so the manufacturers would at least have a drawing number with which to begin procurement). These delays compounded the software problems explored in the following chapter.
