Designing and Fabricating Reliability

In fact, Hall countered the statistics of component failure analysis with an entirely different approach. Statistical analyses, he argued, were only helpful when one already had a large base of experience with a particular technology. Moreover, statistics black-boxed the failures: not all occurred for the same reasons; chips from different companies failed for different reasons under different conditions. Without understanding the processes by which parts were produced and assembled, he claimed, "the [statistical] technique is highly questionable."20 The same held true for building redundancy into a system: it had to be done with a deep understanding of how and why the parts would fail, and only under some circumstances would the added reliability offset the added risk from the complexity of redundancy.21 Simply adding more computers was not the answer. You could not calculate, analyze, or graph reliability into a machine. It had to be designed in. The debate went on throughout the Apollo program, and pitted local engineering cultures against the statistical analyses of NASA headquarters.22

Hall and his colleagues proposed another solution entirely, one based on basic engineering: "the fabrication of parts into a system." They started by questioning the assumption that underlay the statistics: that failures would be random. Instead, they took the position that there is no such thing as a random failure; rather, failures always occur "based on cause and effect principles" (an approach credited with success in other parts of Apollo as well). All failures had a source, and for electronic devices, most of those were "the result of poor process control or the vendor's lack of complete technical knowledge of his process." Reliability was not simply a matter of statistics, but also "always an integral and basic part of design, or procurement, and of operation," best left to the "judgment and wisdom of the engineers."23 Key to this approach was standardization: build systems out of the smallest possible number of different parts and focus a great deal of effort on improving every aspect of the process of producing them. The technique built on the classical tenets of American manufacturing: economies of scale, detailed control of process, and standardized components. Ultimately, the skill, reliability, and management of the workers on the production line would ensure the Apollo program's success.

Implementing this philosophy required careful control, especially of the industrial suppliers who fabricated the parts. Here the contract structure is revealing: the major guidance components, such as the inertial unit and the accelerometers, were supplied under specific subcontracts. The integrated circuits, which proved as critical as the larger components, were simply bought from vendors. At the outset, nobody realized what a critical role these circuits would play; otherwise they might have been formally contracted, like the gyroscopes or accelerometers. Instead, they were simply purchased.

The only leverage Hall had over the suppliers, then, was as a customer. He hoped that by buying large numbers of a single part he could persuade them to improve their processes. He certified the vendors by prequalification testing of parts. For any batch that the vendors delivered, the parts were screened and burned in, and then tested for extended life. If too many components failed, the entire lot was rejected. One test involved immersing the chips in a Freon solution, then meticulously weighing them one by one. If a chip got heavier by more than 0.00050 grams, then Freon had leaked into it and the chip's seal was compromised, forcing a reject.24
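The screening logic described above can be sketched in a few lines of Python. This is only an illustration, not a reconstruction of the IL's actual procedure: the 0.00050 gram threshold comes from the text, but the names `Chip`, `leaks`, and `screen_lot`, and the `max_reject_fraction` parameter, are hypothetical. The text says only that a lot was rejected if "too many" components failed, so the fraction used here is an assumed placeholder.

```python
from dataclasses import dataclass

# Weight-gain threshold quoted in the text, in grams.
LEAK_THRESHOLD_G = 0.00050


@dataclass
class Chip:
    """One part, weighed before and after Freon immersion (hypothetical record)."""
    serial: str
    weight_before_g: float
    weight_after_g: float


def leaks(chip: Chip, threshold_g: float = LEAK_THRESHOLD_G) -> bool:
    """A chip that gained more than the threshold is treated as having a bad seal."""
    return (chip.weight_after_g - chip.weight_before_g) > threshold_g


def screen_lot(chips: list[Chip], max_reject_fraction: float = 0.05):
    """Return (lot_accepted, rejected_chips).

    max_reject_fraction is an assumed value; the source does not give the
    actual lot-rejection criterion.
    """
    rejects = [chip for chip in chips if leaks(chip)]
    lot_accepted = len(rejects) <= max_reject_fraction * len(chips)
    return lot_accepted, rejects
```

Under these assumptions, a ten-chip lot with one leaking part would be rejected outright at the default 5 percent limit, which mirrors the point of the passage: a single bad seal counted against the vendor's whole batch, not just the individual chip.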

Large volumes of standardized parts also lowered prices, which freed up funds for testing, evaluation, and monitoring, and allowed the suppliers to focus on a continuous production flow and constantly improve their process. At Fairchild, one manager reported, "Apollo really taught us a lot about reliability," because workers had to account for every single circuit failure. The company eventually developed separate production lines for Apollo, with workers selected for high motivation and attention to detail. NASA and the IL arranged for the astronauts to regularly visit manufacturing plants and meet the workers who were assembling guidance systems, to impart a direct sense of the importance of quality in their work.25

Human presence in spacecraft did indeed improve reliability, although not always in the ways its advocates intended. Before a single astronaut went into space in an Apollo spacecraft, the very anticipation of their presence improved the machinery. Imposing the criticality of human life into development and manufacturing forced engineers and production workers to emphasize robustness of design, attention to detail, and quality of workmanship.

The IL gradually grew a base of experience and collected data. By March 1966, Block I computers had operated for 66,000 hours, experiencing twelve failures among seventeen computers over the course of three years, making for an MTBF of about 3,000 hours.26 "In the final analysis," Hall wrote in retrospect, "only successful missions proved the product."27 After all of the Apollo flights, Hall calculated the MTBF for a computer operating in the command module environment at 50,000 hours.28
