Chasing the Problem

Moments after the touchdown, the phone at the IL began ringing like a 1202 Program Alarm. NASA was calling and wanted to know what went wrong, demanding an explanation and a fix before the LM lifted off the surface in a few hours. The IL engineers understood that their computer was not operating at full capacity, but they did not understand why.

They went to their simulators, in ''a frantic session,'' trying to recreate the problem. ''We worked all night and time was running short,'' Fred Martin recalled. ''Our NASA buddies called us every 15-30 minutes anticipating, demanding a solution. We had to find it. We re-covered old ground, new ground, brainstorms, crazy ideas, anything.''17 Finally, George Silver, an IL engineer with a great deal of experience in the LM simulators, arrived at the lab. He had monitored the landing at home, heard the alarms, and rushed into work to point out that he had seen this problem before: in a simulation, the rendezvous radar had caused it when the radar was on during a landing and in the ''AUTO'' position.

Fred Martin ran upstairs and pored over the telemetry data with his engineers. Sure enough, the radar was on during the landing, though the IL thought it should have been off. It had to have been turned on by an astronaut, but why would he do that? The rendezvous radar was not used during landing. A check of the procedures showed that the astronauts were indeed following them: the step was in their instructions (actually, Aldrin had put the switch in SLEW position when it should have been in AUTO, but this would have made no difference to the problem). They had learned to do it this way in the procedures trainer. But in that simulator the switch was just a dummy, not connected to anything. In the real vehicle, the switch had real effects.

By now Armstrong and Aldrin had explored the moon for a few hours, and their ascent countdown was already underway. The IL phoned NASA and asked them to call the LM and ask for the rendezvous radar switch to be placed in the LGC position before liftoff. The problem was solved and the program alarms did not recur.

Why did the procedures specify that the rendezvous radar, used in the ascent from the moon, be turned on during descent? Some time before the landing, Aldrin asked the IL engineers if he could leave the rendezvous radar on during the landing, so that it would already be running if there was an abort and they needed to return to the CSM. IL engineer Larsen approved this step and changed the checklist. Aldrin, of course, was a rendezvous expert and he wanted to be prepared in case of an abort.

But hidden in the computer's interface to the rendezvous radar lay a problem.18 The radar had three modes: SLEW, AUTO, and LGC. In the first two modes, the crew operated the antenna. In SLEW, they could manually direct it, and then switch to AUTO to automatically track the signal on the CSM during rendezvous. These modes operated separately from the guidance computer and displayed their data on the cockpit displays. In LGC mode, the data was provided to the software, which incorporated range and range rate as well as antenna angles into the calculations for the rendezvous guidance. Crew procedures called for the switch to be in the AUTO position during the landings, which would hold the antenna still. Neither of the two crew-operated modes should have had any impact on the computer.

The trouble was that the rendezvous radar and the rest of the guidance system had different electrical power supplies. They both ran on alternating current (AC) of the same frequency, but had different phases (i.e., their alternating sine waves were out of sync). When the change in the switch procedure was tested in the lab, technicians connected both to the same power supply, which caused them to run in phase, even though they would be out of phase in the spacecraft. According to George Silver via Don Eyles, the problem had been recognized early on but never corrected.

On Apollo 11, the power supplies on the LM fell into a particularly unfortunate phase angle. Hence the computer and the radar were not in sync, causing the angle counters on the rendezvous radar to constantly increment or decrement in response to random electrical noise, sending nearly the maximum rate of data to the computer. The computer struggled to increment or decrement its counters for tracking the radar angles, which used up about 15 percent of its processing time.19

The computer had been designed with 15 percent ''overhead'' in processing power. That is, with all the processes running full blast, the computer would be at 85 percent capacity. But the rendezvous radar was generating so much spurious data that it ate up more than this 15 percent, causing the computer to overload.
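The margin arithmetic can be written out as a small Python sketch. Only the 85 and 15 percent figures come from the account above; the function names and the sample radar loads are assumptions chosen for illustration.

```python
# Illustrative duty-cycle arithmetic, in whole percentage points.
# Only the 85 percent design load (and hence the 15 percent margin)
# comes from the text; everything else is a hypothetical example.
DESIGN_LOAD_PCT = 85  # all scheduled processes running "full blast"


def margin_pct() -> int:
    """Spare capacity left when every scheduled process is running."""
    return 100 - DESIGN_LOAD_PCT


def is_overloaded(stolen_pct: int) -> bool:
    """True when spurious counter updates consume more than the margin."""
    return DESIGN_LOAD_PCT + stolen_pct > 100


print(margin_pct())        # 15
print(is_overloaded(15))   # False: exactly at capacity
print(is_overloaded(16))   # True: one point past the margin
```

Any spurious load beyond the margin, however small, is enough to push the total past 100 percent.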

Fortunately, the computer had a graceful way of responding to this situation. What the computer did next was not a bug in the program, but a manifestation of robustness in the software design. IL engineers were very proud of their ''asynchronous executive,'' and when the overloads came up, this feature allowed the computer to drop low-priority tasks, meaning that basic housekeeping tasks and the DSKY display were the first to go. Indeed, Aldrin's request for Noun 68 (DELTAH) was dropped by the computer and the display returned to P63. The display froze up for short periods as well. Still, the mission-critical items—guidance equations, throttle control, attitude servos—kept running, which was why Armstrong could still feel the machine responding to his inputs.
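The idea of shedding low-priority work under overload can be sketched in Python. This is a simplified illustration, not the IL's executive: the job names echo the text, but the priority numbers, the percentage costs, and the `schedule` function are all assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    priority: int  # higher number = more important (assumption of this sketch)
    cost_pct: int  # share of one compute cycle the job needs (hypothetical)


def schedule(jobs, available_pct):
    """Admit jobs in priority order; shed any job that no longer fits."""
    ran, shed = [], []
    remaining = available_pct
    # sorted() is stable, so jobs of equal priority keep their listed order.
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        if job.cost_pct <= remaining:
            ran.append(job.name)
            remaining -= job.cost_pct
        else:
            shed.append(job.name)
    return ran, shed


# Hypothetical costs that add up to the 85 percent design load.
JOBS = [
    Job("guidance equations", 3, 40),
    Job("throttle control", 3, 15),
    Job("attitude servos", 3, 15),
    Job("DSKY display", 1, 10),
    Job("housekeeping", 1, 5),
]

# With 30 points of capacity lost to spurious input, only 70 remain.
ran, shed = schedule(JOBS, 70)
print(ran)   # ['guidance equations', 'throttle control', 'attitude servos']
print(shed)  # ['DSKY display', 'housekeeping']
```

In this toy model the flight-critical jobs always claim capacity first, so the display and housekeeping work is what disappears when capacity shrinks, as the crew observed.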

In response to these overflows, not only was the computer generating alarms, but it was also restarting. Fortunately, these were not the cumbersome reboots required by today's desktop computers. Rather, ''restart protection,'' the difficult new requirement imposed on the software team in 1968, was allowing the computer to restart nearly instantaneously. When the 1201 and 1202 alarms came up, the computer called a BAILOUT subroutine (ironic for a craft without parachutes), and simply restarted itself. Because of its restart protection, the computer could flush incomplete and lower-priority jobs and pick up right where it had left off. Armstrong could not even feel the hiccups.
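Restart protection can be pictured as a checkpointing scheme. The Python sketch below is a toy model under stated assumptions, not the flight software: the class, its method names, and the notion of numbered steps are hypothetical, and only the flush-and-resume behavior is taken from the description above.

```python
class RestartableExecutive:
    """Toy model of restart protection (hypothetical, not flight code)."""

    def __init__(self):
        self.queue = []          # (job name, next step) pairs waiting to run
        self.restart_table = {}  # protected job -> step to resume from

    def enqueue(self, job, step=0, protected=False):
        """Add a job; protected jobs also get a restart-table entry."""
        self.queue.append((job, step))
        if protected:
            self.restart_table[job] = step

    def checkpoint(self, job, step):
        """A protected job records how far it has safely progressed."""
        if job in self.restart_table:
            self.restart_table[job] = step

    def bailout(self):
        """Flush all pending work, then requeue protected jobs at their checkpoints."""
        self.queue = list(self.restart_table.items())
        return self.queue


executive = RestartableExecutive()
executive.enqueue("guidance", protected=True)
executive.enqueue("display request")        # unprotected, lost on restart
executive.checkpoint("guidance", 3)
print(executive.bailout())                  # [('guidance', 3)]
```

After the simulated restart, the protected guidance job resumes from its last checkpoint while the unprotected display request is simply gone, which mirrors the dropped display request Aldrin saw.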

When the computer shifted into P64 as the LM pitched over, the computer's capacity margin became even more critical, and hence the alarms continued and even increased in frequency. When Armstrong switched into P66 and took over manual control, the computing load lightened, because the computer was no longer calculating the landing point, and the alarms disappeared.

These explanations all developed in analysis after the landing. How, then, did the ground crews know to make their snap decision not to call an abort?

In the months before the Apollo 11 mission, the crew had rehearsed the landing process from the LM simulator, in contact with the flight controllers and numerous other aspects of the network. As the basic scripts were perfected, engineers began inserting a series of unlikely events to test reactions. Jack Garman, a young engineer on the ground, helped develop computer errors that would probably never happen. These included a program alarm reflecting an overload on the LM computer.

During one such simulation, just a month or two before Apollo 11's launch, flight controller Steve Bales called an abort in response to a program alarm, even though a landing could have proceeded successfully. The young controllers weren't too troubled by the incident, just one mistake among many. But NASA management was concerned, for ''it scared everybody to death.'' Calling a mistaken abort was almost as bad as missing a real one. As Garman recalls, ''Kranz called a meeting [to] go through every program alarm, write down what could happen, what we should do about it,'' which the controllers subsequently did.20

Jack Garman made himself a handwritten ''cheat sheet,'' which he kept at his control console in the back room under a piece of plexiglass. On the left side was a list of program errors, on the right side a series of problems and possible responses. In the section of his chart corresponding to ''1201-1211 PGNCS,'' the alarm that appeared on Apollo 11, he wrote the following notes in the right hand column:

PGNCS condition unknown, DSKY may be locked up, duty cycle may be up to point of missing some functions (nav. last to die) switch to AGS (follow ERR needles) may help (reduces PGNCS duty cycle signif.).

In these few words, Garman had the critical information that would allow him to diagnose the problem in real time (figure 9.3). This was all news to Aldrin: ''I was the kind of the systems guy in the LM and I was not made aware of that. And it seems as though that was a flaw in communications. I was very much in the dark when this came up.''21

Figure 9.3

Jack Garman's ''cheat sheet'' for Apollo 11 landing; shaded section contains the 1201 and 1202 Program Alarms that Garman diagnosed in real time with reference to these instructions. (Courtesy of Jack Garman.)
