Square Root Solution Methods for the Orbit Determination Problem


In the previous chapter, the solution to the least squares estimation problem including a priori information, x̄ and P̄, was represented in the normal equation form

(H^T W H + P̄^{-1}) x̂ = H^T W y + P̄^{-1} x̄

which can be expressed as

M x̂ = N   (5.1.2)

where M = H^T W H + P̄^{-1} and N = H^T W y + P̄^{-1} x̄. The solution for x̂ is obtained by computing the inverse of M. In practice, computational problems are encountered in forming and inverting the matrix M. An orthogonal transformation approach can be used to write Eq. (5.1.2) in the form

R x̂ = b

where R is an n x n upper triangular matrix (R is not the observation error covariance matrix). Then x̂ can be obtained by backward substitution. This approach has the advantage that increased numerical accuracy is achieved over conventional matrix inversion methods for solving Eq. (5.1.2). Using the orthogonal transformation approach, accuracy can be achieved with a single-precision computation that is equal to the accuracy obtained by inverting M with the normal equation approach using double-precision computations.


The normal equation approach has several operational and conceptual advantages that have led to the widespread adoption of this technique for many operational orbit determination systems. Both the normal equation and orthogonal transformation approaches are described in the subsequent sections. The solution of the linear system

M x̂ = N

is expressed as

x̂ = M^{-1} N

where the operation M^{-1} N implies that the inverse of the (n x n) matrix M is computed and then postmultiplied by the column vector N. A solution based on the Cholesky decomposition of M is more efficient and, in most cases, more accurate. The Cholesky decomposition is applicable only if M is symmetric and positive definite, a condition satisfied for the case considered here. The following discussion first outlines the Cholesky decomposition and describes this approach to solving the normal equations. Then a discussion of solutions based on orthogonal transformations is presented in subsequent sections.


Let M be a symmetric positive definite matrix, and let R be an upper triangular matrix computed such that

M = R^T R.   (5.2.1)

(Note that R is not the observation error covariance matrix.) Then Eq. (5.1.2) can be expressed as

R^T R x̂ = N.

Defining

R x̂ = z   (5.2.3)

gives

R^T z = N   (5.2.4)

where R^T is lower triangular. The components of z can be determined from Eq. (5.2.4) using a forward recursion relation. Then, Eq. (5.2.3) can be solved using a backward recursion to obtain the elements of x̂.

The elements of the error covariance matrix, P = (H^T W H + P̄^{-1})^{-1} = M^{-1}, can be obtained from the condition

P = M^{-1} = (R^T R)^{-1} = S S^T

where S, the inverse of the upper triangular matrix R, can be computed by an efficient backward recursion.

Equation (5.2.1) represents a set of (n² + n)/2 equations for the (n² + n)/2 unknown elements of the upper triangular matrix R. The expression for the Cholesky decomposition of M is obtained by expanding Eq. (5.2.1) and solving term by term for the elements of R (e.g., each element R_ij is determined in terms of the elements of M).

Given the elements of the n x n positive definite matrix M and the n x 1 column vector N, the following Cholesky algorithm will yield a solution for the elements of R, z, x̂, and the upper triangular matrix S. Step 1 of the following algorithm determines the elements of R and the vector z. Steps 2 and 3 perform a backward recursion to form x̂ and the elements of the matrix S = R^{-1}.

5.2.1 The Cholesky Algorithm

The Cholesky algorithm for R is derived easily by equating the elements of R^T R to the known elements of M. For example, expanding R^T R shows that R_11 = √M_11.

The elements of z are obtained from an expansion of Eq. (5.2.4):

z_i = (N_i - Σ_{k=1}^{i-1} R_ki z_k) / R_ii ,   i = 1, . . . , n.

The elements of S are obtained from an expansion of SR = I.

Examples of the application of this algorithm are given in Sections 5.6.1 and 5.6.5.
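Steps 1 through 3 can be sketched as a short routine. This is an illustrative, nonblock implementation (the function and variable names are ours, not the book's): it builds R column by column from Eq. (5.2.1), obtains z by the forward recursion, and recovers x̂ by the backward recursion.

```python
import math

def cholesky_solve(M, N):
    """Solve M x = N for symmetric positive-definite M via M = R^T R."""
    n = len(M)
    # Step 1: upper triangular R (M = R^T R) and z from R^T z = N
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = math.sqrt(M[i][i] - sum(R[k][i] ** 2 for k in range(i)))
        for j in range(i + 1, n):
            R[i][j] = (M[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    z = [0.0] * n
    for i in range(n):                      # forward recursion for z
        z[i] = (N[i] - sum(R[k][i] * z[k] for k in range(i))) / R[i][i]
    # Steps 2-3: backward recursion for x (R x = z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return R, z, x
```

For M = [[4, 2], [2, 3]] and N = [8, 8], the routine returns R = [[2, 1], [0, √2]] and x̂ = [1, 2], which can be checked by direct substitution. The inverse S = R^{-1} needed for the covariance can be obtained by a similar backward recursion.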

This Cholesky algorithm is a nonblock algorithm. That is, it does not use matrix multiplication. Because matrix multiplication is much faster in terms of floating point operations per second than matrix-vector operations on modern computers, a block Cholesky algorithm often will be faster than a nonblock version. In fact, the increase in speed may be a factor of three or more. See Golub and Van Loan (1996) for examples of a block Cholesky algorithm.

5.2.2 The Square Root Free Cholesky Algorithm

Notice that calculation of the diagonal elements of R requires that n square roots be taken. Computationally, the square root operation is expensive compared to multiplication, division, addition, or subtraction; hence, it is desirable to avoid square roots if possible. A square root free Cholesky algorithm may be developed by defining

M = U D U^T   (5.2.10)

where U is unit upper triangular (i.e., has ones on the diagonal) and D is a diagonal matrix. As in the previous section, Eq. (5.2.10) represents (n² + n)/2 equations in the (n² + n)/2 unknown elements of U and D. The algorithm for the elements of U and D is obtained by expanding Eq. (5.2.10) and is given by

D_j = M_jj - Σ_{k=j+1}^{n} D_k U_jk²
U_ij = (M_ij - Σ_{k=j+1}^{n} D_k U_ik U_jk) / D_j ,   i = 1, . . . , j - 1.

The procedure is to set j = n and cycle through the algorithm for i = 1, . . . , n - 1, solving for D_n and the elements of U_in (i.e., the last column of U). Then set j = n - 1 and cycle through for i = 1, . . . , n - 2, solving for D_{n-1} and the (n - 1)st column of U. Repeat this procedure for the remaining values of j.

Knowing U and D, it is possible to solve for x̂ from Eq. (5.1.2) and Eq. (5.2.10) as follows. Note that

U D U^T x̂ = N.

Defining

D U^T x̂ = z   (5.2.14)

gives

U z = N.   (5.2.15)

The intermediate vector z can be determined from Eq. (5.2.15). The solution is a backward substitution (i.e., we solve for z_n, z_{n-1}, . . . , z_1):

z_i = N_i - Σ_{k=i+1}^{n} U_ik z_k ,   i = n, . . . , 1.

Then the elements of x̂ can be determined from Eq. (5.2.14) by using a forward substitution:

x̂_i = z_i / D_i - Σ_{k=1}^{i-1} U_ki x̂_k ,   i = 1, . . . , n.

The associated estimation error covariance is obtained from

P = M^{-1} = (U D U^T)^{-1} = U^{-T} D^{-1} U^{-1}.


Note that none of the algorithms required to compute x̂ involve the calculation of a square root.
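The square root free solution can be sketched the same way. Again this is illustrative only (the routine name is ours): it factors M = U D U^T by cycling j from the last column backward, then applies the backward substitution of Eq. (5.2.15) and the forward substitution of Eq. (5.2.14).

```python
def udu_solve(M, N):
    """Solve M x = N via M = U D U^T (U unit upper triangular, D diagonal);
    no square roots are required anywhere in the computation."""
    n = len(M)
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in reversed(range(n)):            # last column of U first
        D[j] = M[j][j] - sum(D[k] * U[j][k] ** 2 for k in range(j + 1, n))
        for i in range(j):
            U[i][j] = (M[i][j] - sum(D[k] * U[i][k] * U[j][k]
                                     for k in range(j + 1, n))) / D[j]
    # backward substitution: U z = N (unit diagonal, so no division)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = N[i] - sum(U[i][k] * z[k] for k in range(i + 1, n))
    # forward substitution: U^T x = D^{-1} z
    x = [0.0] * n
    for i in range(n):
        x[i] = z[i] / D[i] - sum(U[k][i] * x[k] for k in range(i))
    return U, D, x
```

For the same M = [[4, 2], [2, 3]] and N = [8, 8] used earlier, the factorization gives D = [8/3, 3], U_12 = 2/3, and the solution x̂ = [1, 2], identical to the Cholesky result.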


An alternate approach that avoids some of the numerical problems encountered in the normal equation approach is described in the following discussion. The method obtains the solution by applying successive orthogonal transformations to the information array, (H, y). Enhanced numerical accuracy is obtained by this approach. Consider the quadratic performance index, J(x), given by the weighted sum of squares of the observation errors, e = y - Hx (for the moment we will assume no a priori information; i.e., P̄^{-1} = 0, x̄ = 0):

J(x) = e^T W e = ||W^{1/2} (Hx - y)||² = (Hx - y)^T W (Hx - y).   (5.3.1)

If W is not diagonal, W^{1/2} can be computed by the Cholesky decomposition. Or the prewhitening transformation described at the end of Section 5.7.1 can be applied so that W = I. For notational convenience we are using -e in Eq. (5.3.1).

The solution to the least squares estimation problem (as well as the minimum variance and the maximum likelihood estimation problem, under certain restrictions) is obtained by finding the value x̂ that minimizes the performance index J(x). To achieve the minimum value of J(x), we introduce the m x m orthogonal matrix, Q. An orthogonal matrix has the following properties:

1. Q Q^T = Q^T Q = I.

2. Q^{-1} = Q^T.

3. If Q1 and Q2 are orthogonal matrices, then so is Q1Q2.

4. For any vector x, ||Qx|| = ||x||.

Multiplying by Q does not change the Euclidean norm of a vector.

5. If e is an m vector of random variables with e ~ (0, I) (i.e., E(e) = 0 and E(ee^T) = I), then ē = Qe has the same properties; that is, E(ē) = 0 and E(ē ē^T) = Q E(ee^T) Q^T = I.

It follows then that Eq. (5.3.1) can be expressed as

J(x) = ||Q W^{1/2} (Hx - y)||².   (5.3.5)

Select Q such that

Q W^{1/2} H = [ R ]
              [ O ]

and define

Q W^{1/2} y = [ b ]
              [ e ]   (5.3.6)

where

R is an n x n upper-triangular matrix of rank n,
O is an (m - n) x n null matrix,
b is an n x 1 column vector, and
e is an (m - n) x 1 column vector.

The results given by Eq. (5.3.6) assume that m > n and H is of rank n. Using Eq. (5.3.6), Eq. (5.3.5) can be written as

J(x) = || [ Rx - b ] ||²
       || [   -e   ] ||

Expanding leads to

J(x) = ||Rx - b||² + ||e||².   (5.3.8)

Only the first term in Eq. (5.3.8) is a function of x, so the value of x that minimizes J(x) is obtained by requiring that

R x̂ = b

and the minimum value of the performance index becomes (equating J(x) in Eq. (5.3.1) and Eq. (5.3.8))

J(x̂) = ||e||².   (5.3.10)

That is, ||e|| is the norm of the observation residual vector, which for a linear system will be equal to the weighted sum of the squares of the observation residuals determined by using x̂ in Eq. (5.3.1).


The procedure described in the previous section is direct, and its implementation requires only a convenient procedure for computing Q. One such procedure can be developed based on the Givens plane rotation (Givens, 1958). Let x be a 2 x 1 vector having components x^T = [x1 x2], and let G be a 2 x 2 orthogonal matrix associated with the plane rotation through the angle θ. Then select G such that

Gx = x' = [ x1' ]
          [  0  ].

To this end, consider the transformation

Gx = [  cos θ  sin θ ] [ x1 ]   [  cos θ x1 + sin θ x2 ]
     [ -sin θ  cos θ ] [ x2 ] = [ -sin θ x1 + cos θ x2 ].   (5.4.3)

Equations (5.4.3) represent a system of two equations in three unknowns; that is, x1', x2', and θ. The Givens rotation is defined by selecting the rotation θ such that x2' = 0. That is, let

x1' = cos θ x1 + sin θ x2   (5.4.4)
 0  = -sin θ x1 + cos θ x2.   (5.4.5)

From Eq. (5.4.5), it follows that

tan θ = x2 / x1 ,   sin θ = x2 / √(x1² + x2²) ,   cos θ = x1 / √(x1² + x2²).

The positive value associated with the square root operation is selected for the following discussion. Substituting the expressions for sin θ and cos θ into Eq. (5.4.4) leads to

x1' = √(x1² + x2²).
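Because only sin θ and cos θ enter the transformation, the angle itself never needs to be computed. A minimal sketch (the function name is ours):

```python
import math

def givens(x1, x2):
    """Return (cos theta, sin theta) of the plane rotation that nulls x2,
    taking the positive square root as in the text."""
    r = math.hypot(x1, x2)          # sqrt(x1^2 + x2^2)
    if r == 0.0:
        return 1.0, 0.0             # element already zero: identity rotation
    return x1 / r, x2 / r

c, s = givens(3.0, 4.0)             # c = 0.6, s = 0.8
x1p = c * 3.0 + s * 4.0             # rotated first component, about 5.0
x2p = -s * 3.0 + c * 4.0            # nulled second component, about 0.0
```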

Consider the application of the transformation

G = [  cos θ  sin θ ]
    [ -sin θ  cos θ ]

to two general row vectors h_i and h_k of H:

G [ h_ii  h_i,i+1  . . .  h_in ] = [ h'_ii  h'_i,i+1  . . .  h'_in ]
  [ h_ki  h_k,i+1  . . .  h_kn ]   [  0     h'_k,i+1  . . .  h'_kn ].

That is, for any two general row vectors, h_i and h_k, the transformation is applied to the first column so as to null h_ki. The transformation that accomplishes this is applied to each remaining column to obtain the transformed matrix. Hence,

cos θ = h_ii / √(h_ii² + h_ki²) ,   sin θ = h_ki / √(h_ii² + h_ki²).

Then for all other columns (j = i+1, . . . , n),

h'_ij = h_ij cos θ + h_kj sin θ
h'_kj = -h_ij sin θ + h_kj cos θ.

By using this transformation repetitively as k goes from i + 1 to m, the remaining elements of the ith column can be nulled. Then by moving down the diagonal and applying the transformation to successive columns whose first element lies on the diagonal, a rank n matrix can be reduced to an upper triangular n x n matrix with a lower (m - n) x n null matrix. If the element to be nulled already has a value of zero, the transformation matrix will be the identity matrix and the corresponding transformation may be skipped.
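The full sweep can be sketched as follows. This illustrative routine (names ours, not the book's) nulls the subdiagonal elements column by column, skipping any element that is already zero:

```python
import math

def givens_triangularize(H):
    """Reduce an m x n matrix (m >= n, rank n) to upper triangular form by
    nulling each element below the diagonal with a Givens rotation applied
    to rows i and k."""
    m, n = len(H), len(H[0])
    A = [row[:] for row in H]           # work on a copy
    for i in range(n):
        for k in range(i + 1, m):
            if A[k][i] == 0.0:          # already null: skip the rotation
                continue
            r = math.hypot(A[i][i], A[k][i])
            c, s = A[i][i] / r, A[k][i] / r
            for j in range(i, n):       # apply the rotation to every column
                hij, hkj = A[i][j], A[k][j]
                A[i][j] = c * hij + s * hkj
                A[k][j] = -s * hij + c * hkj
    return A
```

Applied to [[3, 1], [4, 2]] it returns (to within rounding) [[5, 2.2], [0, 0.4]], and one can verify that R^T R reproduces H^T H, as the orthogonality of the rotations requires.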

As an example, the transformation to null the fourth element in the third column is shown as follows (only columns 3 through n of H are displayed; the first two columns have already been reduced to upper triangular form):

[ 1                           ] [ h13 . . . h1n ]   [ h13  . . . h1n  ]
[    1                        ] [ h23 . . . h2n ]   [ h23  . . . h2n  ]
[      cos θ  sin θ           ] [ h33 . . . h3n ]   [ h'33 . . . h'3n ]
[     -sin θ  cos θ           ] [ h43 . . . h4n ] = [  0   . . . h'4n ]
[                   1         ] [ h53 . . . h5n ]   [ h53  . . . h5n  ]
[                     .       ] [  .  . . .  .  ]   [  .   . . .  .   ]
[                       1     ] [ hm3 . . . hmn ]   [ hm3  . . . hmn  ]

The prime superscript identifies the two rows that are affected by this transformation (rows three and four). Notice that the location of -sin θ in the Givens transformation matrix corresponds to the location of the element to be nulled in the H matrix; in this example, both are element (4,3).

Next, by using the transformation defined by

cos θ = h'33 / √(h'33² + h53²) ,   sin θ = h53 / √(h'33² + h53²) ,

the third and fifth rows will be transformed so that the term h53 will be zero.

5.4.1 A Priori Information and Initialization

The formulation given earlier does not specifically address the question of a priori information. Assume a priori information, x̄ and P̄, is available. The procedure is initialized by writing the a priori information in the form of a data equation; that is, in the form of y = Hx + e. This is accomplished by writing

x̄ = x + η   (5.4.15)

where x is the true value and η is the error in x̄. We assume that

E(η) = 0 ,   E(η η^T) = P̄.

Compute S̄, the upper triangular square root of P̄,

P̄ = S̄ S̄^T.

If P̄ is not diagonal, the Cholesky decomposition may be used to accomplish this. Next compute R̄, the square root of the a priori information matrix, Λ̄; hence,

R̄ = S̄^{-1} ,   Λ̄ = P̄^{-1} = R̄^T R̄.

Multiplying Eq. (5.4.15) by R̄ yields

R̄ x̄ = R̄ x + R̄ η.

Define then

b̄ = R̄ x̄ ,   η̄ = R̄ η ,

so that E(η̄) = 0 and E(η̄ η̄^T) = R̄ P̄ R̄^T = I; that is, the a priori information is now in the form of a data equation, b̄ = R̄ x + η̄.
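An upper triangular square root S̄ of a dense P̄ can be obtained by a backward recursion, since the standard Cholesky decomposition yields a lower triangular factor. This is an illustrative sketch only (the routine name is ours):

```python
import math

def upper_sqrt(P):
    """Upper triangular S with P = S S^T, computed by a backward recursion
    starting from the last row (an 'upper' Cholesky factorization)."""
    n = len(P)
    S = [[0.0] * n for _ in range(n)]
    for j in reversed(range(n)):
        S[j][j] = math.sqrt(P[j][j] - sum(S[j][k] ** 2
                                          for k in range(j + 1, n)))
        for i in range(j):
            S[i][j] = (P[i][j] - sum(S[i][k] * S[j][k]
                                     for k in range(j + 1, n))) / S[j][j]
    return S
```

Inverting S̄ (another backward recursion) then gives R̄, and since R̄ P̄ R̄^T = S̄^{-1} S̄ S̄^T S̄^{-T} = I, the transformed a priori error η̄ = R̄η has unit covariance, as required for the data equation.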
