## Info

a0-ak S0-Sk

Corrections to x will continue in this fashion until convergence is achieved.

Convergence and Marquardt's Algorithm. The Gauss-Newton differential correction procedure outlined above may be unsuitable for some nonlinear problems because convergence cannot be guaranteed unless the a priori estimate is close to a minimum in the loss function. Moreover, its rate of convergence can be difficult to control [Melkanoff and Sanada, 1966; Wilde, 1964]. An alternative approach to solving the batch least-squares problem which guarantees convergence is the gradient search method, or method of steepest descent. With this technique, the state parameters are adjusted so that the resultant direction of travel in state space is along the negative gradient of J, i.e., in the direction of steepest descent of the loss function. Although this method initially converges rapidly, it slows down when the solution approaches the vicinity of the minimum.

To overcome both the difficulties of the Gauss-Newton technique when an accurate initial estimate is not available, and the slow convergence problems of the gradient search approach when the solution is close to the loss function minimum, D. W. Marquardt [1963] proposed an algorithm which performs an optimal interpolation between the two techniques. For simplicity, let S0=0, reflecting no confidence in the a priori estimate. Equation (13-29) then shows that

Correction of the state estimate x? in the direction of the negative gradient of J yields the following expression for xjj + 1:

where X is a proportionality constant. The Marquardt technique uses an expression of the form

If X is small, Eq. (13-47) is equivalent to the Gauss-Newton procedure. If X is large, x° is corrected in the direction of the negative gradient of J, but with a magnitude which decreases as X increases.

An example of the use of Marquardt's algorithm for improved convergence is as follows:

1. Compute the loss function using the a priori state estimate, x°.

2. Apply the first state correction to the state to form x° using Eq. (13-47) with X > > GrWG.

3. Recompute the loss function at x° = x°. If ./(x^)>./(x°), then x° is discarded and X is replaced by XK, where K is a fixed positive constant, usually between 1 and 10. The state estimate x° is then recomputed uisng the new value of X in Eq. (13-47). If J(x<t)<J(x°), then x? is retained, but X is replaced by X/K.

4. After each subsequent iteration, compare J(x°k +1) and replace X by XK or X/K as in step 3. The state estimate x° + , is retained if J continues to decrease and discarded if J increases.

This procedure continues until the difference in J between two consecutive iterations is small, or until X reaches a small value. Additional details are given by Marquardt [1963] and Bevington [1969].

Advantages and Disadvantages of Batch Estimators. The major advantage of batch estimators is that they are the simplest to implement; they are also generally less sensitive to bad data points than are the somewhat more sophisticated algorithms described below. Another advantage of batch estimators is that all observation residuals can be seen simultaneously, so that any obviously invalid observations, i.e., those with unusually large residuals, can be removed. An observation is commonly removed if the absolute value of its residual is greater than three times the weighted rms residual.

The computer execution time required for a batch estimator depends on the number of state parameters, the number of observations, the complexity of the state and observation models, and the number of iterations required for convergence. If a large number of iterations is required, a recursive estimator should be considered. The computer storage required to contain the observations for possible future iterations is also a disadvantage of batch estimators; therefore, for applications in which computer storage is limited, recursive estimators or Kalman filters, described in the next section, may be preferred.

Example of a "Single-Frame" Least-Squares Estimator. In the previous example in this section, we wished to determine the constant attitude of a spinning spacecraft based on large number of measurements at different times. In this example we assume that there are several measurements at one time and that we wish to determine the attitude at that time. (This is commonly done when the control and environmental torques are poorly known and it is impossible to predict how the attitude will change between data samples.)

In particular, we wish to compute the three-axis attitude for an Earth-oriented spacecraft with a horizon scanner, a two-axis digital Sun sensor, and a three-axis magnetometer. The attitude is parameterized by pitch roll (£.), and yaw Thus, the state vector is

with the a priori estimate, xA = (0,0,0)T. Note that all observation vector and state vector elements are evaluated at the same time, so that the functional dependence on time is ignored. The seven-component observation vector is y Hp^rm,Hx,Hy,Ht,NA,NBf (13-48)

where pm and rm are the measured pitch and roll angles from the horizon scanner, H = (Hx,Hy,H2)J is the measured magnetic field in spacecraft body coordinates, and NA and NB are the measured Sun sensor reticle counts (see Section 7.1).

To simplify the construction of the seven-component observation model vector, g(x), we define yaw, roll, and pitch as the 3-1-2 Euler angle sequence which rotates a vector from orbital to body coordinates (see Section 12.2). The Sun and magnetic field vectors in orbital coordinates, S0 and Hq, are_ obtained from an ephemeris and magnetic field model. The nadir vector is Eq=(0,0,1)t by the definition of the orbital coordinate system. The Sun, magnetic field, and nadir vectors in body coordinates are

where, from Table E-l, the attitude matrix is cos^cos^, - sin |,,sin |,sin ^ sin ^cos ^ + cos^sin £,sin £p -cos^sin^

cos(j,sin^, + sin£,sin£,cos£p sin^sin^-cos^sin&cos^ cos£rcos^

Substitution of Eq. (13-50) with Eo = (0,0,1)T into the above expression for EB and comparison with Eq. (12-51) (see Section 12.2) gives the first two observation model equations as*

The reason for choosing the 3-1-2 Euler angle sequence is apparent from the simple form of Eq. (13-51). The observation model equations for H are given directly from the second part of Eq. (13-49) as

Finally, the observation model equations for NA and NB are obtained using Eqs. (7-23) through (7-26), with the result ft-lNT^/*^-1); £7 = INT (b/km + 2^) (13-53)

•The sign of ^ and as defined here, is opposite to Qc and fiE defined in Section 12.2.

where b-sy where

Ass is the transformation matrix from Sun sensor to body coordinates (see Eq. (7-9), and m, km, h, and n are sensor constants defined in Table 7-2.

The partial derivatives of the observation model vector elements, g„ with respect to the state vector elements, xjt are

where

The observation weights are taken to be the inverse of the corresponding variance. For pitch and roll data, the weights are

and for the magnetometer data, the three observation weights are assumed to be equal and given by

The errors in the Sun sensor data are assumed to be dominated by the step size (see Section 12.3); hence, m>6= W7= 12 (13-57)

Note that the weights have the same units as the square of the corresponding inverse measurement. The elements of the weighted matrix, W, are given by-

assuming that the measurement errors are uncorrelated,

With the above definitions, we now wish to find the state vector estimate, x, which minimizes the loss function defined by Eq. (13-29) with S0 = 0(to indicate no confidence in the a priori solution). The solution is given by Eq. (13-36) with the a priori solution Xq^x^ as

where Gk is given by Eqs. (l3-30b) and (13-54) and the subscript k denotes that the partial derivatives are evaluated at x = xA. The covariance matrix of the computed state vector (see Section 12.3),

is obtained as a byproduct of the differential correction algorithm.

13.5 Recursive Least-Squares Estimators and Kalman Filters

Lawrence Fallon, III

13.5.1 Recursive Least-Squares Estimation

Consider an «-component observation vector, y, which is partitioned into p members; that is y=

Each member contains q observations which are generally measured at nearly the same time. For example, consider an observation vector which contains n*= 100 star tracker measurements obtained in a 30-minute interval. If y consists of angular coordinates which are measured two at a time, then it would be partitioned into p = 50 members, each containing q = 2 components. The observation model vector, g, is also partitioned in the same manner as y.

A batch least-squares estimate of the m-component state vector x°, determined with observations from only member y,, will be denoted x°. The state estimate determined using observations from both members y, and y2 will be denoted x°, and so forth. We want an expression for x2 using x° and observations from member y2. This will then be extended to form an expression for x°, using xJJ_, and the observations from member yk.

The loss function in Eq. (13-29) leads to the following relation for x°:

[50+CIT|IV1CIAJx?=S0x2 + C^i^I[yI-glili + c;IRx®i] (13-62)

which is equivalent to Eq. (13-33), except that W„ GlRt, and contain observations from member y, only. The subscript Rx signifies evaluation at reference state vector x^. (We will use a different reference state vector for each of thep members of g.) Similarly, for x2,

where the subscript S means that observations from both members y, and y2 are included. The 2^-vectors ys and are defined by hSR:

'I *,**, where x^ is a reference state vector which is generally different from x^. The 2q-vector gSR, the (2q X m) matrix GSR, and the (2qx2q) matrix IVS are analogous to ys and b5Jt.

where the (m X m) matrices SlR and S2R are defined by

We emphasize that the (qXq) matrix fV2, the (q X tri) matrix G2R^ and the ^-vector gZRj pertain to observations from y2 only. Solving Eq. (13-64) for x2 yields x°=x?+ PAjV^y 2-g2R+ G^x»,-*?)] (13-65a) where the (m X m) matrix P2 is given by

P2=S2Ri = [/>,-' + GjRW2G2Ri] -' (13-65b) and the (m X m) matrix Pv

Equation (13-65) is an expression for x2 which depends on x° Px (the covariance of error in x?), and quantities associated with observations from member y2. Once x° and Pt have been calculated, observations from y, are no longer necessary for the estimation of x2.

By analogy, the expression for x°, the state estimate using the first k members of the observation vector, is

In many sequential least-squares applications, the estimate derived from processing the previous observations becomes the reference vector for the current estimation, i.e., x^ =xj_,. In such cases, the state estimation is frequently not iterated in the batch least-squares sense. The state estimate "improves" as additional data are processed. Occasionally, however, the same reference vector will be used for the entire group of data, and iterations may or may not be used.

13.5 recursive least-squares estimators and kalman filters 461

The algorithm given by Eq. (13-66) requires the inversion of the (mXm) matrix SkR at each step to compute Pk. This algorithm may be transformed into one which requires inverting a (qXq) matrix. Recall that q is the dimension of the members of y. which are generally smaller than the m-dimensional state vector. Applying the matrix identity

to Eq. (13-65b) and extending the result to the Acth estimate of P yields

where Rk, the (q X q) measurement covariance matrix, is the inverse of Wk. Substitution of Eq. (13-67) into Eq. (13-66) and some matrix manipulation yields

If Eq. (13-69) is now substituted into Eq. (13-67), we obtain the following algorithm for Pk, the error covariance matrix of the state estimate, x°:

Equations (13-68) to (13-70) are the basic equations used in sequential least-squares estimators. In these equations, the ^-vector gk and the (qXm) matrix Gk are the parameters associated with observation k, evaluated at x°=x°Rk. If x^ = **-1. ^en Eq. (13-68) reduces to

The (qXq) matrix to be inverted in Eq. (13-69) thus has the dimensions of yk. In a common application of this algorithm, the observations are processed one at a time. In this case, q*=\ and no matrix inversion is necessary.

Because of computer roundoff errors, Pk can become nonpositive definite and therefore meaningless. An alternative is to use the Joseph algorithm for the computation of Pk:

Pk = [ 1 - Kk Gk ] Pk _, [ 1 - Kk Gk ]T + Kk WjT lKk (13-72)

This algorithm requires more computation than Eq. (13-70), but ensures that Pk will remain positive definite. Substituting Eq. (13-71) into (13-72) reproduces Eq. (13-70), which indicates that the two methods are analytically equivalent for any K defined by Eq. (13-69). Figure 13-3 summarizes the procedures used by a recursive least-squares algorithm.

Advantages and Disadvantages of Recursive Least-Squares Estimators. The principal computational advantage of recursive least-squares estimators occurs in applications where iterations are not required. In these cases, the estimator converges (i.e., the difference between x° and x°_, approaches zero) as additional data

Fig. 13-3. Block Diagram of Recursive Least-Squares Estimator Algorithm

are processed. This results from the use of xj_, as the reference state vector for the estimation of xj. Only information pertaining to the fcth set of observations must be stored in these cases. Use of a recursive estimator instead of a batch estimator (in which the reference state vector is not replaced until all observations have been processed) will thus result in a reduction of computer storage requirements and a decrease in execution time. The principal disadvantage of the recursive least-squares estimator is that it is more sensitive to bad data, particularly at the beginning of a pass, than is the batch estimator.

If the state undergoes minor unmodeled variation during the time spanned by the observations, the recursive least-squares estimator will calculate a weighted "average" value for x° which is essentially equivalent to the value estimated by a batch procedure. In contrast, the Kalman filter described below will generally track state variations better than either the recursive or batch least-squares algorithms. 13.5.2 Kalman Filters

To estimate the value of a state vector at an arbitrary time, tk, the state estimate at t0, from a batch or recursive algorithm, must be propagated from t0 to tk using a model of the system dynamics. The Kalman filter, on the other hand, estimates the m-component state vector x(/A) directly based on all observations up to and including yk and the dynamics model evaluated between observations.*

* This subsection describes a continuous-discrete Kalman filter which assumes that the system dynamics varies continuously with time and that observations are available at discrete time points. The two other classes are continuous and discrete Kalman filters, in which both the system dynamics and observation availability are either continuous or discrete. The continuous-discrete filter is the most common for spacecraft attitude determination. Much of the development of these filters was done by R. E. Kalman [1960, 1961] in the early 1960s. The basic Kalman filter in each of these classes assumes that the observation models and system dynamics are'linear. What we describe is actually an extended Kalman filter because nonlinear observation models win be allowed.

Although all filters require a dynamics model (the simplest of which is x= constant) to propagate the state estimate between observations, the accuracy requirements for this model are normally less severe for the Kaiman filter than for batch or recursive estimators because propagation is not performed at one time over the entire block of data. In addition, the Kaiman filter compensates for dynamics model inaccuracy by incorporating a noise term which gives the filter a fading memory—that is, each observation has a gradually diminishing effect on future state estimates.

Each time a set of q observations, yk, is obtained, the Kaiman filter uses it to update the a priori state vector estimate at ik, denoted by xk_ ,(;*), to produce an a posteriori estimate xk(tk). It also converts the a priori error covariance matrix estimate, |('*)> into the a posteriori estimate, Pk(tk). These a posteriori estimates are then propogated to tk+x to become the a priori estimates xk(tk+l) and /¿(/k+1) for the next observation set, yi+I(/j.+ 1). The subscript k on x and P indicates that the estimate is based on all observations up to and including the observations in yk.

The updating equations for the Kaiman filter are the same as those for the recursive least-squares estimator except that we are now estimating the state vector and covariance matrix values at the time tk rather than at a fixed epoch Iq. Thus, the Kaiman filter update equations are

**('*)=**-i('*)+**|>*-8*] with the (m X q) gain matrix

and either

or, alternatively,

Pk(tk) = [ 1 - A, Gk]Pk. ,(/*)[ 1 -KkGk]T+KkRkK? (13-76)

for the (m X m) error covariance matrix. In these equations, the ^-vector gk and the (q~xm) matrix Gk are evaluated at xt_,(tk).

In some cases it is necessary to iterate the estimate of xk(tk) to reduce the effects of nonlinearities in the observation model. If this occurs, then %k and Gk will be evaluated about a reference vector x^(/A), which may be different from xk-\(tk). Equation (13-73) is then replaced by the more general form

Iteration may then be done using Eq. (13-77) with xRt(lk)=kk_t(lk) to estimate xk(tk). The operation is cyclically repeated using xÄjt(/A)=&*(/*) and so on, until the change in the xk(t) between successive iterations is negligible. Jazwinski [1970] provides additional information concerning local iteration techniques. In attitude determination, the time between observations is normally short and local iteration is generally not needed. Thus, Eq. (13-73) is the more common expression. Additional techniques for nonlinear problems' are discussed by Äthans, et al., [1968].

Propagation of x and P Between Observation Times. The Kalman filter assumes that the system dynamics is linear and of the form

where F, B, and N are known matrices which may be time varying and the matrix F has dimensions (m X m). The dimensions of B and N are such that the terms Bu and Nu are m-vectors.* The vector u is a known, deterministic driving function which does not depend on x. For example, u could consist, of attitude-independent environment or control torques. In many attitude determination problems, u is zero and the term Bu may thus be ignored. The vector n is zero mean white noise which is assumed to be uncorrected with the observation noises, v„; that is

for all /, t, and k, where V is a known, symmetric, nonnegative definite matrix; SD(t - t) is the Dirac delta; and E denotes the expectation value. It is the matrix V which is selected to give the filter its desired fading memory characteristics. White noise processes such as n(/) do not exist in nature. The term Nn in Eq. (13-78) is included more as compensation for imperfections in the dynamics model than as a literal approximation of actual system inputs.

The Kalman filter propagates the state vector estimate via Eq. (13-78) with the noise term omitted; that is,

The solution to this equation is [Wilberg, 1971]

where the m-vector r is given by r (/,«*) = f/>(/,t)B(t)u(t)rfr (13-82)

Implementation of these two equations requires that the state transition matrix, D, be determined. When F is time invariant, t> is the exponential of (mXrn) matrix F; that is,

and when F is time varying, D is obtained by integrating the following matrix equation,'either analytically or numerically:

•The dimensions of vectors o and n will depend upon the nature of the filtering application. They are frequently of different dimensions than the state vector.

In many applications the propagated state vector is updated at each observation set, yk, and the propagation computations are restarted using the a posteriori estimate xk(tk). Thus, Eq. (13-81) takes the form

If the function u is zero, then the propagated state vector is given by the simpler equation

We now develop an expression for propagating P, which uses D, N, and V. Recall that the (mXm) matrix P represents the covariance of errors in the state estimate; that is,

where e(/)=x(/)-x(f), and it is assumed that E(e(/))=0. Differentiating this expression with respect to time, using Eqs. (13-78) and (13-79), and performing some algebra yields the matrix Riccati equation,

which has the solution [Wilberg, 1971; Meditch, 1967]

A more explicit form of Eq. (13-88) for propagating P between observations yk and yk + l is

Pk{tk+x)=D{tk + x, tk)Pk (tk)D\tk+ „ tk) +Q(tk+l,tk) (13-90)

Q(tk+i,tk) is called the state noise covariance matrix* A block diagram of the Kalman filter algorithm is given in Fig. 13-4.

Example of x and P Propagation. As an example of state and error covariance propagation in a Kalman filter, we describe a simplified version of an algorithm employed for attitude determination on the ATS-6 spacecraft. This

* When the system dynamics is nonlinear, the state propagation is commonly performed by integrating an equation of the form with the initial condition x(/t)=x(it), where f is a function which is nonlinear in x. Additional techniques for propagation of attitude parameters are given in Section 17.1. The state transition matrix to be used in the propagation of P is calculated using Eq. (13-84) with

This linear approximation causes the resulting estimate for P to be only a first-order approximation to the true covariance matrix.

0 0