
where the subscript $i$ indicates the values of $y$ and $\sigma$ at the coordinate $x_i$. In practice one usually calculates the $\sigma_i$ from the actual data values $y_{\mathrm{ob},i}$ because one does not know the true mean.

The theoretical values $y_{\mathrm{th},i}$ can be based on any function that might be appropriate for the data, for example a straight line, $y_{\mathrm{th},i} = a + b x_i$. In our car-counting example above, the independent variable $x_i$ would be the time of a measurement. The linear expression reflects the hypothesis that the car numbers are increasing or decreasing linearly with time, or that they are constant at $a$ if $b = 0$.

A number of straight lines can be drawn on the plot of Fig. 9, and a value for $\chi^2$ can be calculated for each one. The solid line is a fit estimated by eye that attempts to minimize the deviations while giving the most weight to the points with the smallest error bars. This is the idea behind the "least squares fit". Formally, one would minimize (21), which gives the most weight to points with small values of $\sigma_i$. The dashed lines in the figure clearly have larger values of $\chi^2$ and hence would not be the "best-fit" curve.

The minimization of (21) for a given set of data can be carried out empirically with many trials of different straight lines. For well behaved functions, like our straight line, the minimization can be carried out analytically with calculus. In the minimization process, one is varying the trial theoretical function, which in our example is $y_{\mathrm{th},i} = a + b x_i$. The parameters of the theory, $a$ and $b$, are being varied. When the minimum $\chi^2$ is obtained, one knows the best fit values of $a$ and $b$ for the straight line hypothesis.
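The analytic minimization for the straight line can be sketched in a few lines of Python. This is a minimal illustration, assuming that (21) is the usual sum of squared deviations weighted by $1/\sigma_i^2$. The function names and the data values are made up for the example and are not from the text.

```python
def chi_square(a, b, x, y, sigma):
    """chi^2 of the trial line y = a + b*x against data with error bars sigma."""
    return sum(((yi - (a + b * xi)) / si) ** 2 for xi, yi, si in zip(x, y, sigma))


def fit_line(x, y, sigma):
    """Return (a, b) that minimize chi^2.

    Setting d(chi^2)/da = 0 and d(chi^2)/db = 0 gives two linear
    equations in a and b, solved here in closed form.
    """
    w = [1.0 / si ** 2 for si in sigma]  # weights: small sigma_i counts more
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    return a, b


# Hypothetical data for illustration only: time (hours), counts, error bars.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [21.0, 26.0, 24.0, 31.0, 33.0]
sigma = [5.0, 5.0, 4.0, 6.0, 5.0]

a_best, b_best = fit_line(x, y, sigma)
chi2_min = chi_square(a_best, b_best, x, y, sigma)
print(f"a = {a_best:.2f}, b = {b_best:.2f}, chi^2 = {chi2_min:.2f}")
```

Any other choice of $a$ and $b$ passed to `chi_square` should return a larger value than `chi2_min`, which is the empirical "many trials" approach in miniature.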

In astrophysics, one might measure the power from a star at a series of different frequencies and compare it to the theoretical blackbody spectrum, which has the form

$$I(\nu, T) = K \, \frac{\nu^3}{e^{h\nu/kT} - 1}$$

where $\nu$ is the frequency, $T$ the temperature, $K$ the proportionality constant that is dependent in part on the distance to the star (greater distance yields less power), and finally $h$ and $k$ the Planck and Boltzmann constants respectively. The independent variable in this case would be $\nu$, and the spectral parameters to be determined are $K$ and $T$. Minimization of $\chi^2$ would yield best fit values of $K$ and $T$.
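A two-parameter fit of this kind can be sketched as follows, assuming the blackbody form $K\nu^3/(e^{h\nu/kT} - 1)$. Because the model is linear in $K$, the best $K$ for any trial $T$ follows analytically, so only $T$ needs to be scanned over a grid. The grid scan is one simple choice among many minimization methods. The function names, the frequencies, and the "true" values of $K$ and $T$ used to generate the synthetic, noise-free data are all assumptions made for the illustration.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def shape(nu, T):
    """Blackbody shape nu^3 / (exp(h nu / k T) - 1), without the factor K."""
    return nu ** 3 / math.expm1(H * nu / (K_B * T))


def fit_blackbody(nu, y, sigma, T_grid):
    """Scan trial temperatures and return (chi2_min, K_best, T_best)."""
    best = None
    for T in T_grid:
        g = [shape(v, T) for v in nu]
        # For fixed T, d(chi^2)/dK = 0 gives K in closed form.
        K = (sum(yi * gi / si ** 2 for yi, gi, si in zip(y, g, sigma))
             / sum(gi ** 2 / si ** 2 for gi, si in zip(g, sigma)))
        chi2 = sum(((yi - K * gi) / si) ** 2 for yi, gi, si in zip(y, g, sigma))
        if best is None or chi2 < best[0]:
            best = (chi2, K, T)
    return best


# Synthetic data generated from assumed values (illustration only).
nu = [2e14, 4e14, 6e14, 8e14, 1e15]               # frequencies, Hz
K_true, T_true = 1e-45, 5800.0
y = [K_true * shape(v, T_true) for v in nu]
sigma = [0.05 * yi for yi in y]                    # assumed 5% error bars
T_grid = [4000.0 + 50.0 * i for i in range(81)]    # 4000 K to 8000 K

chi2_min, K_best, T_best = fit_blackbody(nu, y, sigma, T_grid)
print(f"T = {T_best:.0f} K, K = {K_best:.3e}, chi^2 = {chi2_min:.3g}")
```

With real, noisy data the minimum $\chi^2$ would not be close to zero, and a finer grid or a proper minimizer would be needed to pin down $T$.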

Chi square test

A best-fit theoretical curve is not necessarily consistent with the data. When compared to the expected fluctuations in a set of data, the fit may be too bad (high $\chi^2$) or too good (low $\chi^2$).

One can compare a theoretical function that might underlie the data to the data points themselves to determine whether the function is in fact consistent with the data, taking statistical fluctuations into account. The answer in general will be expressed in terms of probabilities, rather than a flat yes or no. The theoretical function being tested need not be the best fit function, but in most cases it would be.

A common and powerful test for this purpose is known as the chi square test. It makes direct use of the value of $\chi^2$ calculated from the data and the trial function as given in (21), together with the number of degrees of freedom $f$. The latter is the number of data points $n$ less the number of variable parameters $p$ in the theoretical function. For our straight line function, the number of such parameters, $a$ and $b$, is two, and for our blackbody function it is also two, for $K$ and $T$. If for each case we had $n = 20$ data points, the number of degrees of freedom would be $f = n - p = 18$.

Suppose that the theoretical trial function is the true underlying curve that describes the data. In other words, if one made many very precise measurements, the data would faithfully track this curve. Now consider our less than precise set of measurements with an associated value of $\chi^2$. One could then ask the question: If the trial function were the true function, and I made another similar set of measurements, what is the probability I would find a greater $\chi^2$ than this? In other words, what are the chances that the data from the second set would deviate from the theoretical function more than do the set of measurements I already have in hand?

The answer lies in the chi square statistic. If each individual data point $y_i$ in subsequent measurements is distributed about the true (theoretical) value as a normal distribution with the standard deviation $\sigma_i$ assigned to it, there is a theoretical basis for calculating this probability. If the probability turns out to lie between 0.1 and 0.9, it would indicate that the fluctuations in our data are comparable to those expected in another set of data, given the $\sigma_i$ of the several data points. Thus one would consider the data (values and error bars) to be consistent with the trial function. It would indicate that, more or less, on average, the points lie within one, or at most two, standard deviations of the function, as shown in Fig. 10a.
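The probability in question is the upper tail of the chi square distribution with $f$ degrees of freedom, which equals the regularized upper incomplete gamma function $Q(f/2, \chi^2/2)$. The sketch below evaluates it with a standard power series using only the Python standard library. The function name is an assumption made for the illustration, and the series is intended for moderate values of $\chi^2$ only; a library routine would be preferable for serious work.

```python
import math


def chi2_exceed_probability(chi2, f):
    """Probability of finding a chi^2 greater than `chi2` with f degrees of freedom."""
    if chi2 <= 0.0:
        return 1.0
    a, x = f / 2.0, chi2 / 2.0
    # Power series for the lower regularized incomplete gamma function P(a, x):
    # P(a, x) = exp(-x) * x^a / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    p_lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    # Clamp to [0, 1] to absorb rounding error in the subtraction.
    return min(1.0, max(0.0, 1.0 - p_lower))


# Example: f = 18, as for 20 data points and a two-parameter function.
for chi2 in (6.0, 18.0, 36.0):
    print(chi2, chi2_exceed_probability(chi2, 18))
```

A $\chi^2$ near $f$ gives a probability in the middle of the range, while values far above or far below $f$ push the probability toward 0 or 1 respectively.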

If the probability turns out to be very low, say, less than ~0.02, it implies that our data have improbably large deviations (large $\chi^2$) compared to the fluctuations expected from the individual error bars, as shown in Fig. 10b. In this case, we would question whether or not the trial function is the appropriate one; our hypothesis may be false. Alternatively, it could mean that the error bars on each point were underestimated. This would be the case if we had neglected to include, or had underestimated, systematic errors.

If the probability is greater than ~0.98, it would imply that our data clustered too tightly about the theoretical line as in Fig. 10c. This, too, is suspect because the data do not fluctuate as much as basic statistics would require. In our car-counting example, it would be as if we obtained the following counts for each of 5 successive hours: 24, 27, 25, 23, 26. The root-mean-square deviation about the mean is only 1.4 when, from Poisson statistics or the normal distribution, we expect $\sigma = \sqrt{25} = 5$. The fit to our hypothesis of a constant rate is too good!
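The arithmetic of the car-counting example can be checked with a short calculation. The sketch assumes the constant rate is fitted as the mean of the counts (one parameter, so $f = 5 - 1 = 4$) and that every point carries the same $\sigma = \sqrt{25} = 5$. The probability uses a closed form that holds only for even $f$.

```python
import math

counts = [24, 27, 25, 23, 26]      # hourly counts quoted in the text
n = len(counts)
mean = sum(counts) / n             # best-fit constant rate for equal error bars
rms = math.sqrt(sum((c - mean) ** 2 for c in counts) / n)
sigma = math.sqrt(mean)            # expected fluctuation from Poisson statistics

chi2 = sum(((c - mean) / sigma) ** 2 for c in counts)
f = n - 1                          # one fitted parameter: the constant rate

# Closed form for even f: P(> chi2) = exp(-x) * sum_{k < f/2} x^k / k!, x = chi2 / 2
x = chi2 / 2.0
prob = math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(f // 2))

print(f"rms = {rms:.2f}, sigma = {sigma:.1f}, chi^2 = {chi2:.2f}, P = {prob:.3f}")
# prints "rms = 1.41, sigma = 5.0, chi^2 = 0.40, P = 0.982"
```

Under these assumptions the probability of a larger $\chi^2$ comes out just above 0.98, in line with the "too good" threshold quoted above.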
