## Info

The last two columns show the contributions of each arrangement to the probability of success of either playing for the drop or the finesse. The drop is slightly more likely to work than the finesse in this case. Note, however, that this ignores any information gleaned from the auction, which could be crucial. Note also that the probability of the drop and the probability of the finesse do not add up to one. This is because there are situations where both could work or both could fail.

This calculation does not mean that the finesse is never the right tactic. It sometimes has much higher probability than the drop, and is often strongly motivated by information the auction has revealed.

Calculating the odds precisely, however, gets more complicated the more cards are missing from declarer's holding. For those of you too lazy to compute the probabilities, the book On Gambling by Oswald Jacoby contains tables of the odds for just about any bridge situation you can think of.

Finally on the subject of Bridge, I wanted to mention a fact that many people think is paradoxical but which is really just a more complicated version of the 'three-door' problem I discussed above. Looking at the table shows that the odds of a 1-1 split in spades here are 0.52 : 0.48 or 13 : 12. This comes from how many cards are in East and West's hands when the play is attempted. There is a much quicker way of getting this answer than the brute force method I used above. Consider the hand with the spade 2 in it. There are 12 remaining places in that hand that the spade K might fill, but there are 13 available slots for it in the other. The odds on a 1-1 split must therefore be 13 : 12.

Now suppose instead of going straight for the trumps, I play off a few winners in the side suits (risking that they might be ruffed, of course). Suppose I lead out three Aces in the three suits other than spades and they all win. Now East and West have only 20 cards between them, ten each, and by exactly the same reasoning as before, the odds of a 1-1 split have become 10 : 9 instead of 13 : 12. Playing out seemingly irrelevant suits has increased the probability of the drop working. Although I have not touched the spades, my assessment of the probability has changed significantly.
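The vacant-places argument can be checked with a few lines of Python. This is only a minimal sketch: the function name `one_one_split_odds` is made up for illustration, and it assumes each defender holds the same number of cards and that exactly two spades are missing.

```python
from fractions import Fraction

def one_one_split_odds(cards_per_defender):
    """Odds (for, against) and probability of a 1-1 split of two missing cards.

    Put the first missing card in one defender's hand. That hand then has
    cards_per_defender - 1 vacant places, the other hand has cards_per_defender.
    """
    same_hand = cards_per_defender - 1   # second card joins the first: 2-0 split
    other_hand = cards_per_defender      # second card lies opposite: 1-1 split
    probability = Fraction(other_hand, same_hand + other_hand)
    return other_hand, same_hand, probability

# 13 cards each at trick one, 10 cards each after three side-suit tricks.
for cards in (13, 10):
    split, no_split, prob = one_one_split_odds(cards)
    print(f"{cards} cards each: odds {split}:{no_split}, probability {float(prob):.3f}")
```

With 13 cards each the odds come out as 13 : 12 (probability 0.520), and with 10 cards each as 10 : 9 (probability 0.526).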

I want to end this Chapter with a brief discussion of some more mathematical (as opposed to arithmetical) aspects of probability. I will do this as painlessly as possible using two well-known examples to illustrate the idea of probability distributions and random variables. This requires mathematics that some readers may be unfamiliar with, but it does make some of the examples I use later in the book a little easier to understand.

In the examples I have discussed so far I have applied the idea of probability to discrete events, like the toss of a coin or a ball drawn from an urn. In many problems in statistical science the event boils down to a measurement of something, that is, the numerical value of some variable or other. It might be the temperature at a weather station, the speed of a gas molecule, or the height of a randomly-selected individual. Whatever it is, let us call it X. What one needs for such situations is a formula that supplies the relative probability of the different values X can take. For a start let us assume that X is discrete, that is, that it can only take on specific values. A common example is a variable corresponding to a count (the score on a dice, the number of radioactive decays recorded in a second, and so on). In such cases X is an integer, and the possibility space is {0, 1, 2, ... }. In the case of a dice the set is finite, {1, 2, 3, 4, 5, 6}, while in other examples it can be the entire set of non-negative integers going up to infinity.

The probability distribution, p(x), gives the probability assigned to each value of X. Writing P(X = x) = p(x) probably looks unnecessarily complicated, but it simply means that 'the probability of the random variable X taking on the particular numerical value x is given by the mathematical function p(x)'. In cases like this we use the probability laws in a slightly different form. First, the sum over all probabilities must be unity:

$$\sum_x p(x) = 1.$$

If there is such a distribution we can also define the expectation value of X, E(X), using

$$E(X) = \sum_x x\,p(x).$$

The expectation value of any function of X, say f(X), can be obtained by replacing x by f(x) in this formula so that, for example:

$$E(X^2) = \sum_x x^2\,p(x).$$

A useful measure of the spread of a distribution is the variance, usually expressed as the square of the standard deviation, $\sigma$, as in $\sigma^2(X) = E(X^2) - [E(X)]^2$.

To give a trivial example, consider the probability distribution for the score X obtained on a roll of a dice. Each score has the same probability, so p(x) = 1/6 whatever x is. The formula for the expectation value gives

$$E(X) = 1 \times \tfrac{1}{6} + 2 \times \tfrac{1}{6} + 3 \times \tfrac{1}{6} + 4 \times \tfrac{1}{6} + 5 \times \tfrac{1}{6} + 6 \times \tfrac{1}{6} = \tfrac{21}{6} = 3.5$$

Incidentally, I have never really understood why this is called the expectation value of X. You cannot expect to throw 3.5 on a dice—it is impossible! However, it is what is more commonly known as the average, or arithmetic mean. We can also see that

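$$E(X^2) = 1^2 \times \tfrac{1}{6} + 2^2 \times \tfrac{1}{6} + 3^2 \times \tfrac{1}{6} + 4^2 \times \tfrac{1}{6} + 5^2 \times \tfrac{1}{6} + 6^2 \times \tfrac{1}{6} = \tfrac{91}{6}$$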

This gives the variance as $91/6 - (21/6)^2$, which is 35/12. The standard deviation works out to be about 1.7. This is useful because it gives a rough measure of the spread of the distribution around the mean. As a rule of thumb, most of the probability lies within about two standard deviations either side of the mean.
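The dice arithmetic is easy to reproduce exactly. The following is a small sketch using Python's `fractions` module so that the results appear as exact fractions. The variable names are illustrative only.

```python
from fractions import Fraction
from math import sqrt

scores = range(1, 7)
p = Fraction(1, 6)  # each face of a fair dice is equally likely

mean = sum(x * p for x in scores)         # E(X)
mean_sq = sum(x**2 * p for x in scores)   # E(X^2)
variance = mean_sq - mean**2              # E(X^2) - [E(X)]^2
std_dev = sqrt(variance)                  # standard deviation as a float

print(mean, mean_sq, variance, round(std_dev, 2))
```

This prints `7/2 91/6 35/12 1.71`, in agreement with the values worked out by hand above.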

Let us consider a better example, and one which is important in a very large range of contexts. It is called the binomial distribution. The situation where it is relevant is when we have a sequence of n independent 'trials' each of which has only two possible outcomes ('success' or 'failure') and a constant probability of 'success' p. Trials like this are usually called Bernoulli trials, after Daniel Bernoulli who is discussed in the next chapter. We ask the question: what is the probability of exactly x successes from the possible n? The answer is the binomial distribution:

$$P(X = x) = \frac{n!}{x!\,(n-x)!}\,p^x\,(1-p)^{n-x}.$$

You can probably see how this arises. The probability of x consecutive successes is p multiplied by itself x times, or $p^x$. The probability of $(n - x)$ successive failures is $(1 - p)^{n-x}$. The last two factors therefore tell us the probability that we have exactly x successes (since there must be $n - x$ failures). The combinatorial factor in front takes account of the fact that the ordering of successes and failures does not matter. For small numbers n and x, there is a beautiful way, called Pascal's triangle, to construct the combinatorial factors. It is cumbersome to use this for large numbers, but in any case these days one can use a calculator.
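As an illustration of the Pascal's triangle construction, here is a short sketch in Python. The helper name `pascal_rows` is hypothetical. It builds each row by adding adjacent entries of the row above and then checks the entries against the factorial formula via the standard-library function `math.comb`.

```python
from math import comb

def pascal_rows(n_max):
    """Return rows 0..n_max of Pascal's triangle as lists of integers."""
    rows = [[1]]
    for _ in range(n_max):
        prev = rows[-1]
        # Each interior entry is the sum of the two entries above it.
        row = [1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1]
        rows.append(row)
    return rows

rows = pascal_rows(5)
for row in rows:
    print(row)

# Entry x of row n is the combinatorial factor n! / (x! (n - x)!).
agrees = all(rows[n][x] == comb(n, x) for n in range(6) for x in range(n + 1))
print(agrees)
```

The last row printed is `[1, 5, 10, 10, 5, 1]`, which gives the factors for n = 5.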

The binomial distribution applies, for example, to repeated tosses of a coin, in which case p is taken to be 0.5 for a fair coin. A biased coin might have a different value of p, but as long as the tosses are independent the formula still applies. The binomial distribution also applies to problems involving drawing balls from urns: it works exactly if the balls are replaced in the urn after each draw, but it also applies approximately without replacement, as long as the number of draws is much smaller than the number of balls in the urn. It is a bit tricky to calculate the expectation value of the binomial distribution, but the result is not surprising: E(X) = np. If you toss a fair coin 10 times the expectation value for the number of heads is 10 times 0.5, which is 5. No surprise there. After another bit of maths, the variance of the distribution can also be found. It is $np(1 - p)$.
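For readers who would rather check the results E(X) = np and variance np(1 - p) numerically than derive them, here is a minimal sketch for ten tosses of a fair coin. The function name `binomial_pmf` is an illustrative choice. It evaluates the standard binomial formula directly.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5  # ten tosses of a fair coin
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                       # should be 1
mean = sum(x * q for x, q in enumerate(pmf))           # should equal n * p
variance = sum(x**2 * q for x, q in enumerate(pmf)) - mean**2  # n * p * (1 - p)

print(total, mean, variance)
```

The probabilities sum to one, the mean comes out as 5 and the variance as 2.5, as the formulas predict.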

The binomial distribution drives me insane every four years or so, whenever it is used in opinion polls. Polling organisations generally interview around 1000 individuals drawn from the UK electorate. Let us suppose that there are only two political parties: Labour and the rest. Since the sample is small compared with the electorate, the conditions of the binomial distribution apply fairly well. Suppose the fraction of the electorate voting Labour is 40%; then the expected number of Labour voters in our sample is 400. But the variance is $np(1 - p) = 240$. The standard deviation is the square root of this, and is consequently about 15. Taking two standard deviations as the likely spread gives about 30 voters out of 1000, so the likely range of results is about 3% either side of the mean value. The term 'margin of error' is usually used to describe this sampling uncertainty. What it means is that, even if political opinion in the population at large does not change at all, the results of a poll of this size can differ by 3% from sample to sample. Of course this does not stop the media from making stupid statements like 'Labour's lead has fallen by 2%'. If the variation is within the margin of error then there is absolutely no evidence that the proportion p has changed at all. Doh!
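The polling numbers can be reproduced in a few lines. This sketch assumes, following the rule of thumb quoted earlier for the dice, that the 'margin of error' corresponds to two standard deviations. That factor of two is an assumption of the example rather than a universal definition.

```python
from math import sqrt

n = 1000   # people interviewed
p = 0.40   # assumed true fraction voting Labour

expected = n * p               # expected number of Labour voters
variance = n * p * (1 - p)     # binomial variance
std_dev = sqrt(variance)       # about 15 voters
margin = 2 * std_dev / n       # two standard deviations, as a fraction of the sample

print(f"expected {expected:.0f}, variance {variance:.0f}, "
      f"std dev {std_dev:.1f}, margin {100 * margin:.1f}%")
```

The margin comes out at roughly 3%, so a reported change of 2% between two such polls is well inside the sampling noise.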

So far I have only discussed discrete variables. In the physical sciences one is more likely to be dealing with continuous quantities, that is, those where the variable can take any numerical value. Here we have to use a bit of calculus to get the right description: basically, instead of sums we have to use integrals. For a continuous variable, the probability is not located at specific values but is smeared out over the whole possibility space. We therefore use the term probability density to describe this situation. The probability density p(x) is such that the probability that the random variable X takes a value in the range (x, x + dx) is p(x) dx. The density p(x) is therefore not a probability itself, but a probability per unit x. With this definition we can write the requirement that the total probability is unity as

$$\int_{-\infty}^{\infty} p(x)\,dx = 1.$$

The probability that X lies in a certain range, say [a, b], is the area under the curve defined by p(x):

$$P(a \le X \le b) = \int_a^b p(x)\,dx.$$

Expectation values are defined in an analogous way to the case of discrete variables, but replacing sums with integrals. For example,

$$E(X) = \int_{-\infty}^{\infty} x\,p(x)\,dx.$$

I have really included these definitions for completeness. Do not worry too much if you do not know about calculus, as I will not be doing anything difficult along these lines. This formalism does however allow me to introduce what is probably the most important distribution in all probability theory. This is the Gaussian distribution, often called the normal distribution. It plays an important role in a whole range of scientific settings. This distribution is described by two parameters: $\mu$ and $\sigma$, of which more in a moment. The mathematical form is

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right],$$

but it is only really important to recognize the shape, which is the famous 'Bell Curve' shown in the Figure. The expectation value of X is $E[X] = \mu$ and the variance is $\sigma^2$.
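To connect the Gaussian formula with the statement that about 95% of the probability lies within two standard deviations of the mean, here is a rough numerical sketch. The names `gaussian_pdf`, `integrate` and `prob_within` are illustrative. The midpoint rule is just one simple way to approximate the area under the curve, and the closed form uses the error function `math.erf` from the standard library.

```python
from math import erf, exp, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def integrate(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def prob_within(k):
    """Exact probability that a Gaussian variable lies within k sigma of its mean."""
    return erf(k / sqrt(2))

# Area under the curve between mu - 2 sigma and mu + 2 sigma, for mu = 0, sigma = 1.
area = integrate(lambda x: gaussian_pdf(x, 0.0, 1.0), -2.0, 2.0)
print(round(area, 4), round(prob_within(2), 4))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

Both routes give about 0.9545 for two standard deviations. The corresponding figures for one and three standard deviations are about 68% and 99.7%.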

Figure 2 The Normal distribution. The peak of the distribution is at the mean value, with about 95% of the probability within two standard deviations on either side.

So why is the Gaussian distribution so important? The answer is found in a beautiful mathematical result called the Central Limit Theorem. This used to be called the 'Law of Frequency of Error', but since it applies to many more useful things than errors I prefer the more modern name. This says, roughly speaking, that if you have a variable, X, which arises from the sum of a large number of independent random influences, so that

$$X = X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i,$$

then whatever the probabilities of each of the separate influences $X_i$, the distribution of X will be close to the Gaussian form. All that is required is that the $X_i$ should be independent and there should be a large number of them. Note also that the distribution of the sum of a large number of independent Gaussian variables is exactly Gaussian. There are an enormous number of situations in the physical and life sciences where some effect is the outcome of a large number of independent causes. Heights of individuals drawn from a population tend to be normally distributed. So do measurement errors in all kinds of experiments. In fact, even the distribution resulting from a very large number of Bernoulli trials tends to this form. In other words, the limiting form of the binomial distribution for a very large n is itself of the Gaussian form, with mean np and variance $np(1 - p)$. This does not mean that everything is Gaussian. There are certainly many situations where the central limit theorem does not apply, but the normal distribution is of fundamental importance across all the sciences. The Central Limit Theorem is also one of the most remarkable things in modern mathematics, showing as it does that the less one knows about the individual causes, the surer one can be of some aspects of the result. I cannot put it any better than Sir Francis Galton:

I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the 'Law of Frequency of Error'. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along.
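The claim that the binomial distribution tends to the Gaussian form for large n can be explored numerically. The sketch below is an illustration rather than a proof. The function names and the choice p = 0.4 are assumptions made for the example. It compares the binomial probabilities with a Gaussian curve of mean np and variance np(1 - p) and reports the largest discrepancy.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def max_abs_difference(n, p):
    """Largest gap between the binomial probabilities and the matching Gaussian."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(x, n, p) - gaussian_pdf(x, mu, sigma))
               for x in range(n + 1))

differences = [max_abs_difference(n, 0.4) for n in (10, 100, 1000)]
for n, d in zip((10, 100, 1000), differences):
    print(n, d)
```

The largest discrepancy shrinks steadily as n grows, which is the behaviour the Central Limit Theorem leads one to expect.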

For a good introduction to probability theory, as well as its use in gambling, see:

Haigh, John. (2002). Taking Chances: Winning with Probability, Second Edition, Oxford University Press.

A slightly more technical treatment of similar material is:

Packel, Edward. (1981). The Mathematics of Games and Gambling, New Mathematical Library (Mathematical Association of America).

More technically mathematical works for the advanced reader include:

Feller, William. (1968). An Introduction to Probability Theory and Its Applications, Third Edition, John Wiley & Sons.

Grimmett, G.R. and Stirzaker, D.R. (1992). Probability and Random Processes, Oxford University Press.

Jaynes, Ed. (2003). Probability Theory: The Logic of Science, Cambridge University Press.

Jeffreys, Sir Harold. (1966). Theory of Probability, Third Edition, Oxford University Press.

Simple applications of probability to statistical analysis can be found in:

Rowntree, Derek. (1981). Statistics without Tears, Pelican Books.

Finally, you must read the funniest book on statistics, once reviewed as 'wildly funny, outrageous, and a splendid piece of blasphemy against the preposterous religion of our time':

Huff, Darrell. (1954). How to Lie with Statistics, Penguin Books.