Imagine that you have a coin. If we plan on using this coin to decide something of value to us, such as the order of play in some game, then we are naturally going to be interested in whether this coin is *fair*. To answer that, we need to understand what we mean by the word "fair" here. One way of defining fairness may be to say that no one can predict the outcome of a flip of the coin any more reliably than by just guessing. That is, there is no information that an observer could have which would give them a leg up on predicting the coin. This is a notion which we will return to later.

Another way to capture our intuition about fairness, which turns out to be easier to discuss at first, is to imagine that the coin is flipped a very large number of times. We would then demand that a fair coin give approximately as many "heads" as "tails," so that neither player in a coin flipping game is advantaged over the other.

We can formalize this second notion by defining the fraction *p* of the coin flips which come up heads and then demanding that *p* = 0.5. We then say that *p* is the *probability* of obtaining heads from our coin flip experiment. If the coin is unfair, then we can imagine *p* taking on any value from 0 to 1 (there can't very well be fewer than zero heads, nor more heads than flips of the coin). In either case, however, our notion of probability as the proportion of experiments having a specific outcome depends on considering very large numbers of trials, so that we can mostly ignore unlikely coincidences such as five heads in a row.
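To make the "very large number of trials" idea concrete, here is a minimal Python sketch that simulates a coin and measures the fraction *p* of heads. The function name `estimate_heads_fraction` and its parameters are hypothetical, chosen only for this illustration, and the coin is modeled with Python's pseudo-random number generator rather than a physical coin.

```python
import random


def estimate_heads_fraction(n_flips: int, p_heads: float = 0.5, seed: int = 0) -> float:
    """Flip a simulated coin n_flips times and return the fraction of heads.

    The (hypothetical) coin comes up heads with probability p_heads;
    the fixed seed makes runs reproducible.
    """
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so "< p_heads" happens with probability p_heads.
    heads = sum(1 for _ in range(n_flips) if rng.random() < p_heads)
    return heads / n_flips


if __name__ == "__main__":
    # As the number of flips grows, the observed fraction settles near p_heads.
    for n in (10, 100, 10_000, 1_000_000):
        print(n, estimate_heads_fraction(n))
```

With only ten flips the estimate can wander well away from 0.5, while with a million flips it typically lands within a small fraction of a percent of it, which is why the definition leans on large numbers of trials.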

Of course, we needn't restrict ourselves to coins. More generally, we can think of a *source* as some device or system that produces a series of *events* drawn from a list of all possible events. Under this view, the fair coin was a source that produced events from the list {heads, tails}, but we can have any number (well, any number in ℕ, anyway) of possible events in such a list. We then have a *probability distribution function*, written *p*(*x*), that tells us the fraction of experiments in which a source *X* produces the event *x*. In the coin example, for instance, *p* = *p*(heads). Summing this function over all possible events must give a total probability of 1, since in every experiment, something happens. Written in symbols:

$$\sum_x p(x) = 1.$$
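A probability distribution function over a finite list of events can be sketched in Python as a dictionary from events to probabilities. The names `fair_coin`, `fair_die` and `is_normalized` are hypothetical, and the fair die is an extra example (not from the discussion above) showing a source with more than two events. Exact `Fraction` values are used so that the sum-to-one check isn't thrown off by floating-point rounding.

```python
from fractions import Fraction

# A probability distribution function p(x), stored as a mapping from each
# possible event x to the fraction of experiments in which it occurs.
fair_coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}


def is_normalized(p: dict) -> bool:
    """Return True if every p(x) lies in [0, 1] and the values sum to 1."""
    return all(0 <= px <= 1 for px in p.values()) and sum(p.values()) == 1


if __name__ == "__main__":
    print(fair_coin["heads"])        # 1/2, i.e. p = p(heads)
    print(is_normalized(fair_coin))  # True
    print(is_normalized(fair_die))   # True
```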

Once we have this probability distribution function, we can do some very nifty things. The main one I want to talk about here is how to connect the idea of a probability distribution to the earlier intuition about uncertainty. To do this, we exploit the *Shannon entropy*. I won't derive it here, but it turns out that the smallest average number of bits needed to describe the outcome of an experiment with a particular source, written *H*(*p*), is given by a nice and compact formula:

$$H(p) = -\sum_x p(x) \lg p(x).$$

Here, by lg *x*, we mean the base-2 logarithm of *x*, which can be obtained from a logarithm in any other base by the change-of-base formula lg *x* = log *x* / log 2.
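The entropy formula translates almost line for line into code. Below is a minimal Python sketch, assuming a distribution stored as a dictionary from events to probabilities; `lg` and `shannon_entropy` are hypothetical helper names. The `lg` helper deliberately uses the change-of-base formula, although Python's `math.log2` computes the same thing directly. Events with zero probability are skipped, following the standard convention that 0 lg 0 = 0.

```python
import math


def lg(x: float) -> float:
    """Base-2 logarithm via the change-of-base formula lg x = log x / log 2."""
    return math.log(x) / math.log(2)


def shannon_entropy(p: dict) -> float:
    """Return H(p) = -sum_x p(x) lg p(x), in bits.

    Events with p(x) = 0 are skipped, following the usual convention
    that 0 lg 0 is taken to be 0.
    """
    return -sum(px * lg(px) for px in p.values() if px > 0)


if __name__ == "__main__":
    # Four equally likely events should need about 2 bits each on average.
    four_way = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
    print(shannon_entropy(four_way))
```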

A quick check with a calculator tells us that in the fair coin example, *H*(*p*) = 1 bit per flip, which is what we would expect: no observer can do better at recording coin flips than to simply write down each flip as it occurs, using a 0 for heads and a 1 for tails (or vice versa). Another quick check tells us that if we have an unfair coin with *p*(heads) = 0.9, we get *H* ≈ 0.47 bits/flip, meaning that someone who knows that the coin is unfair is much less uncertain of the outcome of a coin flip than someone who doesn't. This example hints at the close relationship between probability, uncertainty and entropy, which we will explore further in the future.
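The two calculator checks can be reproduced with a short Python sketch. `coin_entropy` is a hypothetical helper specialized to a two-sided coin; it uses `math.log2` for lg and treats 0 lg 0 as 0. Sweeping *p*(heads) over a few values shows that the fair coin has the largest entropy and that it shrinks as the coin becomes more biased.

```python
import math


def coin_entropy(p_heads: float) -> float:
    """Entropy in bits per flip of a coin that lands heads with probability p_heads."""
    h = 0.0
    for px in (p_heads, 1.0 - p_heads):
        if px > 0:  # treat 0 lg 0 as 0
            h -= px * math.log2(px)
    return h


if __name__ == "__main__":
    # Entropy is largest for the fair coin and shrinks as the coin becomes more biased.
    for p in (0.5, 0.6, 0.75, 0.9, 0.99, 1.0):
        print(f"p(heads) = {p:<4}  H = {coin_entropy(p):.3f} bits/flip")
```

For *p*(heads) = 0.9 the sketch gives roughly 0.469 bits per flip, and a coin that always lands heads has zero entropy, since its outcome is never uncertain.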

For now, however, I hope you enjoyed a bit of introduction to and philosophizing about probabilities!
