Characterizing Messages

Entropy

On the previous slide, we referred to the minimum number of bits that an information source can be compressed into as entropy. Unfortunately, we must admit that this definition is somewhat "sloppy" and difficult to use. Shannon gives a more rigorous definition of entropy. Specifically, entropy per symbol is

$$H = \mathbb{E}\!\left[\log \frac{1}{p}\right], \qquad [2]$$

that is, the expectation of log(1/p), where p is the probability of the symbol that is produced by the information source. Alternately,

$$H = \sum_i p_i \log \frac{1}{p_i} \qquad [3]$$

where the sum is over all symbols that can be produced by the source. (If the information source is represented as a finite automaton, we may take the weighted average of this expression over all internal states.) If the base of the logarithm is 2, we say that entropy is measured in bits.
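Equation [3] can be evaluated directly. The sketch below (names are our own, not from the slide) computes entropy per symbol for a discrete distribution, skipping zero-probability symbols since they contribute nothing to the sum:

```python
import math

def entropy(probs, base=2):
    """Entropy per symbol, per equation [3]: sum of p * log(1/p)."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

# A fair coin (two equally likely symbols) yields exactly 1 bit per symbol.
print(entropy([0.5, 0.5]))   # 1.0

# A biased binary source is more predictable, hence lower entropy.
print(entropy([0.9, 0.1]))   # ~0.469 bits per symbol
```

With base 2, the result is in bits; passing `base=math.e` would give entropy in nats instead.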

Consider an information source, and imagine compressing its output into a binary alphabet until it is incompressible. The resulting sequence consists of independent 0s and 1s occurring with equal probability. Thus, this smallest sequence has an entropy of 1 bit per symbol, and a total entropy equal to its length in bits. It turns out that when entropy is defined according to equation [3], compression leaves the total entropy of a message unchanged. In other words, the entropy of the compressed sequence, and thus its length in bits, equals the entropy of the original source sequence. This makes the definition of entropy from the previous slide consistent with equations [2] and [3].
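This claim can be checked empirically. The sketch below (our own illustration, not from the slide) draws a biased binary sequence, compresses it with `zlib` as a stand-in for a general-purpose compressor, and compares the compressed size in bits to the total entropy predicted by equation [3]. Since `zlib` is not an optimal entropy coder, the compressed size should come out somewhat above the entropy, but well below the raw length:

```python
import math
import random
import zlib

random.seed(0)
n = 100_000

# Biased i.i.d. binary source: P(0) = 0.9, P(1) = 0.1
symbols = random.choices([0, 1], weights=[0.9, 0.1], k=n)

# Entropy per symbol from equation [3], in bits
h = 0.9 * math.log2(1 / 0.9) + 0.1 * math.log2(1 / 0.1)

# Pack 8 symbols per byte so the compressor sees the statistical redundancy
packed = bytes(
    sum(bit << i for i, bit in enumerate(symbols[j:j + 8]))
    for j in range(0, n, 8)
)
compressed_bits = 8 * len(zlib.compress(packed, 9))

print(f"raw length:      {n:,} bits")
print(f"total entropy:   {h * n:,.0f} bits")
print(f"compressed size: {compressed_bits:,} bits")
```

The compressed size lands near the total entropy of the source, while no compressor can systematically go below it, which is exactly the consistency the slide describes.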