Characterizing Messages

Redundancy and Compression

We need a way to measure how much information an information source produces. It is not enough to simply count the number of possible sequences, because (1) some sequences are more likely than others, and (2) messages contain redundancy. This means that only part of the length of a message is attributable to information; the rest is a result of the structure of the message itself.

As an example, consider English again. Even when a large number of letters are deleted from English text, the original message can still be reconstructed. This is, in fact, an unsophisticated way to compress English. Compression is the act of translating a message into a shorter message from which the original can still be reconstructed.
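One crude version of this idea is simply deleting the vowels, since a fluent reader can usually fill them back in. The sketch below (the function name and sample sentence are our own, not from Shannon) shows how much shorter the text becomes:

```python
def drop_vowels(text):
    """Delete vowels; a crude 'compressor' that relies on the
    reader's knowledge of English to reconstruct the original."""
    return "".join(c for c in text if c.lower() not in "aeiou")

sentence = "Compression is the act of translating a message into a shorter message."
shortened = drop_vowels(sentence)
# The stripped text is substantially shorter than the original,
# yet most readers can recover the sentence from it.
```

Note that the "decompression" here is performed by the reader's familiarity with English; a real compression scheme pairs the shortening step with an exact reconstruction algorithm.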

In fact, Shannon demonstrates that there is a precise limit to how much an information source can be compressed. The minimum number of bits needed to fully describe a message is known as the entropy of the source. Imagine a compression machine that translates the output of a source into the binary alphabet, and consider the average length of the binary sequences it produces: this is the number of bits into which the source is being compressed. The minimum achievable average is the source entropy.(1)
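For the special case of a source whose successive symbols are independent, the entropy has a closed form, Shannon's formula H = −Σ pᵢ log₂ pᵢ (stated here without derivation). A minimal sketch for a two-symbol source, with a function name of our own choosing:

```python
from math import log2

def entropy(p_a):
    """Entropy in bits per symbol of a source that independently
    emits A with probability p_a and B with probability 1 - p_a."""
    h = 0.0
    for p in (p_a, 1.0 - p_a):
        if p > 0:  # the limit of -p * log2(p) as p -> 0 is 0
            h -= p * log2(p)
    return h
```

A source that always emits A has entropy 0 (its output is perfectly predictable), and the entropy is largest when A and B are equally likely.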

For simplicity, we continue to work with the familiar binary alphabets: {A, B} for the source and the destination, and {0, 1} for the channel in between. In this case, Shannon shows that the only information source that cannot be compressed is one in which successive symbols are chosen fully independently and are equally likely to be A or B. Because of this lack of structure, a typical message from this source is already its most efficient description. Such a source has an entropy of 1 bit per symbol.
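We can check this incompressibility empirically. The sketch below feeds pseudorandom bytes, in which every bit is independent and fair (standing in for a long A/B message packed into bits), to Python's general-purpose zlib compressor; the fixed seed and the message size are our own choices:

```python
import random
import zlib

random.seed(0)  # fixed seed so the experiment is repeatable
# 10,000 bytes in which every bit is independent and fair.
data = bytes(random.getrandbits(8) for _ in range(10_000))

compressed = zlib.compress(data, level=9)
# With no redundancy to remove, the "compressed" output is not
# meaningfully shorter than the input; it is typically slightly
# longer, because of the compression format's fixed overhead.
```

Running the same experiment on English text instead of random bytes shrinks the data considerably, which is exactly the asymmetry entropy captures.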

This is no longer true if symbols are not mutually independent. As a trivial example, suppose that every even-numbered symbol is equal to the previous symbol, as in BBAAAABBAA... Then every other symbol could be deleted without losing information, so the entropy of such a source is at most ½ bit per symbol.
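The deletion scheme for this duplicated-symbol source can be written out explicitly: keep only the odd-numbered symbols, then reconstruct by writing each surviving symbol twice. A minimal sketch (the function names are ours):

```python
def halve(message):
    """Keep every other symbol, starting with the first.
    Assumes an even-length message from the duplicated source."""
    return message[::2]

def restore(half):
    """Undo halve() by writing each surviving symbol twice."""
    return "".join(symbol * 2 for symbol in half)
```

For the message BBAAAABBAA, halve() produces BAABA, half the length, and restore() recovers the original exactly, so no information has been lost.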

1. Technically, we must average the number of bits needed to describe typical messages of length N as N grows without bound.