Characterizing Messages

Information Sources

In the previous examples, we took the inputs to the channel as given. We must now expand our view to include the source of information. Once we formalize an information source, we will consider the question of how to match an information source to an information channel.

An information source can take many forms. It can be a human (speaking or maybe typing), a stock market, a probabilistic quantum effect, and so on. Shannon's results, however, only hold for a certain subset of information sources. The source must choose symbols one at a time according to fixed probabilities, depending on preceding choices and current symbols (We could envision a source to be a probabilistic finite-state automaton). We further require that all possible sequences of symbols produced by the source be statistically indistinguishable from each other with probability 1. This assures us that any given sequence is typical of all sequences. We say that a source satisfying this last property is ergodic.

At first glance, this restriction appears limiting. For instance, a human speaking English may not be an ergodic source of information. However, Shannon suggests that most real-life sources of information can be approximated by made-up ergodic finite-state automata. This makes Shannon's theorems useful for modeling real world communication

Shannon gives an example of how to construct an ergodic finite-state automaton that approximates English speech or writing:
First, we could simply output random letters with equal probability. This is the so-called zero-order approximation. When reading zero-order output, the unnatural abundance of X's and other letters immediately jumps out. This suggests that a better approximation might be made by considering the relative frequency of letters. This leads to the first-order approximation. Here, letters are chosen independently, but with probability equal to their relative frequency in English. A still better method is to consider the probability of each letter given the previous letter. This preserves the "digram" structure of English and is known as the second-order approximation. As you can see from Shannon's examples below, the more sophisticated the process, the more the output appears closer to normal English.

1. Zero-order approximation (symbols independent and equiprobable).
XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.

2. First-order approximation (symbols independent but with frequencies of English text).
OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.

3. Second-order approximation (digram structure as in English).
ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.

4. Third-order approximation (trigram structure as in English).
IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

5. First-order word approximation. Rather than continue with n-gram structure it is easier and better to jump at this point to word units. Here words are chosen independently but with their appropriate frequencies.
REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TOOF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.

6. Second-order word approximation. The word transition probabilities are correct but no further structure is included.
THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.