Characterizing Messages


**Information Sources**

In the previous examples, we took the inputs to the channel as given. We must now expand our view to include the source of information. Once we formalize an information source, we will consider the question of how to match an information source to an information channel.

An information source can take many forms. It can be a human (speaking
or maybe typing), a stock market, a probabilistic quantum effect, and so
on. Shannon's results, however, hold only for a certain class of information
sources. The source must choose symbols one at a time according to fixed
probabilities, which may depend on the preceding choices (we could envision
such a source as a probabilistic finite-state automaton). We further require that
all sufficiently long sequences of symbols produced by the source be statistically
indistinguishable from each other with probability 1. This assures us that any
given sequence is typical of the whole ensemble. A source satisfying this last property
is called *ergodic*.
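Such a probabilistic finite-state automaton can be sketched in a few lines of code. The sketch below is illustrative only; the states, symbols, and probabilities are invented for the example, not taken from Shannon's paper.

```python
import random

def markov_source(transitions, state, n, seed=0):
    """Emit n symbols from a finite-state probabilistic source.

    transitions: maps each state to a list of
                 (probability, symbol, next_state) choices.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        r = rng.random()
        cumulative = 0.0
        for p, symbol, nxt in transitions[state]:
            cumulative += p
            if r < cumulative:
                out.append(symbol)
                state = nxt
                break
    return "".join(out)

# A toy two-state source: state A strongly favors emitting 'a',
# state B emits 'a' and 'b' with equal probability.
toy = {
    "A": [(0.8, "a", "A"), (0.2, "b", "B")],
    "B": [(0.5, "b", "B"), (0.5, "a", "A")],
}
print(markov_source(toy, "A", 20))
```

Because the transition probabilities are fixed and every state can be reached from every other, long outputs of this toy source are statistically alike, which is the ergodic property described above.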

At first glance, this restriction appears limiting. For instance, a human speaking English may not be an ergodic source of information. However, Shannon suggests that most real-life sources of information can be approximated by artificial ergodic finite-state automata, which makes his theorems useful for modeling real-world communication.

Shannon gives an example of how to construct an ergodic finite-state automaton
that approximates English speech or writing:

First, we could simply output random letters with equal probability.
This is the so-called zero-order approximation. When reading zero-order output, the
unnatural abundance of X's and other letters immediately jumps out. This
suggests that a better approximation might be made by considering the
relative frequency of letters. This leads to the first-order
approximation. Here, letters are chosen independently, but with
probability equal to their relative frequency in English. A still
better method is to consider the probability of each letter given the
previous letter. This preserves the "digram" structure of English and
is known as the second-order approximation. As Shannon's
examples below show, the more sophisticated the process, the
closer its output comes to ordinary English.

1. Zero-order approximation (symbols independent and equiprobable).

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.

2. First-order approximation (symbols independent but with frequencies of English text).

OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA
NAH BRL.

3. Second-order approximation (digram structure as in English).

ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE
AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.

4. Third-order approximation (trigram structure as in English).

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES
OF THE REPTAGIN IS REGOACTIONA OF CRE.

5. First-order word approximation. Rather than continue with n-gram structure it is easier
and better to jump at this point to word units. Here words are chosen independently but with their
appropriate frequencies.

REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL
HERE HE THE A IN CAME THE TOOF TO EXPERT GRAY COME TO FURNISHES
THE LINE MESSAGE HAD BE THESE.

6. Second-order word approximation. The word transition probabilities are correct but no further structure
is included.

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER
OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT
THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
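The letter-level approximations above are easy to reproduce. The sketch below is a minimal illustration: the tiny corpus and the function names are my own, whereas Shannon worked from published frequency tables of English.

```python
import random
from collections import Counter, defaultdict

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def zero_order(n, rng):
    # Zero-order: symbols independent and equiprobable.
    return "".join(rng.choice(ALPHABET) for _ in range(n))

def first_order(text, n, rng):
    # First-order: symbols independent, weighted by their
    # relative frequency in the sample text.
    counts = Counter(text)
    symbols = list(counts)
    weights = [counts[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=n))

def second_order(text, n, rng):
    # Second-order: each symbol is chosen given the previous one,
    # preserving the digram structure of the sample text.
    follows = defaultdict(list)
    for a, b in zip(text, text[1:]):
        follows[a].append(b)
    ch = rng.choice(text)
    out = [ch]
    for _ in range(n - 1):
        nxt = follows.get(ch)
        ch = rng.choice(nxt) if nxt else rng.choice(text)
        out.append(ch)
    return "".join(out)

rng = random.Random(0)
corpus = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG " * 5
print(zero_order(40, rng))
print(first_order(corpus, 40, rng))
print(second_order(corpus, 40, rng))
```

The word-level approximations (examples 5 and 6) work the same way, with `text.split()` in place of individual letters; a larger corpus makes every order of approximation noticeably more English-like.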