Information

The theory of information is fundamental to a rational understanding of the temporal fabric of our world. The principal reason for this is that information and time cannot reasonably be separated—a fact that will become evident from a short consideration of how we know about time and information. Only if information produces effects on objects whose changes in time are physically measurable is there a chance to test hypotheses about information objectively. Time, in turn, is measurable by a physical system only on condition that this system has information about its internal changes. To measure time, information about change is necessary, and to measure information, change in time is necessary.

Motivated by concrete problems of empirical research and engineering, the theory of information is steadily developing into a coherent system of mathematical models describing the complex abstract structure of information. On the one hand, information is an entity difficult to grasp, because the interactions between its syntactic, semantic, and pragmatic components constitute a complex structure. On the other hand, information is something supposed to exist almost everywhere: Its abstract features are claimed to be discernible in the most diverse kinds of systems, such as quantum entanglements, cells, humans, computers, and societies.

The need for a unified theory of information is, thus, felt at nearly all frontiers of science. Quantum physicists are discussing whether an interpretation of their experimental data in terms of information could resolve some of the paradoxes that haunt their understanding of the subatomic world. Cosmologists are calculating the information content of black holes. Geneticists are describing the hereditary substance as containing information that is decoded in cells by means of the genetic code. Evolutionary biologists are assuming that the most important steps in the history of life on earth consisted in establishing increasingly efficient ways of information processing. Neurophysiologists are talking about the brain as the most complex information processor known to us. Computer scientists are constructing and programming machines that process information more and more intelligently without being continuously assisted by humans. Communication engineers are building information networks that connect human brains to computers in a new kind of symbiosis. Economists are proposing mathematical theories of economic behavior based on the distribution of information among people who buy and sell goods in a market. Sociologists are characterizing developed countries as information societies in a globalized world.

If we do not want to get lost in the jungle of information concepts used in different sciences, and if we want to make the relation between time and information clear, we are well advised to orient ourselves with the help of a rough classification of the manifold aspects of information and to distinguish its syntactic, semantic, and pragmatic components. The scientific disciplines of syntax, semantics, and pragmatics are known to anyone who studies language or other sign systems. In short, syntax is the study of signs in their relation to other signs; semantics is the study of signs in relation to their conceptual and referential meaning; pragmatics is the study of signs in relation to the agents using them. That all three disciplines are also of utmost importance for an information theorist should not come as a surprise, because sign systems basically are means of transmitting information. Each sign is, as it were, a package of information: a syntactic unit that transports a meaning from a sender to a receiver.

This entry presents some fundamental characteristics of the syntactic components of information and reflects on the relation between syntactic and semantic features of information. The reason for restricting ourselves to these topics is that the general semantics and pragmatics of information still constitute a mostly uncharted continent. To explore it further, not only sophisticated mathematical models but also new concepts for analyzing the close relation between time and information will have to be developed.

Syntactic Features of Information in Shannon’s Communication Theory

Syntactic features of information are described by a variety of mathematical models. The most important one resulted from Claude E. Shannon’s understanding of communication as transmission of information. Shannon (1916-2001), who researched communication systems first as an engineer at the Bell Laboratories and then as a professor at the Massachusetts Institute of Technology, has rightly been called “the father of information theory.” The scheme of a general communication system he introduced in his classic paper, A Mathematical Theory of Communication (1948), still is the basis for most research in information theory.

According to Shannon, a general communication system consists of five components: an information source, a transmitter, a channel, a receiver, and a destination. The information source generates a message that is to be transmitted over the channel to the destination. Then the transmitter encodes the message in a signal that is suited for being transmitted via the channel. In the channel, there normally exists a certain probability that noise distorts the signal. The receiver decodes the transmitted and possibly distorted signal. Finally, the message is delivered to the destination. This general communication system can easily be exemplified by telephony. A person (the information source) speaks into a telephone (the transmitter) that encodes sound waves into a sequence of analog or digital signals. These signals are transmitted via fiber-optic cables, air, satellites, or other channels. The physical structure of the medium of transmission, atmospherics, defective electronic devices, jamming stations, and other noise sources might distort the signal. A telephone at the other end of the channel (the receiver) decodes the transmitted signals into sound waves that some person (the destination) can hear.
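To make the flow through these five components concrete, the following minimal Python sketch (an illustration added here, not part of the entry) chains a toy transmitter, a noiseless channel, and a receiver; the particular encoding of characters as bits is an assumption chosen only for the example.

# A toy model of Shannon's general communication system. The character-to-bit
# encoding and the noiseless channel are illustrative assumptions, not part of
# Shannon's formalism.

def transmitter(message):
    # Encode the message as a signal: eight bits per character.
    return "".join(format(ord(ch), "08b") for ch in message)

def channel(signal):
    # A noiseless channel simply passes the signal through unchanged.
    return signal

def receiver(signal):
    # Decode the (possibly distorted) signal back into a message.
    blocks = [signal[i:i + 8] for i in range(0, len(signal), 8)]
    return "".join(chr(int(bits, 2)) for bits in blocks)

message = "hello"                                   # generated by the information source
delivered = receiver(channel(transmitter(message)))
print(delivered)                                    # what the destination receives: "hello"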

Shannon introduced the general communication system in order to solve two main problems of information transmission. First, how many signals do we minimally need on average to encode a message of given length generated by an information source? Second, how fast can we reliably transmit an encoded message over a noisy channel? Both problems ask for principal spatial and temporal limits of communication, or in other words, for minimum code lengths and maximum transmission rates. To answer these questions, Shannon defined an information-theoretical analogue to entropy, the statistical measure of disorder in a thermodynamic system.

Omitting formal technicalities, Shannon’s crucial idea goes as follows. For any message that is generated by an information source, the information content of the message is equal to the amount of uncertainty that the destination of the message loses on its receipt. The less probable the receipt of a message is, the more information it carries to the destination. Shannon’s measure of the entropy of an information source (in short, Shannon entropy) quantifies the average information content of a message generated by that source.
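In formula form, Shannon entropy is H = -Σ p(m) log2 p(m), summed over all messages m of the source, and a single message of probability p carries -log2 p bits. The short Python sketch below is an added illustration; the four-message source and its probabilities are made up for the example.

from math import log2

def shannon_entropy(probabilities):
    # H = -sum(p * log2(p)) over all messages with nonzero probability,
    # measured in bits per message.
    return -sum(p * log2(p) for p in probabilities if p > 0)

# A made-up information source: four messages with unequal probabilities.
source = {"sunny": 0.5, "cloudy": 0.25, "rain": 0.125, "snow": 0.125}
print(shannon_entropy(source.values()))  # 1.75 bits per message on average

# A rarer message carries more information on receipt: -log2(p) bits.
print(-log2(source["snow"]))             # 3.0 bits
print(-log2(source["sunny"]))            # 1.0 bit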

Shannon’s noiseless, or source, coding theorem shows that the entropy of an information source provides us with a lower bound on the average length of signals that encode messages of the information source and are transmitted over a noiseless channel. If the length of these messages goes to infinity, the minimum expected signal length per message symbol approaches the entropy of the information source. Shannon entropy defines, thus, in terms of signal length, what a sender can optimally achieve in encoding messages.
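As a hedged illustration of this bound (not taken from the entry), the sketch below builds a Huffman code, one concrete optimal prefix code, for the same made-up distribution and compares its expected code length with the entropy; for this particular distribution the two coincide exactly, and for Huffman codes in general the expected length exceeds the entropy by less than one bit per symbol.

import heapq
from math import log2

def huffman_code_lengths(probs):
    # Build a Huffman tree bottom-up; return the code length of each symbol.
    # Heap entries carry a tie-breaking counter so the dicts are never compared.
    heap = [(p, i, {i: 0}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = [0.5, 0.25, 0.125, 0.125]          # made-up source distribution
lengths = huffman_code_lengths(probs)
avg_length = sum(probs[s] * l for s, l in lengths.items())
entropy = -sum(p * log2(p) for p in probs)
print(avg_length, entropy)                 # 1.75 1.75 -- the lower bound is met exactly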

In his noisy, or channel, coding theorem, Shannon proves the counterintuitive result that messages can always be sent with arbitrarily low error probability over a noisy channel—on condition that the rate (measured in information units per channel use) at which the message is transmitted does not exceed an upper limit specific to the particular channel. This upper limit is called channel capacity and can be quantified by an ingenious use of Shannon entropy. If we subtract from the entropy of the information source the conditional entropy of the information source given the messages that are received by the destination, we get the mutual information of the information source and the destination. Mutual information measures how much the uncertainty about which message has been generated by the information source is reduced when we know the message the destination has received. If the channel is noiseless, the mutual information of the information source and the destination equals the Shannon entropy of the information source. The noisier the channel is, the more signals we must transmit additionally in order to correct the transmission errors. The maximum mutual information of an information source and a destination equals the capacity of the channel that connects both. That a sender who wants to transmit a message reliably tries to achieve channel capacity, regardless of how noisy the channel is, seems to be a vain endeavor, because any correction signal is subjected to distortion, too. Yet Shannon could show that for any noisy channel, there do exist codes by means of which a sender can transmit messages with arbitrarily small error at rates not above channel capacity—alas, he did not find a general procedure by which we could construct such codes, and up to now no information theorist has been able to perform this feat.
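The entry does not commit to a particular channel model, but the binary symmetric channel, which flips each transmitted bit independently with probability p, is the standard textbook example; its capacity has the closed form C = 1 - H2(p), where H2 is the binary entropy function. The sketch below (an added illustration, not the author's) computes this capacity for a few noise levels.

from math import log2

def binary_entropy(p):
    # H2(p) in bits; taken to be 0 at p = 0 and p = 1.
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    # Capacity of a binary symmetric channel with crossover probability p:
    # the maximum mutual information between channel input and output.
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.01, 0.11, 0.5):
    print(p, round(bsc_capacity(p), 3))
# p = 0.0 gives 1.0 bit per channel use (noiseless); p = 0.5 gives 0.0
# (pure noise, nothing can be transmitted reliably).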

Shannon’s coding theorems prove, with mathematical exactness, principal physical limits of information transmission. His channel coding theorem shows that if we want to reliably transmit a message over a noisy channel—and any realistic channel is noisy—we must respect the channel capacity as an upper limit on the transmission rate of our message. If we want to be sure that another person receives our message in its original form, we must take the properties of the medium of transmission into account and make the encoding of the message as redundant as necessary. To make an encoding redundant means to make it longer than required by the noiseless coding theorem. It means that we need more time to transmit a message over a noisy channel than over a noiseless one. Shannon’s information theory implies, thus, an economics of information transmission: Given the goal of reliable information transmission and knowing the noise in a channel, we must respect a spatial lower limit on the length of encodings and a temporal upper limit on their transmission rate.
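A deliberately simple way to add such redundancy, not discussed in the entry but useful as an illustration, is a repetition code: send every bit three times and let the receiver take a majority vote. The sketch below simulates this over a hypothetical bit-flipping channel; reliability improves markedly, but the encoding is three times longer and the transmission correspondingly slower.

import random

def encode(bits, r=3):
    # Repetition code: each message bit is sent r times (deliberate redundancy).
    return [b for b in bits for _ in range(r)]

def noisy_channel(signal, p, rng):
    # Flip each transmitted bit independently with probability p.
    return [b ^ (rng.random() < p) for b in signal]

def decode(signal, r=3):
    # Majority vote over each block of r received bits.
    return [int(sum(signal[i:i + r]) > r // 2) for i in range(0, len(signal), r)]

rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(10000)]
received = decode(noisy_channel(encode(message), 0.1, rng))
errors = sum(m != r for m, r in zip(message, received))
print(errors / len(message))   # roughly 0.028 instead of 0.1 uncoded, at one third the rate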

Shannon entropy measures only a syntactic property of information, more precisely: a mathematical property of statistical distributions of messages. Its definition does not explicitly involve semantic or pragmatic aspects of information. Whether a transmitted message is completely nonsensical or very meaningful for its destination, Shannon entropy takes into account just the probability that a message is generated by an information source. Since Shannon published his mathematical theory of communication, further statistical measures of syntactic aspects of information have been defined. For example, the theory of identification entropy, developed by the German mathematicians Rudolf Ahlswede and Gunter Dueck at the end of the 1980s, refers to Shannon’s general communication system yet introduces a decisive pragmatic difference as regards the purpose of communication. In Ahlswede and Dueck’s scenario, the information source and the destination are not interested in the reliable transmission of all messages that the information source can generate. The destination just wants to know, as fast as possible, whether one particular message has been sent, which might have been encoded in different signals by the sender. It is the situation of someone who has bet money on a horse and only wants to know exactly whether this horse has won the race. Such a relaxation in the goal of communication allows an enormous increase in the speed of information transmission.

Semantic Features of Information in Shannon’s Communication Theory

The semantic and pragmatic features of information are much more difficult to formalize than its syntactic features. Some approaches to semantic aspects of information try, therefore, first to identify syntactic properties of signals that may be correlated with the fact that these signals have a meaning for both the information source and the destination. When we speak about meaning, sense, reference, and other semantic concepts in an information-theoretical context, we do not suppose that information sources and destinations have complex psychological qualities like those of human beings. Access to semantic aspects of information is, thus, not restricted to self-reflective agents who associate, with signs, mental representations as designations and who refer consciously to objects in the real world as denotations. In this sense, “the semantic component of information” just means that at least a set of messages and a set of signals are interrelated by means of a convention.

A code, as a system of conventional rules that allow encoding and decoding, is the minimum semantic structure par excellence. It is normally not possible to infer which message is related to which signal, and vice versa, if we know only the elements of both sets (i.e., the messages and the signals), and the natural laws that constrain the encoding of messages in signals and the decoding of signals in messages. Thus, the most characteristic feature of the semantics of information is the conventional nature, or contingency, of the relation between messages and signals.
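A hypothetical two-message example makes this contingency concrete: both conventions below are equally admissible physically, and nothing in the message set or the signal set themselves determines which one is in force.

# Two equally admissible conventions relating the same messages to the same
# signals; the mapping is fixed by agreement, not by natural law.
code_a = {"yes": "0", "no": "1"}
code_b = {"yes": "1", "no": "0"}

signal = code_a["yes"]                        # sender encodes under convention A
decode_b = {s: m for m, s in code_b.items()}
print(decode_b[signal])                       # a receiver assuming convention B gets "no"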

From this perspective, Shannon’s theory of communication as information transmission does say a lot about semantics implicitly, because it is also a theory of the encoding of messages in signals and the decoding of signals in messages. Shannon’s coding theorems express information-theoretical limits on the syntax of signals if the latter semantically represent messages under pragmatic constraints on the compressibility of encodings and on the reliability of transmissions.

Let us now focus our discussion on the channel coding theorem and the measure of mutual information. The higher the channel capacity—that is, the higher the maximum mutual information of an information source and a destination—the more is known about the statistical properties of the information source given the destination, and vice versa. We can express, for each channel, the information transmission distance between a given information source and a given destination in terms of the time minimally needed by the fastest receiver for interpreting transmitted signals correctly as syntactic units that represent other syntactic units, namely messages. The noisier the channel between sender and receiver is, the less certain the semantic relation between a received signal and a transmitted message is for the receiver. Because the most general pragmatic function of communication is, for Shannon, the loss of uncertainty, the gain of uncertainty due to noisy channels must be counteracted by the use of longer signals. Then the actual rate of information transmission over the channel decreases and the transmission time increases. The more effort has to go into making a signal a reliable representation of a message, the longer the receiver needs to infer the transmitted message from a received signal.
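Under the same hypothetical binary symmetric channel used in the earlier sketches, this trade-off can be put in numbers: the best achievable rate is the capacity, so the minimum number of channel uses, and with it the minimum transmission time for a fixed message, grows as the channel gets noisier.

from math import log2

def bsc_capacity(p):
    # Capacity (bits per channel use) of a binary symmetric channel with
    # crossover probability p; a hypothetical noise model used for illustration.
    h2 = 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)
    return 1.0 - h2

# Minimum number of channel uses needed to send 1,000 message bits reliably:
for p in (0.0, 0.05, 0.11):
    print(p, round(1000 / bsc_capacity(p)))   # about 1000, 1401, and 2000 uses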

Conclusion

We started our investigation into the relation of time and information by observing two very general facts: To measure time, information about change is necessary; and to measure information, change in time is necessary. We ended up describing an important example of the latter fact in semantics: The less certain the semantic content of a signal is, the more time is required to receive further signals needed for getting to know the message. Shannon’s theory of communication contains, thus, a quantitative insight into the context dependence of information: Interpreting signals is a process that must obey temporal constraints depending on the media used for information transmission. In-depth analysis of the semantic, and also the pragmatic, features of information will arguably need more insight into the interdependence of time and information.

Stefan Artmann

See also DNA; Entropy; Logical Depth; Maxwell’s Demon; Quantum Mechanics

Further Readings

Arndt, C. (2001). Information measures: Information and its description in science and engineering. New York: Springer.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

Pierce, J. R. (1980). An introduction to information theory: Symbols, signals, and noise (2nd ed., rev.). New York: Dover.

Shannon, C. E., & Weaver, W. (1998). The mathematical theory of communication. Urbana: University of Illinois Press. (Original work published 1949)

Von Baeyer, H. C. (2003). Information: The new language of science. London: Weidenfeld & Nicolson.

