The theory of information is fundamental to a rational understanding of the temporal fabric of our world. The principal reason for this is that information and time cannot reasonably be separated, a fact that becomes evident from a short consideration of how we know about time and information. Only if information produces effects on objects whose changes in time are physically measurable is there a chance to test hypotheses about information objectively. Time, in turn, is measurable by a physical system only on condition that this system has information about its internal changes. To measure time, information about change is necessary, and to measure information, change in time is necessary.
Motivated by concrete problems of empirical research and engineering, the theory of information is steadily developing into a coherent system of mathematical models describing the complex abstract structure of information. On the one hand, information is an entity difficult to grasp, because the interactions between its syntactic, semantic, and pragmatic components constitute a complex structure. On the other hand, information is something supposed to exist almost everywhere: Its abstract features are claimed to be discernible in the most different kinds of systems such as quantum entanglements, cells, humans, computers, and societies.
The need for a unified theory of information is, thus, felt at nearly all frontiers of science. Quantum physicists are discussing whether an interpretation of their experimental data in terms of information could resolve some of the paradoxes that haunt their understanding of the subatomic world. Cosmologists are calculating the information content of black holes. Geneticists are describing the hereditary substance as containing information that is decoded in cells by means of the genetic code. Evolutionary biologists are assuming that the most important steps in the history of life on earth consisted in establishing increasingly efficient ways of information processing. Neurophysiologists are talking about the brain as the most complex information processor known to us. Computer scientists are constructing and programming machines that process information more and more intelligently without being continuously assisted by humans. Communication engineers are building information networks that connect human brains to computers in a new kind of symbiosis. Economists are proposing mathematical theories of economic behavior based on the distribution of information among people who buy and sell goods in a market. Sociologists are characterizing developed countries as information societies in a globalized world.
If we do not want to get lost in the jungle of information concepts used in different sciences, and if we want to make the relation between time and information clear, we are well advised to orient ourselves with the help of a rough classification of the manifold aspects of information and to distinguish its syntactic, semantic, and pragmatic components. The scientific disciplines of syntax, semantics, and pragmatics are known to anyone who studies language or other sign systems. In short, syntax is the study of signs in their relation to other signs; semantics is the study of signs in relation to their conceptual and referential meaning; pragmatics is the study of signs in relation to the agents using them. That all three disciplines are also of utmost importance for an information theorist should not come as a surprise, because sign systems basically are means of transmitting information. Each sign is, as it were, a package of information: a syntactic unit that transports a meaning from a sender to a receiver.
This entry presents some fundamental characteristics of the syntactic components of information and reflects on the relation between syntactic and semantic features of information. The reason for restricting ourselves to these topics is that the general semantics and pragmatics of information still constitute a mostly uncharted continent. To explore it further, not only sophisticated mathematical models but also new concepts for analyzing the close relation between time and information will have to be developed.
Syntactic Features of Information in Shannon’s Communication Theory
Syntactic features of information are described by a variety of mathematical models. The most important one resulted from Claude E. Shannon’s understanding of communication as transmission of information. Shannon (1916-2001), who researched communication systems first as an engineer at the Bell Laboratories and then as a professor at the Massachusetts Institute of Technology, has rightly been called “the father of information theory.” The scheme of a general communication system he introduced in his classic paper, A Mathematical Theory of Communication (1948), is still the basis for most research in information theory.
According to Shannon, a general communication system consists of five components: an information source, a transmitter, a channel, a receiver, and a destination. The information source generates a message that is to be transmitted over the channel to the destination. The transmitter then encodes the message into a signal suited for transmission via the channel. In the channel, there normally exists a certain probability that noise distorts the signal. The receiver decodes the transmitted and possibly distorted signal. Finally, the message is delivered to the destination. This general communication system can easily be exemplified by telephony. A person (the information source) speaks into a telephone (the transmitter) that encodes sound waves into a sequence of analog or digital signals. These signals are transmitted via fiberglass cables, air, satellites, or other channels. The physical structure of the medium of transmission, atmospherics, defective electronic devices, jamming stations, and other noise sources might distort the signal. A telephone at the other end of the channel (the receiver) decodes the transmitted signals into sound waves that some person (the destination) can hear.
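Shannon's five-component scheme can be sketched as a toy program. The 8-bit character encoding and the function name `transmit` are our illustrative choices, not part of Shannon's model; the sketch merely traces a message from source to destination through a noisy binary channel:

```python
import random

def transmit(message: str, flip_prob: float, seed: int = 0) -> str:
    """Trace Shannon's scheme: source -> transmitter (encode to bits)
    -> noisy channel -> receiver (decode bits) -> destination."""
    rng = random.Random(seed)
    # Transmitter: encode each character of the message as 8 bits.
    bits = [int(b) for ch in message for b in format(ord(ch), "08b")]
    # Channel: each bit is flipped independently with probability flip_prob.
    noisy = [b ^ 1 if rng.random() < flip_prob else b for b in bits]
    # Receiver: decode 8-bit groups back into characters for the destination.
    chars = []
    for i in range(0, len(noisy), 8):
        chars.append(chr(int("".join(map(str, noisy[i:i + 8])), 2)))
    return "".join(chars)

print(transmit("hello", flip_prob=0.0))  # noiseless channel: arrives intact
print(transmit("hello", flip_prob=0.2))  # noisy channel: likely corrupted
```

With a flip probability of 0.2, roughly one bit in five is distorted, so the decoded message almost never matches the original; this is the situation Shannon's coding theorems address.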
Shannon introduced the general communication system in order to solve two main problems of information transmission. First, how many signals do we minimally need on average to encode a message of given length generated by an information source? Second, how fast can we reliably transmit an encoded message over a noisy channel? Both problems ask for principal spatial and temporal limits of communication, or in other words, for minimum code lengths and maximum transmission rates. To answer these questions, Shannon defined an information-theoretical analogue to entropy, the statistical measure of disorder in a thermodynamic system.
Omitting formal technicalities, Shannon’s crucial idea goes as follows. For any message generated by an information source, the information content of the message is equal to the amount of uncertainty that the destination of the message loses on its receipt. The less probable the receipt of a message is, the more information it carries to the destination. Shannon’s measure of the entropy of an information source (Shannon entropy, for short) quantifies the average information content of a message generated by that source.
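For a source with known message probabilities, Shannon entropy is the expected value of -log2 p, measured in bits. A minimal sketch (the function name is ours):

```python
from math import log2

def shannon_entropy(probs):
    """Average information content, in bits, of a message drawn
    from the probability distribution `probs`."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A fair coin is maximally uncertain: 1 bit per toss.
print(shannon_entropy([0.5, 0.5]))   # 1.0
# A biased coin carries less information per toss.
print(shannon_entropy([0.9, 0.1]))   # ~0.469
# A certain outcome removes no uncertainty at all.
print(shannon_entropy([1.0]))        # 0.0
```

The examples show the key behavior: the more predictable the source, the less information each message carries.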
Shannon’s noiseless, or source, coding theorem shows that the entropy of an information source provides a lower bound on the average length of the signals that encode messages of the information source and are transmitted over a noiseless channel. As the length of these messages goes to infinity, the minimum expected signal length per message symbol approaches the entropy of the information source. Shannon entropy thus defines, in terms of signal length, the optimum a sender can achieve in encoding messages.
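The source coding theorem can be checked numerically: an optimal prefix code, such as a Huffman code, has an average code-word length that never falls below the source entropy and meets it exactly for dyadic probabilities (powers of one half). A sketch, with helper names of our own choosing:

```python
import heapq
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Code-word lengths of an optimal binary prefix code for `probs`."""
    # Heap items: (probability, tie-breaker, indices of symbols merged so far).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:          # each merge adds one bit to these code words
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]           # a dyadic distribution
lens = huffman_lengths(probs)
avg = sum(p * l for p, l in zip(probs, lens))
print(entropy(probs), avg)                   # both 1.75 bits: the bound is met
```

For non-dyadic distributions the average length stays strictly above the entropy, but by less than one bit per symbol, as the theorem guarantees.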
In his noisy, or channel, coding theorem, Shannon proves the counterintuitive result that messages can be sent with arbitrarily low error probability over a noisy channel—on condition that the rate (measured in information units per channel use) at which the message is transmitted does not exceed an upper limit specific to the particular channel. This upper limit is called the channel capacity and can be quantified by an ingenious use of Shannon entropy. If we subtract from the entropy of the information source the conditional entropy of the information source given the messages received by the destination, we get the mutual information of the information source and the destination. Mutual information measures how much the uncertainty about which message the information source has generated is reduced once we know the message the destination has received. If the channel is noiseless, the mutual information is equal to the Shannon entropy of the information source. The noisier the channel, the more signals we must transmit additionally in order to correct the transmission errors. The capacity of a channel equals the maximum mutual information of an information source and a destination connected by that channel. That a sender who wants to transmit a message reliably should try to achieve channel capacity, no matter how noisy the channel, seems a vain hope, because any correction signal is subject to distortion, too. Yet Shannon could show that for any noisy channel there exist codes by means of which a sender can transmit messages with arbitrarily small error at any rate below channel capacity. He did not, however, give a general procedure for constructing such codes, and up to now no information theorist has been able to perform this feat.
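For the simplest noisy channel, the binary symmetric channel that flips each transmitted bit with a fixed probability, the capacity has a well-known closed form: one bit per channel use minus the binary entropy of the flip probability. A short sketch (the function names are ours, not Shannon's):

```python
from math import log2

def binary_entropy(p):
    """Entropy, in bits, of a binary distribution (p, 1 - p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(flip_prob):
    """Capacity, in bits per channel use, of a binary symmetric channel."""
    return 1.0 - binary_entropy(flip_prob)

print(bsc_capacity(0.0))   # 1.0: noiseless, one full bit per use
print(bsc_capacity(0.11))  # ~0.5: only half the raw bit rate survives
print(bsc_capacity(0.5))   # 0.0: pure noise, nothing gets through
```

The extremes illustrate the theorem's economics: as noise grows, the reliably usable fraction of each channel use shrinks, reaching zero when the output is statistically independent of the input.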
Shannon’s coding theorems prove, with mathematical exactness, fundamental physical limits of information transmission. His channel coding theorem shows that if we want to transmit a message reliably over a noisy channel—and any realistic channel is noisy—we must respect the channel capacity as an upper limit on the transmission rate of our message. If we want to be sure that another person receives our message in its original form, we must take the properties of the medium of transmission into account and make the encoding of the message as redundant as necessary. To make an encoding redundant means to make it longer than the noiseless coding theorem requires; it also means that we need more time to transmit a message over a noisy channel than over a noiseless one. Shannon’s information theory thus implies an economics of information transmission: Given the goal of reliable information transmission and knowing the noise in a channel, we must respect a spatial lower limit on the length of encodings and a temporal upper limit on their transmission rate.
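This trade-off between redundancy and time can be illustrated with the crudest redundant code, the n-fold repetition code with majority decoding: each added repetition lowers the residual error probability but divides the transmission rate by n. A sketch, assuming a binary symmetric channel (the function name is ours):

```python
from math import comb

def repetition_error(n, p):
    """Probability that majority decoding of an n-fold repetition fails
    on a binary symmetric channel with bit-flip probability p (n odd):
    decoding fails when more than half of the n copies are flipped."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n + 1) // 2, n + 1))

p = 0.1
for n in (1, 3, 5, 9):
    print(f"rate 1/{n}: residual error {repetition_error(n, p):.5f}")
```

Reliability improves with every repetition, but the rate 1/n falls far below the channel capacity; Shannon's theorem promises far better codes that keep the rate near capacity, without telling us how to construct them.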
Shannon entropy measures only a syntactic property of information, more precisely a mathematical property of statistical distributions of messages. Its definition does not explicitly involve semantic or pragmatic aspects of information. Whether a transmitted message is completely nonsensical or very meaningful for its destination, Shannon entropy takes into account just the probability that a message is generated by an information source. Since Shannon published his mathematical theory of communication, further statistical measures of syntactic aspects of information have been defined. For example, the theory of identification, developed by the German mathematicians Rudolf Ahlswede and Gunter Dueck at the end of the 1980s, refers to Shannon’s general communication system yet introduces a decisive pragmatic difference as regards the purpose of communication. In Ahlswede and Dueck’s scenario, the information source and the destination are not interested in the reliable transmission of all messages that the information source can generate. The destination just wants to learn, as fast as possible, whether one particular message has been sent, a message that the sender might have encoded in different signals. It is the situation of someone who has bet money on a horse and only wants to know whether this horse has won the race. Such a relaxation of the goal of communication allows an enormous increase in the speed of information transmission.
Semantic Features of Information in Shannon’s Communication Theory
The semantic and pragmatic features of information are much more difficult to formalize than its syntactic features. Some approaches to semantic aspects of information try, therefore, first to identify syntactic properties of signals that may be correlated to the fact that these signals have a meaning for both the information source and the destination. When we speak about meaning, sense, reference, and other semantic concepts in an information-theoretical context, we do not suppose that information sources and destinations have complex psychological qualities like those of human beings. Access to semantic aspects of information is, thus, not restricted to self-reflective agents who associate, with signs, mental representations as designations and who refer consciously to objects in the real world as denotations. In this sense, “the semantic component of information” just means that at least a set of messages and a set of signals are interrelated by means of a convention.
A code, as a system of conventional rules that allow encoding and decoding, is the minimum semantic structure par excellence. It is normally not possible to infer which message is related to which signal, and vice versa, if we know only the elements of both sets (i.e., the messages and the signals), and the natural laws that constrain the encoding of messages in signals and the decoding of signals in messages. Thus, the most characteristic feature of the semantics of information is the conventional nature, or contingency, of the relation between messages and signals.
From this perspective, Shannon’s theory of communication as information transmission does say a lot about semantics implicitly, because it is also a theory of the encoding of messages in signals and the decoding of signals in messages. Shannon’s coding theorems express information-theoretical limits on the syntax of signals if the latter semantically represent messages under pragmatic constraints on the compressibility of encodings and on the reliability of transmissions.
Let us now focus our discussion on the channel coding theorem and the measure of mutual information. The higher the channel capacity—that is, the higher the maximum mutual information of an information source and a destination—the more is known about the statistical properties of the information source given the destination, and vice versa. We can express, for each channel, the information transmission distance between a given information source and a given destination in terms of time minimally needed by the fastest receiver for interpreting transmitted signals correctly as syntactic units that represent other syntactic units, namely messages. The noisier the channel between sender and receiver is, the less certain the semantic relation between a received signal and a transmitted message is for the receiver. Because the most general pragmatic function of communication is, for Shannon, the loss of uncertainty, the gain of uncertainty due to noisy channels must be counteracted by the use of longer signals. Then the actual rate of information transmission over the channel decreases and the transmission time increases. The more effort has to go into making a signal a reliable representation of a message, the longer the receiver needs to infer the transmitted message from a received signal.
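The mutual information discussed here can be computed directly from a joint distribution of sent and received messages, using the identity I(X;Y) = H(X) + H(Y) - H(X,Y). A sketch with names of our own choosing; the example matrix models uniformly distributed bits sent through a binary symmetric channel with flip probability 0.1, an illustrative assumption:

```python
from math import log2

def H(probs):
    """Shannon entropy, in bits, of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with the joint distribution
    given as a matrix: joint[x][y] = P(X = x, Y = y)."""
    px = [sum(row) for row in joint]            # marginal of the source
    py = [sum(col) for col in zip(*joint)]      # marginal of the destination
    pxy = [p for row in joint for p in row]     # flattened joint distribution
    return H(px) + H(py) - H(pxy)

p = 0.1
joint = [[0.5 * (1 - p), 0.5 * p],
         [0.5 * p, 0.5 * (1 - p)]]
print(mutual_information(joint))  # ~0.531 bits per transmitted bit
```

The result agrees with the binary symmetric channel's capacity at this noise level, 1 - H(0.1): the noisier the channel, the smaller the mutual information, and the longer the receiver must listen to pin down the transmitted message.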
We started our investigation into the relation of time and information by observing two very general facts: To measure time, information about change is necessary; and to measure information, change in time is necessary. We ended up describing an important example of the latter fact in semantics: The less certain the semantic content of a signal is, the more time is required to receive further signals needed for getting to know the message. Shannon’s theory of communication contains, thus, a quantitative insight into the context dependence of information: Interpreting signals is a process that must obey temporal constraints depending on the media used for information transmission. In-depth analysis of the semantic, and also the pragmatic, features of information will arguably need more insight into the interdependence of time and information.
See also DNA; Entropy; Logical Depth; Maxwell’s Demon; Quantum Mechanics
Further Readings

Arndt, C. (2001). Information measures: Information and its description in science and engineering. New York: Springer.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.
Pierce, J. R. (1980). An introduction to information theory: Symbols, signals, and noise (2nd ed., rev.). New York: Dover.
Shannon, C. E., & Weaver, W. (1998). The mathematical theory of communication. Urbana: University of Illinois Press. (Original work published 1949)
Von Baeyer, H. C. (2003). Information: The new language of science. London: Weidenfeld & Nicolson.