Information Theory/Discrete Memoryless Sources

Revision as of 00:48, 28 October 2020

# OVERVIEW OF THE FIRST MAIN CHAPTER #

This first chapter describes the calculation and the meaning of entropy. According to Shannon's definition of information, entropy is a measure of the mean uncertainty about the outcome of a statistical event or of the uncertainty in the measurement of a stochastic quantity. Put somewhat casually, the entropy of a random quantity quantifies its "randomness".

The chapter discusses in detail:

the decision content and the entropy of a memoryless message source,
the binary entropy function and its application to non-binary sources,
the entropy calculation for sources with memory and suitable approximations,
the peculiarities of Markov sources regarding the entropy calculation,
the procedure for sources with a large number of symbols, for example natural texts,
the entropy estimates according to Shannon and Küpfmüller.

Further information on the topic, as well as exercises, simulations and programming exercises, can be found in the experiment "Value Discrete Information Theory" of the lab course "Simulation Digitaler Übertragungssysteme" (English: Simulation of Digital Transmission Systems). This (former) LNT course at TU Munich is based on

the Windows program WDIT ⇒ the link points to the ZIP version of the program, and
the associated lab manual ⇒ the link refers to the PDF version.

Model and requirements

We consider a discrete-valued message source $\rm Q$ that emits a sequence $\langle q_ν \rangle$ of symbols.

The index variable runs over $ν = 1$, ... , $N$, where $N$ should be "sufficiently large".
Each individual source symbol $q_ν$ comes from a symbol set $\{q_μ \}$ with $μ = 1$, ... , $M$, where $M$ denotes the symbol range (alphabet size):

$$q_{\nu} \in \left \{ q_{\mu} \right \}, \hspace{0.25cm}{\rm with}\hspace{0.25cm} \nu = 1, \hspace{0.05cm} \text{ ...}\hspace{0.05cm} , N\hspace{0.25cm}{\rm and}\hspace{0.25cm}\mu = 1,\hspace{0.05cm} \text{ ...}\hspace{0.05cm} , M \hspace{0.05cm}.$$

The figure shows a quaternary message source $(M = 4)$ with the alphabet $\rm \{A, \ B, \ C, \ D\}$ and an example sequence of length $N = 100$.

Memoryless Quaternary Message Source

The following requirements apply:

The quaternary message source is fully described by its $M = 4$ symbol probabilities $p_μ$. In general:

$$\sum_{\mu = 1}^M \hspace{0.1cm}p_{\mu} = 1 \hspace{0.05cm}.$$

The message source is memoryless, i.e., the individual sequence elements are statistically independent of each other:

$${\rm Pr} \left (q_{\nu} = q_{\mu} \right ) = {\rm Pr} \left (q_{\nu} = q_{\mu} \hspace{0.03cm} | \hspace{0.03cm} q_{\nu -1}, q_{\nu -2}, \hspace{0.05cm} \text{ ...}\hspace{0.05cm}\right ) \hspace{0.05cm}.$$

Since the alphabet consists of symbols (and not of random variables), expected values (linear mean, second moment, standard deviation, etc.) cannot be specified here; from an information-theoretical point of view, they are not needed either.

These properties will now be illustrated with an example.

Relative frequencies as a function of $N$

$\text{Example 1:}$ Let the symbol probabilities of a quaternary source be:

$$p_{\rm A} = 0.4 \hspace{0.05cm},\hspace{0.2cm}p_{\rm B} = 0.3 \hspace{0.05cm},\hspace{0.2cm}p_{\rm C} = 0.2 \hspace{0.05cm},\hspace{0.2cm} p_{\rm D} = 0.1\hspace{0.05cm}.$$

For an infinitely long sequence $(N \to \infty)$,

the relative frequencies $h_{\rm A}$, $h_{\rm B}$, $h_{\rm C}$, $h_{\rm D}$ ⇒ a-posteriori parameters
would be identical to the probabilities $p_{\rm A}$, $p_{\rm B}$, $p_{\rm C}$, $p_{\rm D}$ ⇒ a-priori parameters.

For smaller $N$, deviations may occur, as the adjacent table (result of a simulation) shows.
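The convergence of the relative frequencies toward the probabilities can be reproduced with a short simulation. The following is a minimal Python sketch, assuming the memoryless source is modeled by independent draws with the standard-library `random.choices`; the names `simulate_source` and `relative_frequencies` are illustrative, and the printed values depend on the seed, so they will not match the table above exactly.

```python
import random
from collections import Counter

# Symbol probabilities from Example 1
PROBS = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}

def simulate_source(n, seed=None):
    """Draw n statistically independent symbols (memoryless source)."""
    rng = random.Random(seed)
    symbols = list(PROBS)
    weights = list(PROBS.values())
    return rng.choices(symbols, weights=weights, k=n)

def relative_frequencies(sequence):
    """Return the relative frequency h of each symbol in the sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return {s: counts[s] / n for s in PROBS}

# The deviations |h - p| shrink as N grows
for n in (100, 10_000, 1_000_000):
    h = relative_frequencies(simulate_source(n, seed=1))
    print(n, {s: round(v, 3) for s, v in h.items()})
```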

The graphic above shows an example sequence of $N = 100$ symbols.
Since the set elements are the symbols $\rm A$, $\rm B$, $\rm C$ and $\rm D$, no mean values can be given.

However, if you replace the symbols with numerical values, for example $\rm A \Rightarrow 1$, $\rm B \Rightarrow 2$, $\rm C \Rightarrow 3$, $\rm D \Rightarrow 4$, then time averaging ⇒ overline, or ensemble averaging ⇒ expected value, gives:

for the linear mean:

$$m_1 = \overline { q_{\nu} } = {\rm E} \big [ q_{\mu} \big ] = 0.4 \cdot 1 + 0.3 \cdot 2 + 0.2 \cdot 3 + 0.1 \cdot 4 = 2 \hspace{0.05cm},$$

for the second moment (mean square):

$$m_2 = \overline { q_{\nu}^{\hspace{0.05cm}2} } = {\rm E} \big [ q_{\mu}^{\hspace{0.05cm}2} \big ] = 0.4 \cdot 1^2 + 0.3 \cdot 2^2 + 0.2 \cdot 3^2 + 0.1 \cdot 4^2 = 5 \hspace{0.05cm},$$

for the standard deviation according to Steiner's theorem:

$$\sigma = \sqrt {m_2 - m_1^2} = \sqrt {5 - 2^2} = 1 \hspace{0.05cm}.$$
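The three moments can be checked numerically. This sketch assumes the same arbitrary mapping $\rm A \Rightarrow 1$, ..., $\rm D \Rightarrow 4$ used in the text; a different mapping would give different moments, which is exactly why they carry no information-theoretic meaning for a symbol alphabet.

```python
import math

# Example 1: symbol probabilities and the (arbitrary) numerical mapping A=>1, ..., D=>4
probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
value = {"A": 1, "B": 2, "C": 3, "D": 4}

# Ensemble averages (expected values) of the mapped values
m1 = sum(p * value[s] for s, p in probs.items())        # linear mean
m2 = sum(p * value[s] ** 2 for s, p in probs.items())   # second moment (mean square)
sigma = math.sqrt(m2 - m1 ** 2)                         # Steiner's theorem

print(f"m1 = {m1:.3f}, m2 = {m2:.3f}, sigma = {sigma:.3f}")
# prints: m1 = 2.000, m2 = 5.000, sigma = 1.000
```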

Decision content - Message content

In 1948, in the standard work of information theory [Sha48], Claude Elwood Shannon defined the concept of information as "decrease of uncertainty about the occurrence of a statistical event".

Let us consider a thought experiment with $M$ possible outcomes, all of which are equally probable: $p_1 = p_2 = \hspace{0.05cm} \text{ ...}\hspace{0.05cm} = p_M = 1/M \hspace{0.05cm}.$

Under this assumption:

If $M = 1$, every trial yields the same result, so there is no uncertainty about the outcome.
In contrast, an observer of an experiment with $M = 2$, for example the "coin toss" with the set of events $\big \{\rm \boldsymbol{\rm Z}, \rm \boldsymbol{\rm W} \big \}$ and the probabilities $p_{\rm Z} = p_{\rm W} = 0.5$, gains information: the uncertainty as to whether $\rm Z$ or $\rm W$ occurs is resolved.
In the "dice" experiment $(M = 6)$, and even more so in roulette $(M = 37)$, the information the observer gains on learning which number was rolled or where the ball landed is greater than in the "coin toss".
Finally, the "triple coin toss" with the $M = 8$ possible results $\rm ZZZ$, $\rm ZZW$, $\rm ZWZ$, $\rm ZWW$, $\rm WZZ$, $\rm WZW$, $\rm WWZ$, $\rm WWW$ provides three times as much information as the single coin toss $(M = 2)$.

The following definition fulfills all the requirements listed above for a quantitative information measure for equally probable events, characterized only by the symbol range $M$.

$\text{Definition:}$ The decision content of a message source depends only on the symbol range $M$ and is given by

$$H_0 = {\rm log}\hspace{0.1cm}M = {\rm log}_2\hspace{0.1cm}M \hspace{0.15cm} \text{(in "bit")} = {\rm ln}\hspace{0.1cm}M \hspace{0.15cm}\text{(in "nat")} = {\rm lg}\hspace{0.1cm}M \hspace{0.15cm}\text{(in "Hartley")}\hspace{0.05cm}.$$

The term message content is also commonly used for this quantity.
Since $H_0$ is the maximum value of the entropy $H$, this tutorial also uses the short notation $H_\text{max}$.

Please note our nomenclature:

The logarithm will be called "log" in the following, independent of the base.
The requirements listed above are satisfied because of the following properties:

$${\rm log}\hspace{0.1cm}1 = 0 \hspace{0.05cm},\hspace{0.2cm} {\rm log}\hspace{0.1cm}37 > {\rm log}\hspace{0.1cm}6 > {\rm log}\hspace{0.1cm}2\hspace{0.05cm},\hspace{0.2cm} {\rm log}\hspace{0.1cm}M^k = k \cdot {\rm log}\hspace{0.1cm}M \hspace{0.05cm}.$$

We usually use the logarithm to base $2$ ⇒ logarithm dualis $\rm (ld)$, and then add the pseudo-unit "bit", more precisely "bit/symbol":

$${\rm ld}\hspace{0.1cm}M = {\rm log_2}\hspace{0.1cm}M = \frac{{\rm lg}\hspace{0.1cm}M}{{\rm lg}\hspace{0.1cm}2} = \frac{{\rm ln}\hspace{0.1cm}M}{{\rm ln}\hspace{0.1cm}2} \hspace{0.05cm}.$$

In the literature you will also find definitions based on the natural logarithm $\rm (ln)$ or the decimal logarithm $\rm (lg)$.
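As a quick check of the decision content and its units, the following Python sketch evaluates $H_0 = \log M$ for the experiments mentioned above. The helper name `decision_content` is illustrative only; it uses the standard-library functions `math.log2`, `math.log` and `math.log10` for bit, nat and Hartley.

```python
import math

def decision_content(m):
    """H0 = log M for M equally probable outcomes, in three units."""
    return {
        "bit": math.log2(m),
        "nat": math.log(m),
        "Hartley": math.log10(m),
    }

# Experiments from the text: trivial case, coin, die, roulette, triple coin toss
for name, m in [("M = 1", 1), ("coin", 2), ("die", 6), ("roulette", 37), ("3 coins", 8)]:
    h0 = decision_content(m)
    print(f"{name:9s} H0 = {h0['bit']:.3f} bit = {h0['nat']:.3f} nat = {h0['Hartley']:.3f} Hartley")
```

The triple coin toss yields exactly $3$ bit, three times the value of a single toss, as required by ${\rm log}\hspace{0.1cm}M^k = k \cdot {\rm log}\hspace{0.1cm}M$.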

Information content and entropy

We now drop the previous requirement that all $M$ possible results of an experiment are equally probable. In order to keep the notation as compact as possible, we assume the following for this page only:

$$p_1 > p_2 > \hspace{0.05cm} \text{ ...}\hspace{0.05cm} > p_\mu > \hspace{0.05cm} \text{ ...}\hspace{0.05cm} > p_{M-1} > p_M\hspace{0.05cm},\hspace{0.4cm}\sum_{\mu = 1}^M p_{\mu} = 1 \hspace{0.05cm}.$$

We now consider the information content of the individual symbols, denoting the "logarithm dualis" by $\log_2$:

$$I_\mu = {\rm log_2}\hspace{0.1cm}\frac{1}{p_\mu}= -\hspace{0.05cm}{\rm log_2}\hspace{0.1cm}{p_\mu} \hspace{0.5cm}\text{(unit: bit or bit/symbol)} \hspace{0.05cm}.$$

You can see:

Because $p_μ ≤ 1$, the information content is never negative. In the limiting case $p_μ \to 1$, $I_μ \to 0$.
However, $I_μ = 0$ ⇒ $p_μ = 1$ ⇒ $M = 1$, and then the decision content is also $H_0 = 0$.
As the probabilities $p_μ$ decrease, the information content increases monotonically:

$$I_1 < I_2 < \hspace{0.05cm} \text{ ...}\hspace{0.05cm} < I_\mu <\hspace{0.05cm} \text{ ...}\hspace{0.05cm} < I_{M-1} < I_M \hspace{0.05cm}.$$
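The ordering $I_1 < I_2 < \text{...} < I_M$ can be illustrated numerically. This sketch assumes the probabilities of Example 1, which are already sorted in decreasing order as required on this page; `information_content` is a hypothetical helper name.

```python
import math

def information_content(p):
    """I = log2(1/p) in bit; defined for 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("probability must satisfy 0 < p <= 1")
    return -math.log2(p)

# Probabilities in decreasing order (values from Example 1)
probs = [0.4, 0.3, 0.2, 0.1]
info = [information_content(p) for p in probs]
print([round(i, 3) for i in info])  # [1.322, 1.737, 2.322, 3.322]
```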

$\text{Conclusion:}$ The more improbable an event is, the greater is its information content. This is also reflected in daily life:

"6 correct numbers" in the lottery attract more attention than "3 correct numbers" or no win at all.
A tsunami in Asia dominates the news in Germany for weeks, unlike the almost routine delays of Deutsche Bahn.
A losing streak of Bayern Munich makes huge headlines, unlike a winning streak. For 1860 Munich, exactly the opposite is the case.

However, the information content of a single symbol (or event) is not very interesting. In contrast, one of the central quantities of information theory is obtained

by ensemble averaging over all possible symbols $q_μ$, or
by time averaging over all elements of the sequence $\langle q_ν \rangle$.

$\text{Definition:}$ The entropy $H$ of a source is the mean information content of all its symbols:

$$H = \overline{I_\nu} = {\rm E}\hspace{0.01cm}[I_\mu] = \sum_{\mu = 1}^M p_{\mu} \cdot {\rm log_2}\hspace{0.1cm}\frac{1}{p_\mu}= -\sum_{\mu = 1}^M p_{\mu} \cdot{\rm log_2}\hspace{0.1cm}{p_\mu} \hspace{0.5cm}\text{(unit: bit, more precisely: bit/symbol)} \hspace{0.05cm}.$$

As before, the overline denotes time averaging and $\rm E[\text{...}]$ denotes ensemble averaging.
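The two averaging routes in the definition can be compared in a short sketch. It assumes the source of Example 1 and assumes the symbol probabilities are known, so the time average simply averages $I_ν = \log_2(1/p)$ over a simulated sequence; the function names are illustrative. For a memoryless source, the time average approaches the ensemble average as $N$ grows.

```python
import math
import random

def entropy(probs):
    """Ensemble average H = E[I] = sum p*log2(1/p) in bit/symbol (p = 0 terms contribute 0)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def entropy_time_average(sequence, probs_by_symbol):
    """Time average of I_nu = log2(1/p(q_nu)) over the elements of a sequence."""
    return sum(math.log2(1 / probs_by_symbol[q]) for q in sequence) / len(sequence)

probs_by_symbol = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
rng = random.Random(7)
seq = rng.choices(list(probs_by_symbol), weights=list(probs_by_symbol.values()), k=100_000)

print(f"ensemble average: {entropy(probs_by_symbol.values()):.4f} bit/symbol")
print(f"time average:     {entropy_time_average(seq, probs_by_symbol):.4f} bit/symbol")
```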

Among other things, entropy is a measure of

the mean uncertainty about the outcome of a statistical event,
the "randomness" of this event, and
the average information content of a random variable.

Binary entropy function

First, we restrict ourselves to the special case $M = 2$ and consider a binary source that emits the two symbols $\rm A$ and $\rm B$. Their occurrence probabilities are $p_{\rm A} = p$ and $p_{\rm B} = 1 - p$.

The entropy of this binary source is:

$$H_{\rm bin} (p) = p \cdot {\rm log_2}\hspace{0.1cm}\frac{1}{\hspace{0.1cm}p\hspace{0.1cm}} + (1-p) \cdot {\rm log_2}\hspace{0.1cm}\frac{1}{1-p} \hspace{0.5cm}\text{(unit: bit or bit/symbol)} \hspace{0.05cm}.$$

The function $H_\text{bin}(p)$ is called the binary entropy function. The entropy of a source with a larger symbol range $M$ can often be expressed in terms of $H_\text{bin}(p)$.

$\text{Example 2:}$ The figure shows the binary entropy function for the values $0 ≤ p ≤ 1$ of the symbol probability of $\rm A$ $($or also of $\rm B)$. You can see:

Binary entropy function as function of $p$

The maximum value $H_\text{max} = 1\; \rm bit$ results for $p = 0.5$, thus for equally probable binary symbols. Then $\rm A$ and $\rm B$ contribute the same amount to entropy.
$H_\text{bin}(p)$ is symmetrical about $p = 0.5$. A source with $p_{\rm A} = 0.1$ and $p_{\rm B} = 0.9$ has the same entropy $H = 0.469 \; \rm bit$ as a source with $p_{\rm A} = 0.9$ and $p_{\rm B} = 0.1$.
The difference $ΔH = H_\text{max} - H$ gives the redundancy of the source and $r = ΔH/H_\text{max}$ the relative redundancy. For $p = 0.1$, this gives $ΔH = 0.531\; \rm bit$ and $r = 53.1 \rm \%$.
For $p = 0$, the result is $H = 0$, since the symbol sequence $\rm B \ B \ B \text{...}$ can be predicted with certainty. Effectively, the symbol range is then only $M = 1$. The same applies to $p = 1$ ⇒ symbol sequence $\rm A \ A \ A \text{...}$.
$H_\text{bin}(p)$ is a concave function, since its second derivative with respect to $p$ is negative for all $0 < p < 1$:

$$\frac{ {\rm d}^2H_{\rm bin} (p)}{ {\rm d}\,p^2} = \frac{- 1}{ {\rm ln}(2) \cdot p \cdot (1-p)}< 0 \hspace{0.05cm}.$$
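The properties listed in Example 2 can be reproduced with a few lines of Python. This is a sketch; `h_bin` and `h_bin_second_derivative` are illustrative names, and $H_\text{bin}(0) = H_\text{bin}(1) = 0$ is set by continuity, since the formula itself contains $0 \cdot \log_2(1/0)$ at the edges.

```python
import math

def h_bin(p):
    """Binary entropy function in bit; H_bin(0) = H_bin(1) = 0 by continuity."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

def h_bin_second_derivative(p):
    """Closed form -1 / (ln 2 * p * (1 - p)), valid for 0 < p < 1."""
    return -1 / (math.log(2) * p * (1 - p))

p = 0.1
H = h_bin(p)
delta_H = 1.0 - H            # redundancy, with H_max = 1 bit
r = delta_H / 1.0            # relative redundancy
print(f"H_bin({p}) = {H:.3f} bit, dH = {delta_H:.3f} bit, r = {100 * r:.1f} %")
# prints: H_bin(0.1) = 0.469 bit, dH = 0.531 bit, r = 53.1 %
```

A finite-difference estimate of the second derivative agrees with the closed form, which is one way to double-check the concavity formula.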

Message sources with a larger symbol range

In the first section of this chapter, we considered a quaternary message source $(M = 4)$ with the symbol probabilities $p_{\rm A} = 0.4$, $p_{\rm B} = 0.3$, $p_{\rm C} = 0.2$ and $p_{\rm D} = 0.1$. This source has the following entropy:

$$H_{\rm quat} = 0.4 \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{0.4} + 0.3 \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{0.3} + 0.2 \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{0.2}+ 0.1 \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{0.1}.$$

For numerical calculation, a detour via the decimal logarithm $\lg \ x = {\rm log}_{10} \ x$ is often useful, since the logarithm dualis ${\rm log}_2 \ x$ is usually not available on pocket calculators:

$$H_{\rm quat}=\frac{1}{{\rm lg}\hspace{0.1cm}2} \cdot \left [ 0.4 \cdot {\rm lg}\hspace{0.1cm}\frac{1}{0.4} + 0.3 \cdot {\rm lg}\hspace{0.1cm}\frac{1}{0.3} + 0.2 \cdot {\rm lg}\hspace{0.1cm}\frac{1}{0.2} + 0.1 \cdot {\rm lg}\hspace{0.1cm}\frac{1}{0.1} \right ] = 1.846\,{\rm bit} \hspace{0.05cm}.$$
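On a computer, the detour via the decimal logarithm is unnecessary, but it is easy to confirm that both routes give the same value. This sketch evaluates the quaternary entropy directly with `math.log2` and via $\lg$ divided by $\lg 2$; to four decimals, both evaluate to about $1.8464$ bit.

```python
import math

probs = [0.4, 0.3, 0.2, 0.1]

# Direct evaluation with the logarithm dualis
h_direct = sum(p * math.log2(1 / p) for p in probs)

# Detour via the decimal logarithm: H = (1 / lg 2) * sum p * lg(1/p)
h_via_lg = sum(p * math.log10(1 / p) for p in probs) / math.log10(2)

print(f"{h_direct:.4f} {h_via_lg:.4f}")  # 1.8464 1.8464
```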


$\text{Example 3:}$ Now let there be certain symmetries between the symbol probabilities:

Entropy of binary source and quaternary source

$$p_{\rm A} = p_{\rm D} = p \hspace{0.05cm},\hspace{0.4cm}p_{\rm B} = p_{\rm C} = 0.5 - p \hspace{0.05cm},\hspace{0.3cm}{\rm with} \hspace{0.15cm}0 \le p \le 0.5 \hspace{0.05cm}.$$

In this case, the binary entropy function can be used to calculate the entropy:

$$H_{\rm quat} = 2 \cdot p \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{\hspace{0.1cm}p\hspace{0.1cm} } + 2 \cdot (0.5-p) \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{0.5-p}$$

$$\Rightarrow \hspace{0.3cm} H_{\rm quat} = 1 + H_{\rm bin}(2p) \hspace{0.05cm}.$$
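The identity follows by substituting $q = 2p$: then $2p \cdot \log_2(1/p) = q + q \cdot \log_2(1/q)$ and $2(0.5-p) \cdot \log_2(1/(0.5-p)) = (1-q) + (1-q) \cdot \log_2(1/(1-q))$, and the two constant terms add up to $1$. The following sketch checks it numerically over the permissible range $0 ≤ p ≤ 0.5$; the helper names are illustrative, and zero-probability terms are dropped, consistent with the limit $p \cdot \log_2(1/p) \to 0$.

```python
import math

def entropy(probs):
    """Entropy in bit/symbol; zero-probability terms contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def h_bin(p):
    return entropy([p, 1 - p])

def h_quat(p):
    """Symmetric quaternary source: p_A = p_D = p, p_B = p_C = 0.5 - p, with 0 <= p <= 0.5."""
    return entropy([p, 0.5 - p, 0.5 - p, p])

for p in (0.0, 0.1, 0.25, 0.4, 0.5):
    print(f"p = {p:.2f}: H_quat = {h_quat(p):.3f}, 1 + H_bin(2p) = {1 + h_bin(2 * p):.3f}")
```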

As a function of $p$, the graphic shows

the entropy of the quaternary source (blue)
in comparison to the entropy curve of the binary source (red).

For the quaternary source, only the abscissa range $0 ≤ p ≤ 0.5$ is permissible.
You can see from the blue curve for the quaternary source:

The maximum entropy $H_\text{max} = 2 \; \rm bit/symbol$ results for $p = 0.25$ ⇒ equally probable symbols: $p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} = 0.25$.
For $p = 0$, the quaternary source degenerates to a binary source with $p_{\rm B} = p_{\rm C} = 0.5$ and $p_{\rm A} = p_{\rm D} = 0$; for $p = 0.5$, it degenerates to one with $p_{\rm A} = p_{\rm D} = 0.5$ and $p_{\rm B} = p_{\rm C} = 0$. In both cases the entropy is $H = 1 \; \rm bit/symbol$.
The source with $p_{\rm A} = p_{\rm D} = 0.1$ and $p_{\rm B} = p_{\rm C} = 0.4$ has the following characteristics (each with the pseudo unit "bit/symbol"):

(1) Entropy: $H = 1 + H_{\rm bin} (2p) =1 + H_{\rm bin} (0.2) = 1.722,$

(2) Redundancy: ${\rm \Delta }H = {\rm log_2}\hspace{0.1cm} M - H =2- 1.722= 0.278,$

(3) Relative redundancy: $r ={\rm \Delta }H/({\rm log_2}\hspace{0.1cm} M) = 0.139\hspace{0.05cm}.$

The redundancy of the quaternary source with $p = 0.1$ is equal to $ΔH = 0.278 \; \rm bit/symbol$ and thus exactly the same as the redundancy of the binary source with $p = 0.2$.
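The equality of the two redundancies is no coincidence: with $H_{\rm quat} = 1 + H_{\rm bin}(2p)$ and ${\rm log_2}\hspace{0.1cm}4 = 2$, the redundancy of the symmetric quaternary source is $1 - H_{\rm bin}(2p)$, which is exactly the redundancy of a binary source with parameter $2p$. A small sketch reproducing the characteristic values (the helper `entropy` is illustrative):

```python
import math

def entropy(probs):
    """Entropy in bit/symbol; zero-probability terms contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

M = 4
H = entropy([0.1, 0.4, 0.4, 0.1])   # p_A = p_D = 0.1, p_B = p_C = 0.4
delta_H = math.log2(M) - H          # redundancy
r = delta_H / math.log2(M)          # relative redundancy

# Redundancy of the binary source with p = 0.2 (H_max = log2(2) = 1 bit)
delta_H_bin = 1 - entropy([0.2, 0.8])

print(f"H = {H:.3f}, dH = {delta_H:.3f}, r = {r:.3f}, dH_bin = {delta_H_bin:.3f}")
# prints: H = 1.722, dH = 0.278, r = 0.139, dH_bin = 0.278
```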

Exercises for this chapter

Exercise 1.1: Weather Entropy

Exercise 1.1Z: Binary Entropy Function

Exercise 1.2: Entropy of Ternary Sources

References

[Sha48] Shannon, C.E.: A Mathematical Theory of Communication. In: Bell Syst. Techn. J. 27 (1948), pp. 379-423 and pp. 623-656.
