Difference between revisions of "Information Theory/Application to Digital Signal Transmission"

From LNTwww
 
(93 intermediate revisions by 9 users not shown)
Line 1: Line 1:
 
   
 
   
 
{{Header
 
{{Header
|Untermenü=Information zwischen zwei wertdiskreten Zufallsgrößen
+
|Untermenü=Mutual Information Between Two Discrete Random Variables
 
|Vorherige Seite=Verschiedene Entropien zweidimensionaler Zufallsgrößen
 
|Vorherige Seite=Verschiedene Entropien zweidimensionaler Zufallsgrößen
 
|Nächste Seite=Differentielle Entropie
 
|Nächste Seite=Differentielle Entropie
 
}}
 
}}
  
==Informationstheoretisches Modell der Digitalsignalübertragung ==
+
==Information-theoretical model of digital signal transmission ==
 +
<br>
 +
The entropies defined so far in general terms are now applied to digital signal transmission, whereby we assume a discrete memoryless channel&nbsp; $\rm (DMC)$&nbsp; according to the graphic:
  
Die bisher allgemein definierten Entropien werden nun auf die Digitalsignalübertragung angewendet, wobei wir von einem '''digitalen Kanalmodell ohne Gedächtnis''' (englisch: ''Discrete Memoryless Channel'', DMC) entsprechend der nachfolgenden Grafik ausgehen:
+
[[File:P_ID2779__Inf_T_3_3_S1a_neu.png|right|frame|Digital signal transmission model under consideration]]
  
[[File:P_ID2779__Inf_T_3_3_S1a_neu.png|Betrachtetes Modell der Digitalsignalübertragung]]
+
*The set of source symbols is characterized by the discrete random variable&nbsp; $X$&nbsp;, where&nbsp; $|X|$&nbsp; indicates the source symbol set size:
 +
 +
:$$X = \{\hspace{0.05cm}x_1\hspace{0.05cm}, \hspace{0.15cm} x_2\hspace{0.05cm},\hspace{0.15cm} \text{...}\hspace{0.1cm} ,\hspace{0.15cm} x_{\mu}\hspace{0.05cm}, \hspace{0.15cm}\text{...}\hspace{0.1cm} , \hspace{0.15cm} x_{|X|}\hspace{0.05cm}\}\hspace{0.05cm}.$$
  
*Die Menge der möglichen Quellensymbole wird durch die diskrete Zufallsgröße $X$ charakterisiert, wobei $|X|$ den Quellensymbolumfang angibt:
+
*Correspondingly,&nbsp; $Y$&nbsp; characterizes the set of sink symbols with the symbol set size&nbsp; $|Y|$:
 
   
 
   
$$X = \{\hspace{0.05cm}x_1\hspace{0.05cm}, \hspace{0.05cm} x_2\hspace{0.05cm},\hspace{0.05cm} ...\hspace{0.1cm} ,\hspace{0.05cm} x_{\mu}\hspace{0.05cm}, \hspace{0.05cm}...\hspace{0.1cm} , \hspace{0.05cm} x_{|X|}\hspace{0.05cm}\}\hspace{0.05cm}.$$
+
:$$Y = \{\hspace{0.05cm}y_1\hspace{0.05cm}, \hspace{0.15cm} y_2\hspace{0.05cm},\hspace{0.15cm} \text{...}\hspace{0.1cm} ,\hspace{0.15cm} y_{\kappa}\hspace{0.05cm}, \hspace{0.15cm}\text{...}\hspace{0.1cm} , \hspace{0.15cm} Y_{|Y|}\hspace{0.05cm}\}\hspace{0.05cm}.$$
 +
 
 +
*Usually&nbsp; $|Y| = |X|$is valid.&nbsp; Also possible is&nbsp; $|Y| > |X|$, for example with the&nbsp; [[Channel_Coding/Kanalmodelle_und_Entscheiderstrukturen#Binary_Symmetric_Channel_.E2.80.93_BSC|$\text{Binary Erasure Channel}$]]&nbsp; (BEC):
 +
:$$X = \{0,\ 1\},\hspace{0.5cm} Y = \{0,\ 1,\ \ \text{E}\}\    ⇒  \ |X| = 2, \ |Y| = 3.$$
  
*Entsprechend kennzeichnet $Y$ die Menge der Sinkensymbole mit dem Symbolvorrat $|Y|$:
+
*The sink symbol&nbsp; $\rm E$&nbsp; indicates an&nbsp; "erasure".&nbsp; The event&nbsp; $Y=\text{E}$&nbsp; indicates that a decision for&nbsp; $0$&nbsp; or for&nbsp; $1$&nbsp; would be too uncertain.
 
$$Y = \{\hspace{0.05cm}y_1\hspace{0.05cm}, \hspace{0.05cm} y_2\hspace{0.05cm},\hspace{0.05cm} ...\hspace{0.1cm} ,\hspace{0.05cm} y_{\kappa}\hspace{0.05cm}, \hspace{0.05cm}...\hspace{0.1cm} , \hspace{0.05cm} Y_{|Y|}\hspace{0.05cm}\}\hspace{0.05cm}.$$
 
  
*Meist gilt $|Y|$ = $|X|$. Möglich ist aber auch $|Y|$ > $|X|$, zum Beispiel beim Binary Erasure Channel (BEC) mit $X$ = {0, 1} und $Y$ = {0, 1, E}  ⇒  $|X|$ = 2, $|Y|$ = 3.
+
*The symbol probabilities of the source and sink are accounted for in the graph by the probability mass functions&nbsp; $P_X(X)$&nbsp; and&nbsp; $P_Y(Y)$:
*Das Sinkensymbol $E$ kennzeichnet eine Auslöschung (englisch: ''Erasure''). Das Ereignis $Y$ = $E$ gibt an, dass eine Entscheidung für 0 oder für 1 zu unsicher wäre.
+
:$$P_X(x_{\mu}) = {\rm Pr}( X = x_{\mu})\hspace{0.05cm}, \hspace{0.3cm}
*Die Symbolwahrscheinlichkeiten der Quelle und der Sinke sind in der oberen Grafik durch die Wahrscheinlichkeitsfunktionen $P_X(X)$ und $P_Y(Y)$ berücksichtigt, wobei gilt:
 
 
$$P_X(x_{\mu}) = {\rm Pr}( X = x_{\mu})\hspace{0.05cm}, \hspace{0.3cm}
 
 
P_Y(y_{\kappa}) = {\rm Pr}( Y = y_{\kappa})\hspace{0.05cm}.$$
 
P_Y(y_{\kappa}) = {\rm Pr}( Y = y_{\kappa})\hspace{0.05cm}.$$
  
*Es gelte: $P_X(X)$ und $P_Y(Y)$ enthalten keine Nullen ⇒ $\text{supp}(P_X)$ = $P_X$, $\text{supp}(P_Y)$ = $P_Y$. Diese Voraussetzung erleichtert die Modellbeschreibung, ohne Verlust an Allgemeingültigkeit.
+
*Let it hold:&nbsp; The prtobability mass functions&nbsp; $P_X(X)$&nbsp; and&nbsp; $P_Y(Y)$&nbsp; contain no zeros &nbsp; &nbsp; $\text{supp}(P_X) = P_X$ &nbsp;and&nbsp; $\text{supp}(P_Y) = P_Y$.&nbsp; This prerequisite facilitates the description without loss of generality.
*Alle Übergangswahrscheinlichkeiten des digitalen gedächtnislosen Kanals (DMC) werden durch die ''bedingte Wahrscheinlichkeitsfunktion'' $P_{Y|X}(Y|X)$ erfasst. Mit $x_μ ∈ X$ und $y_κ ∈ Y$ gelte hierfür folgende Definition:
+
 
 +
*All transition probabilities of the discrete memoryless channel &nbsp; $\rm (DMC)$&nbsp; are captured by the&nbsp; conditional probability function&nbsp; $P_{Y|X}(Y|X)$.&nbsp;. With&nbsp; $x_μ ∈ X$&nbsp; and&nbsp; $y_κ ∈ Y$,&nbsp; the following definition applies to this:
 
   
 
   
$$P_{Y\hspace{0.01cm}|\hspace{0.01cm}X}(y_{\kappa}\hspace{0.01cm} |\hspace{0.01cm} x_{\mu}) = {\rm Pr}(Y\hspace{-0.1cm} = y_{\kappa}\hspace{0.03cm} | \hspace{0.03cm}X \hspace{-0.1cm}= x_{\mu})\hspace{0.05cm}.$$
+
:$$P_{Y\hspace{0.01cm}|\hspace{0.01cm}X}(y_{\kappa}\hspace{0.01cm} |\hspace{0.01cm} x_{\mu}) = {\rm Pr}(Y\hspace{-0.1cm} = y_{\kappa}\hspace{0.03cm} | \hspace{0.03cm}X \hspace{-0.1cm}= x_{\mu})\hspace{0.05cm}.$$
 
 
In obiger Grafik ist $P_{Y|X}(⋅)$ als ein Block mit $|X|$ Eingängen und $|Y|$ Ausgängen dargestellt. Die blauen Verbindungen markieren die Übergangswahrscheinlichkeiten $\text{Pr}(y_i | x_μ)$ ausgehend von $x_μ$ mit 1 ≤ $i$ ≤ $|Y|$, während alle roten Verbindungen bei $y_κ$ enden: $\text{Pr}(y_κ | x_i)$ mit 1 ≤ $i$ ≤ $|X|$.
 
 
 
  
Bevor wir die Entropien für die einzelnen Wahrscheinlichkeitsfunktionen angeben, nämlich
 
$P_X(X) ⇒ H(X)$, $P_Y(Y) ⇒ H(Y)$, $P_{XY}(X) ⇒ H(XY)$, $P_Y|X(Y|X) ⇒ H(Y|X)$, $P_{X|Y}(X|Y) ⇒ H(X|Y)$,
 
sollen die Aussagen der letzten Seite an einem Beispiel verdeutlicht werden.
 
  
 +
The green block in the graph marks&nbsp; $P_{Y|X}(⋅)$&nbsp; with&nbsp; $|X|$&nbsp; inputs and&nbsp; $|Y|$&nbsp; outputs.&nbsp; Blue connections mark transition probabilities&nbsp; $\text{Pr}(y_i | x_μ)$&nbsp; starting from&nbsp; $x_μ$&nbsp; with&nbsp; $1 ≤ i ≤ |Y|$,&nbsp; while all red connections end at&nbsp; $y_κ$:&nbsp; &nbsp; $\text{Pr}(y_κ | x_i)$&nbsp; with&nbsp; $1 ≤ i ≤ |X|$.
  
{{Beispiel}}
+
Before we give the entropies for the individual probability functions, viz.
 +
:$$P_X(X) ⇒ H(X),\hspace{0.5cm}  P_Y(Y) ⇒ H(Y), \hspace{0.5cm} P_{XY}(X) ⇒ H(XY), \hspace{0.5cm} P_{Y|X}(Y|X) ⇒ H(Y|X),\hspace{0.5cm} P_{X|Y}(X|Y) ⇒ H(X|Y),$$
 +
the above statements are to be illustrated by an example.
  
[[File:P_ID2780__Inf_T_3_3_S1b_neu.png|Digitales Kanalmodell <i>Binary Erasure Channel</i>]]
+
[[File:P_ID2780__Inf_T_3_3_S1b_neu.png|right|frame|Channel model&nbsp; "Binary Erasure Channel"&nbsp; $\rm (BEC)$]]
Im Buch [[Kanalcodierung]] behandeln wir den Binary Erasure Channel (BEC), der rechts in etwas modifizierter Form skizziert ist.
+
{{GraueBox|TEXT=
 +
$\text{Example 1}$:&nbsp; In the book&nbsp; "Channel Coding"&nbsp; we also deal with the&nbsp; [[Channel_Coding/Kanalmodelle_und_Entscheiderstrukturen#Binary_Erasure_Channel_.E2.80.93_BEC|$\text{Binary Erasure Channel}$]]&nbsp; $\rm (BEC)$, which is sketched in a somewhat modified form on the right. &nbsp; The following prerequisites apply:
 +
*Let the input alphabet be binary&nbsp; &rArr; &nbsp; $X = \{0,\ 1 \}$  &nbsp; ⇒  &nbsp; $\vert X\vert = 2$&nbsp; while three values are possible at the output &nbsp; &rArr; &nbsp; $Y = \{0,\ 1,\ \text{E} \}$  &nbsp; ⇒  &nbsp; $\vert Y\vert = 3$.
 +
*The symbol&nbsp; $\text{E}$&nbsp;  indicates the case that the receiver cannot decide for one of the binary symbols&nbsp; $0$&nbsp; or&nbsp; $1$&nbsp; due to too much channel interference.&nbsp; "E"&nbsp; stands for erasure.
 +
*With the&nbsp; $\rm BEC$&nbsp; according to the above sketch, both a transmitted&nbsp; $0$&nbsp; and a&nbsp; $1$&nbsp; are erased with probability&nbsp; $λ$&nbsp; while the probability of a correct transmission is&nbsp; $1 – λ$&nbsp; in each case.
 +
*In contrast, transmission errors are excluded by the BEC model  &nbsp; <br>⇒  &nbsp; the conditional probabilities &nbsp;$\text{Pr}(Y = 1 \vert X = 0)$ &nbsp;and&nbsp; $\text{Pr}(Y = 0 \vert X = 1)$ &nbsp;are both zero.
  
Dabei gelten folgende Voraussetzungen:
 
*Das Eingangsalphabet ist binär: $X$ = (0, 1)  ⇒  $|X|$ = 2, während am Ausgang drei Werte möglich sind: $Y$ = (0, 1, $E$)  ⇒  $|Y|$ = 3.
 
*„E” kennzeichnet den Fall, dass sich der Empfänger aufgrund von zu großen Kanalstörungen nicht für eines der Binärsymbole 0 oder 1 entscheiden kann. „E” steht hierbei für ''Erasure'' (Auslöschung).
 
*Beim BEC gemäß obiger Skizze werden sowohl eine gesendete „0” als auch eine „1” mit der Wahrscheinlichkeit $λ$ ausgelöscht, während die Wahrscheinlichkeit einer richtigen Übertragung jeweils 1 – $λ$ beträgt.
 
*Dagegen werden Übertragungsfehler durch das BEC–Modell ausgeschlossen  ⇒  die bedingten Wahrscheinlichkeiten Pr( $Y$ = 1 | $X$ = 0 ) sowie Pr( $Y$ = 0 | $X$ = 1) sind jeweils 0.
 
  
Beim Sender seien die Nullen und Einsen nicht unbedingt gleichwahrscheinlich. Vielmehr verwenden wir die beiden Wahrscheinlichkeitsfunktionen
+
At the transmitter, the&nbsp; "zeros"&nbsp; and&nbsp; "ones"&nbsp; would not necessarily be equally probable.&nbsp; Rather, we use the probability mass functions
+
:$$\begin{align*}P_X(X) & =  \big ( {\rm Pr}( X = 0)\hspace{0.05cm},\hspace{0.2cm} {\rm Pr}( X = 1) \big )\hspace{0.05cm},\\
$$\begin{align*}P_X(X) \hspace{-0.15cm} & =  \hspace{-0.15cm} \big ( {\rm Pr}( X = 0)\hspace{0.05cm}, {\rm Pr}( X = 1) \big )\hspace{0.05cm},\\
+
P_Y(Y) & = \big ( {\rm Pr}( Y = 0)\hspace{0.05cm},\hspace{0.2cm} {\rm Pr}( Y = 1)\hspace{0.05cm},\hspace{0.2cm} {\rm Pr}( Y = {\rm E}) \big )\hspace{0.05cm}.\end{align*}$$
P_Y(Y) \hspace{-0.15cm} & = \hspace{-0.15cm} \big ( {\rm Pr}( Y = 0)\hspace{0.05cm}, {\rm Pr}( Y = 1)\hspace{0.05cm}, {\rm Pr}( Y = {\rm E}) \big )\hspace{0.05cm}.\end{align*}$$
 
  
Aus obigem Modell erhalten wir dann:
+
From the above model we then get:
 
   
 
   
$$\begin{align*}P_Y(0) \hspace{-0.15cm} & = \hspace{-0.15cm}  {\rm Pr}( Y \hspace{-0.1cm} = 0) = P_X(0) \cdot ( 1 - \lambda)\hspace{0.05cm}, \\
+
:$$\begin{align*}P_Y(0) & =   {\rm Pr}( Y \hspace{-0.1cm} = 0) = P_X(0) \cdot ( 1 - \lambda)\hspace{0.05cm}, \\
P_Y(1) \hspace{-0.15cm} & = \hspace{-0.15cm} {\rm Pr}( Y \hspace{-0.1cm} = 1) = P_X(1) \cdot ( 1 - \lambda)\hspace{0.05cm}, \\
+
P_Y(1) & =  {\rm Pr}( Y \hspace{-0.1cm} = 1) = P_X(1) \cdot ( 1 - \lambda)\hspace{0.05cm}, \\
P_Y({\rm E}) \hspace{-0.15cm} & = \hspace{-0.15cm}  {\rm Pr}( Y \hspace{-0.1cm} = {\rm E}) = P_X(0) \cdot \lambda \hspace{0.1cm}+\hspace{0.1cm} P_X(1) \cdot \lambda \hspace{0.05cm}.\end{align*}$$
+
P_Y({\rm E}) & =   {\rm Pr}( Y \hspace{-0.1cm} = {\rm E}) = P_X(0) \cdot \lambda \hspace{0.1cm}+\hspace{0.1cm} P_X(1) \cdot \lambda \hspace{0.05cm}.\end{align*}$$
  
Fassen wir nun $P_X(X)$ und $P_Y(Y)$ als Vektoren auf, so lässt sich das Ergebnis wie folgt darstellen:
+
If we now take&nbsp; $P_X(X)$&nbsp; and&nbsp; $P_Y(Y)$&nbsp; to be vectors, the result can be represented as follows:
 
   
 
   
$$P_{\hspace{0.05cm}Y}(Y) = P_X(X) \cdot P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{-0.01cm}X}(Y\hspace{-0.01cm} |\hspace{-0.01cm} X) \hspace{0.05cm},$$
+
:$$P_{\hspace{0.05cm}Y}(Y) = P_X(X) \cdot P_{\hspace{0.05cm}Y\hspace{-0.01cm}\vert \hspace{-0.01cm}X}(Y\hspace{-0.01cm} \vert \hspace{-0.01cm} X) \hspace{0.05cm},$$
  
wobei die Übergangswahrscheinlichkeiten $\text{Pr}(y_κ | x_μ)$ durch folgende Matrix berücksichtigt sind:
+
where the transition probabilities &nbsp;$\text{Pr}(y_κ\vert x_μ)$&nbsp; are accounted for by the following matrix:
 
   
 
   
$$P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{-0.01cm}X}(Y\hspace{-0.01cm} |\hspace{-0.01cm} X) =  
+
:$$P_{\hspace{0.05cm}Y\hspace{-0.01cm} \vert \hspace{-0.01cm}X}(Y\hspace{-0.01cm} \vert \hspace{-0.01cm} X) =  
 
\begin{pmatrix}  
 
\begin{pmatrix}  
1 - \lambda   0 \lambda\\
+
1 - \lambda & 0 & \lambda\\
0 1 - \lambda   \lambda
+
0 & 1 - \lambda & \lambda
 
\end{pmatrix}\hspace{0.05cm}.$$
 
\end{pmatrix}\hspace{0.05cm}.$$
  
Beachten Sie: Wir haben diese Darstellung nur gewählt, um die Beschreibung zu vereinfachen. $P_X(X)$ und $P_Y(Y)$ sind keine Vektoren im eigentlichen Sinne und $P_{Y|X}(Y|X)$ ist keine Matrix.
+
Note:
 +
*We have chosen this representation only to simplify the description.  
 +
*$P_X(X)$&nbsp; and&nbsp; $P_Y(Y)$&nbsp; are not vectors in the true sense and&nbsp; $P_{Y \vert X}(Y\vert X)$&nbsp; is not a matrix either.}}
  
{{end}}
 
  
 +
==Directional diagram for digital signal transmission ==
 +
<br>
 +
All entropies defined in the&nbsp; [[Information_Theory/Verschiedene_Entropien_zweidimensionaler_Zufallsgrößen|"last chapter"]]&nbsp; also apply to digital signal transmission.&nbsp; However, it is expedient to choose the right-hand diagram instead of the diagram used so far, corresponding to the left-hand diagram, in which the direction from source&nbsp; $X$&nbsp; to sink&nbsp; $Y$&nbsp; is recognizable.
  
Alle in Kapitel 3.2 definierten Entropien gelten auch für die Digitalsignalübertragung. Es ist aber zweckmäßig, anstelle des bisher verwendeten Schaubildes (linke Grafik) die rechte Darstellung zu wählen, bei der die Richtung von der Quelle $X$ zur Sinke $Y$ erkennbar ist.
+
[[File:EN_Inf_T_3_3_S2_vers2.png|center|frame|Two information-theoretical models for digital signal transmission]]
  
[[File:P_ID2781__Inf_T_3_3_S2.png|Zwei informationstheoretische Modelle für die Digitalsignalübertragung]]
+
Let us now interpret the right graph starting from the general&nbsp; [[Information_Theory/Anwendung_auf_die_Digitalsignalübertragung#Information-theoretical_model_of_digital_signal_transmission|$\text{DMC model}$]]:
 
+
*The&nbsp; &raquo;'''source entropy'''&laquo;&nbsp; $H(X)$&nbsp; denotes the average information content of the source symbol sequence. &nbsp; With the symbol set size&nbsp; $|X|$&nbsp; applies:
Interpretieren wir nun ausgehend vom allgemeinen DMC–Kanalmodell die rechte Grafik:
 
*Die '''Quellenentropie''' (englisch: ''Source Entropy'') $H(X)$ bezeichnet den mittleren Informationsgehalt der Quellensymbolfolge. Mit dem Symbolumfang $|X|$ gilt:
 
 
   
 
   
$$H(X) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_X(X)}\right ] \hspace{0.1cm}
+
:$$H(X) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_X(X)}\right ] \hspace{0.1cm}
= -{\rm E} \left [ {\rm log}_2 \hspace{0.1cm}{P_X(X)}\right ] \hspace{0.2cm}
+
= -{\rm E} \big [ {\rm log}_2 \hspace{0.1cm}{P_X(X)}\big ] \hspace{0.2cm}
 
=\hspace{0.2cm} \sum_{\mu = 1}^{|X|}  
 
=\hspace{0.2cm} \sum_{\mu = 1}^{|X|}  
 
  P_X(x_{\mu}) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_X(x_{\mu})} \hspace{0.05cm}.$$
 
  P_X(x_{\mu}) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_X(x_{\mu})} \hspace{0.05cm}.$$
  
*Die '''Äquivokation''' (auch Rückschlussentropie genannt, englisch: ''Equivocation'') $H(X|Y)$ gibt den mittleren Informationsgehalt an, den ein Betrachter, der über die Sinke $Y$ genau Bescheid weiß, durch Beobachtung der Quelle $X$ gewinnt:
+
*The&nbsp; &raquo;'''equivocation'''&laquo;&nbsp; $H(X|Y)$&nbsp;  indicates the average information content that an observer who knows exactly about the sink&nbsp; $Y$&nbsp; gains by observing the source&nbsp; $X$&nbsp;:
 
   
 
   
$$H(X|Y) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}X\hspace{-0.01cm}|\hspace{-0.01cm}Y}(X\hspace{-0.01cm} |\hspace{0.03cm} Y)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{|X|} \sum_{\kappa = 1}^{|Y|}  
+
:$$H(X|Y) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}X\hspace{-0.01cm}|\hspace{-0.01cm}Y}(X\hspace{-0.01cm} |\hspace{0.03cm} Y)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{|X|} \sum_{\kappa = 1}^{|Y|}  
 
  P_{XY}(x_{\mu},\hspace{0.05cm}y_{\kappa}) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}X\hspace{-0.01cm}|\hspace{0.03cm}Y}
 
  P_{XY}(x_{\mu},\hspace{0.05cm}y_{\kappa}) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}X\hspace{-0.01cm}|\hspace{0.03cm}Y}
 
(\hspace{0.05cm}x_{\mu}\hspace{0.03cm} |\hspace{0.05cm} y_{\kappa})}
 
(\hspace{0.05cm}x_{\mu}\hspace{0.03cm} |\hspace{0.05cm} y_{\kappa})}
 
  \hspace{0.05cm}.$$
 
  \hspace{0.05cm}.$$
  
*Die Äquivokation ist der Anteil der Quellenentropie $H(X)$, der durch Kanalstörungen (bei digitalem Kanal: Übertragungsfehler) verloren geht. Es verbleibt die '''Transinformation''' (englisch: ''Mutual Information'') $I(X; Y)$, die zur Sinke gelangt:
+
*The equivocation is the portion of the source entropy&nbsp; $H(X)$&nbsp; that is lost due to channel interference&nbsp; (for digital channel: transmission errors).&nbsp; The&nbsp; &raquo;'''mutual information'''&laquo;&nbsp; $I(X; Y)$&nbsp; remains, which reaches the sink:
 
   
 
   
$$I(X;Y) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_{XY}(X, Y)}{P_X(X) \cdot P_Y(Y)}\right ] \hspace{0.2cm} = H(X) - H(X|Y) \hspace{0.05cm}.$$
+
:$$I(X;Y) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_{XY}(X, Y)}{P_X(X) \cdot P_Y(Y)}\right ] \hspace{0.2cm} = H(X) - H(X|Y) \hspace{0.05cm}.$$
  
*Die '''Irrelevanz''' (manchmal auch ''Streuentropie'' genannt, englisch: ''Irrelevance'') $H(Y|X)$ gibt den mittleren Informationsgehalt an, den ein Betrachter, der über die Quelle $X$ genau Bescheid weiß, durch Beobachtung der Sinke $Y$ gewinnt:
+
*The&nbsp; &raquo;'''irrelevance'''&laquo;&nbsp; $H(Y|X)$&nbsp; indicates the average information content that an observer who knows exactly about the source&nbsp; $X$&nbsp; gains by observing the sink&nbsp; $Y$:
 
   
 
   
$$H(Y|X) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{-0.01cm}X}(Y\hspace{-0.01cm} |\hspace{0.03cm} X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{|X|} \sum_{\kappa = 1}^{|Y|}  
+
:$$H(Y|X) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{-0.01cm}X}(Y\hspace{-0.01cm} |\hspace{0.03cm} X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{|X|} \sum_{\kappa = 1}^{|Y|}  
 
  P_{XY}(x_{\mu},\hspace{0.05cm}y_{\kappa}) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{0.03cm}X}
 
  P_{XY}(x_{\mu},\hspace{0.05cm}y_{\kappa}) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{0.03cm}X}
 
(\hspace{0.05cm}y_{\kappa}\hspace{0.03cm} |\hspace{0.05cm} x_{\mu})}
 
(\hspace{0.05cm}y_{\kappa}\hspace{0.03cm} |\hspace{0.05cm} x_{\mu})}
 
  \hspace{0.05cm}.$$
 
  \hspace{0.05cm}.$$
  
*Die '''Sinkenentropie''' $H(Y)$, der mittlere Informationsgehalt der Sinke, ist die Summe aus der nützlichen Transinformation $I(X; Y)$ und der Irrelevanz $H(Y|X)$, die ausschließlich von Kanalfehlern herrührt:
+
*The&nbsp; &raquo;'''sink entropy'''&laquo;&nbsp; $H(Y)$, the mean information content of the sink.&nbsp; $H(Y)$&nbsp; is the sum of the useful mutual information&nbsp; $I(X; Y)$&nbsp; and the useless irrelevance&nbsp; $H(Y|X)$, which comes exclusively from channel errors:
 
  
 
  
$$H(Y) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_Y(Y)}\right ] \hspace{0.1cm}
+
:$$H(Y) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_Y(Y)}\right ] \hspace{0.1cm}
= -{\rm E} \left [ {\rm log}_2 \hspace{0.1cm}{P_Y(Y)}\right ] \hspace{0.2cm} =I(X;Y) + H(Y|X)  
+
= -{\rm E} \big [ {\rm log}_2 \hspace{0.1cm}{P_Y(Y)}\big ] \hspace{0.2cm} =I(X;Y) + H(Y|X)  
 
  \hspace{0.05cm}.$$
 
  \hspace{0.05cm}.$$
 
   
 
   
==Transinformationsberechnung für den Binärkanal==  
+
==Calculation of the mutual information for the binary channel==  
 
+
<br>
Die Definitionen der letzten Seite sollen nun an einem Beispiel verdeutlicht werden, wobei wir bewusst vermeiden, die Berechnungen durch die Ausnutzung von Symmetrien zu vereinfachen.
+
These definitions will now be illustrated by an example. &nbsp; We deliberately avoid simplifying the calculations by exploiting symmetries.
 
 
{{Beispiel}}
 
  
[[File:P_ID2782__Inf_T_3_3_S3a.png|Allgemeines Modell des Binärkanals]]
+
[[File:P_ID2782__Inf_T_3_3_S3a.png|right|frame|General model of the binary channel]]
Wir betrachten den allgemeinen Binärkanal (englisch: ''Binary Channel'') ohne Gedächtnis gemäß der Skizze mit den Verfälschungswahrscheinlichkeiten.
+
{{GraueBox|TEXT=
 +
$\text{Example 2}$:&nbsp; We consider the general binary channel without memory according to the sketch.&nbsp; Let the falsification probabilities be:
 
    
 
    
$$\begin{align*}\varepsilon_0 \hspace{-0.15cm} & =  \hspace{-0.15cm}{\rm Pr}(Y\hspace{-0.1cm} = 1\hspace{0.05cm} | X \hspace{-0.1cm}= 0) = 0.01\hspace{0.05cm},\\
+
:$$\begin{align*}\varepsilon_0 & =  {\rm Pr}(Y\hspace{-0.1cm} = 1\hspace{0.05cm}\vert X \hspace{-0.1cm}= 0) = 0.01\hspace{0.05cm},\\
\varepsilon_1 \hspace{-0.15cm} & = \hspace{-0.15cm} {\rm Pr}(Y\hspace{-0.1cm} = 0\hspace{0.05cm} | X \hspace{-0.1cm}= 1) = 0.2\hspace{0.05cm}\end{align*}$$
+
\varepsilon_1 & = {\rm Pr}(Y\hspace{-0.1cm} = 0\hspace{0.05cm} \vert X \hspace{-0.1cm}= 1) = 0.2\hspace{0.05cm}\end{align*}$$
  
$$\Rightarrow \hspace{0.3cm}  P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{-0.01cm}X}(Y\hspace{-0.01cm} |\hspace{-0.01cm} X) =  
+
:$$\Rightarrow \hspace{0.3cm}  P_{\hspace{0.05cm}Y\hspace{-0.01cm} \vert \hspace{-0.01cm}X}(Y\hspace{-0.01cm} \vert \hspace{-0.01cm} X) =  
 
\begin{pmatrix}  
 
\begin{pmatrix}  
 
1 - \varepsilon_0  & \varepsilon_0\\
 
1 - \varepsilon_0  & \varepsilon_0\\
Line 137: Line 137:
 
\end{pmatrix} \hspace{0.05cm}.$$
 
\end{pmatrix} \hspace{0.05cm}.$$
  
Außerdem gehen wir von nicht gleichwahrscheinlichen Quellensymbolen aus:
+
Furthermore, we assume source symbols that are not equally probable:
 
   
 
   
$$P_X(X) = \big ( p_0, p_1 \big )=
+
:$$P_X(X) = \big ( p_0,\ p_1 \big )=
\big ( 0.1, 0.9 \big )
+
\big ( 0.1,\ 0.9 \big )
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
  
Mit der binären Entropiefunktion erhält man so für die Quellenentropie:
+
With the&nbsp; [[Information_Theory/Gedächtnislose_Nachrichtenquellen#Binary_entropy_function|$\text{binary entropy function}$]]&nbsp; $H_{\rm bin}(p)$,&nbsp; we thus obtain for the source entropy:
 
   
 
   
$$H(X) = H_{\rm bin} (0.1) = 0.4690 \,{\rm bit}
+
:$$H(X) = H_{\rm bin} (0.1) = 0.4690 \hspace{0.12cm}{\rm bit}
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
  
Für die Wahrscheinlichkeitsfunktion der Sinke sowie für die Sinkenentropie ergibt sich somit:
+
For the probability mass function of the sink as well as for the sink entropy we thus obtain:
 
    
 
    
$$P_Y(Y) = \big ( {\rm Pr}( Y\hspace{-0.1cm} = 0)\hspace{0.05cm}, {\rm Pr}( Y \hspace{-0.1cm}= 1) \big ) = \big ( p_0\hspace{0.05cm}, p_1 \big ) \cdot  
+
:$$P_Y(Y) = \big [ {\rm Pr}( Y\hspace{-0.1cm} = 0)\hspace{0.05cm}, \ {\rm Pr}( Y \hspace{-0.1cm}= 1) \big ] = \big ( p_0\hspace{0.05cm},\ p_1 \big ) \cdot  
 
\begin{pmatrix}  
 
\begin{pmatrix}  
 
1 - \varepsilon_0  & \varepsilon_0\\
 
1 - \varepsilon_0  & \varepsilon_0\\
Line 156: Line 156:
 
\end{pmatrix} $$
 
\end{pmatrix} $$
  
$$\begin{align*}\Rightarrow \hspace{0.3cm}  {\rm Pr}( Y \hspace{-0.1cm}= 0)\hspace{-0.15cm} & =  \hspace{-0.15cm}
+
:$$\begin{align*}\Rightarrow \hspace{0.3cm}  {\rm Pr}( Y \hspace{-0.1cm}= 0)& =   
 
p_0 \cdot (1 - \varepsilon_0) + p_1 \cdot \varepsilon_1 =
 
p_0 \cdot (1 - \varepsilon_0) + p_1 \cdot \varepsilon_1 =
 
0.1 \cdot 0.99 + 0.9 \cdot 0.2 = 0.279\hspace{0.05cm},\\
 
0.1 \cdot 0.99 + 0.9 \cdot 0.2 = 0.279\hspace{0.05cm},\\
{\rm Pr}( Y \hspace{-0.1cm}= 1)\hspace{-0.15cm} & =  \hspace{-0.15cm} 1 - {\rm Pr}( Y \hspace{-0.1cm}= 0) = 0.721\end{align*}$$
+
{\rm Pr}( Y \hspace{-0.1cm}= 1) & =  1 - {\rm Pr}( Y \hspace{-0.1cm}= 0) = 0.721\end{align*}$$
  
$$\Rightarrow \hspace{0.2cm}
+
:$$\Rightarrow \hspace{0.3cm}
H(Y) = H_{\rm bin} (0.279) = 0.8541 \,{\rm bit}
+
H(Y) = H_{\rm bin} (0.279) = 0.8541 \hspace{0.12cm}{\rm bit}
 
\hspace{0.05cm}. $$
 
\hspace{0.05cm}. $$
  
Auf der nächsten Seite werden noch berechnet:
+
The joint probabilities&nbsp; $p_{\mu \kappa} = \text{Pr}\big[(X = μ) ∩ (Y = κ)\big]$&nbsp; between source and sink are:
*die Verbundentropie $H(XY)$,
 
*die Transinformation $I(X; Y)$,
 
*die Rückschlussentropie $H(X|Y)$  ⇒  Äquivokation,
 
*die Streuentropie $H(Y|X)$  ⇒  Irrelevanz.
 
Diese Ergebnisse sind in der folgenden zusammenfassenden Grafik bereits mit aufgenommen.
 
 
 
[[File:P_ID2783__Inf_T_3_3_S3b_neu.png|Informationstheoretisches Modell des betrachteten Binärkanals]]
 
 
 
[[File:P_ID2782__Inf_T_3_3_S3a.png|Allgemeines Modell des Binärkanals]]
 
 
 
Wir betrachten weiter den allgemeinen Binärkanal (englisch: ''Binary Channel'') ohne Gedächtnis gemäß der Skizze, und es gelte weiterhin:
 
 
 
$$P_X(X) = \big ( p_0, p_1 \big )=
 
\big ( 0.1, 0.9 \big )
 
\hspace{0.05cm},$$
 
 
 
$$\varepsilon_0 = {\rm Pr}(Y\hspace{-0.1cm} = 1\hspace{0.05cm} | X \hspace{-0.1cm}= 0) = 0.01\hspace{0.05cm}, \hspace{0.3cm}
 
\varepsilon_1 ={\rm Pr}(Y\hspace{-0.1cm} = 0\hspace{0.05cm} | X \hspace{-0.1cm}= 1) = 0.2\hspace{0.05cm}.$$
 
 
 
Die Verbundwahrscheinlichkeiten $p_{\{μκ\}}$ = $\text{Pr}[(X = μ) ∩ (Y = κ)]$ zwischen Quelle und Sinke sind:
 
 
   
 
   
$$\begin{align*}p_{00}\hspace{-0.15cm} & = \hspace{-0.15cm} p_0 \cdot (1 - \varepsilon_0) = 0.099\hspace{0.05cm},\hspace{0.5cm}p_{01}= p_0 \cdot \varepsilon_0 = 0.001\hspace{0.05cm},\\
+
:$$\begin{align*}p_{00} & =   p_0 \cdot (1 - \varepsilon_0) = 0.099\hspace{0.05cm},\hspace{0.5cm}p_{01}= p_0 \cdot \varepsilon_0 = 0.001\hspace{0.05cm},\\
p_{10}\hspace{-0.15cm} & =  \hspace{-0.15cm} p_1 \cdot (1 - \varepsilon_1) = 0.180\hspace{0.05cm},\hspace{0.5cm}p_{11}= p_1 \cdot \varepsilon_1 = 0.720\hspace{0.05cm}.\end{align*}$$
+
p_{10} & =  p_1 \cdot (1 - \varepsilon_1) = 0.180\hspace{0.05cm},\hspace{0.5cm}p_{11}= p_1 \cdot \varepsilon_1 = 0.720\hspace{0.05cm}.\end{align*}$$
  
Daraus erhält man für
+
From this one obtains for
*die '''Verbundentropie''' (englisch ''Joint Entropy''):
+
*the&nbsp; &raquo;'''joint entropy'''&laquo;:
 
   
 
   
$$H(XY) =  p_{00}\hspace{-0.05cm} \cdot \hspace{-0.05cm}{\rm log}_2 \hspace{0.05cm} \frac{1}{p_{00}} +
+
:$$H(XY) =  p_{00}\hspace{-0.05cm} \cdot \hspace{-0.05cm}{\rm log}_2 \hspace{0.05cm} \frac{1}{p_{00} \rm } +
p_{01} \hspace{-0.05cm} \cdot \hspace{-0.05cm}{\rm log}_2 \hspace{0.05cm} \frac{1}{p_{01}} +
+
p_{01} \hspace{-0.05cm} \cdot \hspace{-0.05cm}{\rm log}_2 \hspace{0.05cm} \frac{1}{p_{01} \rm } +
p_{10}\hspace{-0.05cm} \cdot \hspace{-0.05cm} {\rm log}_2 \hspace{0.05cm} \frac{1}{p_{10}} +
+
p_{10}\hspace{-0.05cm} \cdot \hspace{-0.05cm} {\rm log}_2 \hspace{0.05cm} \frac{1}{p_{10} \rm } +
p_{11} \hspace{-0.05cm} \cdot \hspace{-0.05cm} {\rm log}_2\hspace{0.05cm} \frac{1}{p_{11}} = 1.1268\,{\rm bit} \hspace{0.05cm},$$
+
p_{11} \hspace{-0.05cm} \cdot \hspace{-0.05cm} {\rm log}_2\hspace{0.05cm} \frac{1}{p_{11}\rm } = 1.1268\,{\rm bit} \hspace{0.05cm},$$
  
*die '''Transinformation''' (englisch ''Mutual Information''):
+
*the&nbsp; &raquo;'''mutual information'''&laquo;:
 
   
 
   
$$I(X;Y) = H(X) + H(Y) - H(XY)  = 0.4690 + 0.8541 - 1.1268 = 0.1963\,{\rm bit}
+
:$$I(X;Y) = H(X) + H(Y) - H(XY)  = 0.4690 + 0.8541 - 1.1268 = 0.1963\hspace{0.12cm}{\rm bit}
 
  \hspace{0.05cm},$$
 
  \hspace{0.05cm},$$
  
*die '''Äquivokation''' (oder Rückschlussentropie):
+
[[File:EN_Inf_T_3_3_S3b_vers2.png|right|frame|Information theoretic model of the binary channel under consideration]]
 +
*the&nbsp; &raquo;'''equivocation'''&laquo;:
 
   
 
   
$$H(X|Y) = H(X) - I(X;Y) =  0.4690 - 0.1963 = 0.2727\,{\rm bit}
+
:$$H(X \vert Y) \hspace{-0.01cm} =\hspace{-0.01cm}  H(X) \hspace{-0.01cm} -\hspace{-0.01cm}  I(X;Y) \hspace{-0.01cm}  $$
 +
:$$\Rightarrow \hspace{0.3cm}  H(X \vert Y) \hspace{-0.01cm}  = \hspace{-0.01cm} 0.4690\hspace{-0.01cm}  -\hspace{-0.01cm}  0.1963\hspace{-0.01cm} =\hspace{-0.01cm}  0.2727\hspace{0.12cm}{\rm bit}
 
  \hspace{0.05cm},$$
 
  \hspace{0.05cm},$$
  
*die '''Irrelevanz''' (oder Streuentropie):
+
*the &raquo;'''irrelevance'''&laquo;:
 
   
 
   
$$H(Y|X) = H(Y) - I(X;Y) = 0.8541 - 0.1963 = 0.6578\,{\rm bit}
+
:$$H(Y \vert X) = H(Y) - I(X;Y) $$
 +
:$$\Rightarrow \hspace{0.3cm}  H(Y \vert X) = 0.8541 - 0.1963 = 0.6578\hspace{0.12cm}{\rm bit}
 
  \hspace{0.05cm}.$$
 
  \hspace{0.05cm}.$$
  
Die Ergebnisse sind in der folgenden Grafik nochmals zusammengefasst.
+
The results are summarized in the graph.}}
 
 
[[File:P_ID2785__Inf_T_3_3_S3b_neu.png|Informationstheoretisches Modell des betrachteten Binärkanals]]
 
  
{{end}}
 
  
 
+
''Notes'':  
''Anmerkung'': Äquivokation und Irrelevanz hätte man auch direkt (aber mit Mehraufwand) aus den entsprechenden Wahrscheinlichkeitsfunktionen berechnen können. Zum Beispiel die Irrelevanz:
+
*The equivocation and irrelevance could also have been calculated directly (but with extra effort) from the corresponding probability functions.  
 +
*For example, the irrelevance:
 
    
 
    
$$H(Y|X) = \hspace{-0.2cm} \sum_{(x, y) \hspace{0.05cm}\in \hspace{0.05cm}XY} \hspace{-0.2cm} P_{XY}(x,\hspace{0.05cm}y) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{0.03cm}X}
+
:$$H(Y|X) = \hspace{-0.2cm} \sum_{(x, y) \hspace{0.05cm}\in \hspace{0.05cm}XY} \hspace{-0.2cm} P_{XY}(x,\hspace{0.05cm}y) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{0.03cm}X}
(\hspace{0.05cm}y\hspace{0.03cm} |\hspace{0.05cm} x)}=$$
+
(\hspace{0.05cm}y\hspace{0.03cm} |\hspace{0.05cm} x)}= p_{00} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{1\hspace{-0.08cm} - \hspace{-0.08cm}\varepsilon_0} +
 
$$.\hspace{0.3cm} = p_{00} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{1 - \varepsilon_0} +
 
 
p_{01} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{\varepsilon_0} +
 
p_{01} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{\varepsilon_0} +
p_{10} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{1 - \varepsilon_1} +
+
p_{10} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{1\hspace{-0.08cm} - \hspace{-0.08cm}\varepsilon_1} +
 
p_{11} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{\varepsilon_1} = 0.6578\,{\rm bit} \hspace{0.05cm}.$$
 
p_{11} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{\varepsilon_1} = 0.6578\,{\rm bit} \hspace{0.05cm}.$$
  
==Definition und Bedeutung der Kanalkapazität ==  
+
==Definition and meaning of channel capacity ==  
 +
<br>
 +
We further consider a discrete memoryless channel&nbsp; $\rm (DMC)$&nbsp; with a finite number of source symbols &nbsp; ⇒ &nbsp; $|X|$&nbsp; and also only finitely many sink symbols &nbsp;  ⇒  &nbsp; $|Y|$,&nbsp; as shown in the first section of this chapter.
 +
*If one calculates the mutual information&nbsp; $I(X, Y)$&nbsp; as explained in&nbsp; $\text{Example 2}$,&nbsp;  it also depends on the source statistic  &nbsp;  ⇒  &nbsp;  $P_X(X)$.
 +
* Ergo: &nbsp; '''The mutual information'''&nbsp; $I(X, Y)$&nbsp;''' is not a pure channel characteristic'''.
  
Wir betrachten weiter einen diskreten gedächtnislosen Kanal (englisch: ''Discrete Memoryless Channel'', kurz DMC) mit einer endlichen Anzahl an Quellensymbolen ⇒ $|X|$ und ebenfalls nur endlich vielen Sinkensymbolen  ⇒  $|Y|$, wie auf der ersten Seite dieses Kapitels dargestellt. Berechnet man die Transinformation $I(X, Y)$ wie zuletzt an einem Beispiel ausgeführt, so hängt diese auch von der Quellenstatistik  ⇒  $P_X(X)$ ab. Ergo: $I(X, Y)$ ist keine reine Kanalkenngröße.
 
  
{{Definition}}
+
{{BlaueBox|TEXT=
Die von Claude E. Shannon eingeführte '''Kanalkapazität''' (englisch: ''Channel Capacity'') lautet entsprechend seinem Standardwerk <ref>Shannon, C.E.: ''A Mathematical Theory of Communication''. In: Bell Syst. Techn. J. 27 (1948), S. 379-423 und S. 623-656.</ref>:
+
$\text{Definition:}$&nbsp; The&nbsp; &raquo;'''channel capacity'''&laquo;&nbsp; introduced by&nbsp; [https://en.wikipedia.org/wiki/Claude_Shannon $\text{Claude E. Shannon}$]&nbsp; according to his standard work&nbsp; [Sha48]<ref name = Sha48>Shannon, C.E.:&nbsp; A Mathematical Theory of Communication. In:&nbsp; Bell Syst. Techn. J. 27 (1948), S. 379-423 und S. 623-656.</ref> reads:
 
   
 
   
$$C = \max_{P_X(X)} \hspace{0.15cm}  I(X;Y)  \hspace{0.05cm}.$$
+
:$$C = \max_{P_X(X)} \hspace{0.15cm}  I(X;Y)  \hspace{0.05cm}.$$
  
Da nach dieser Definition stets die bestmögliche Quellenstatistik zugrunde liegt, hängt $C$ nur von den Kanaleigenschaften ⇒ $P_{Y|X}(Y|X)$ ab, nicht jedoch von $P_X(X)$. Oft wird die Zusatzeinheit „bit/Kanalzugriff” hinzugefügt, bei englischen Texten „bit/use”.
+
The additional unit&nbsp; "bit/use"&nbsp; is often added.&nbsp; Since according to this definition the best possible source statistics are always the basis:
 +
*$C$&nbsp; depends only on the channel properties &nbsp; &nbsp; $P_{Y \vert X}(Y \vert X)$,
 +
*but not on the source statistics &nbsp; ⇒ &nbsp; $P_X(X)$.&nbsp; }}
  
{{end}}
 
  
 +
Shannon needed the quantity&nbsp; $C$&nbsp; to formulate the&nbsp; "Channel Coding Theorem" – one of the highlights of the information theory he founded.
  
C. E. Shannon benötigte diese Kanalbeschreibungsgröße $C$, um das Kanalcodierungstheorem formulieren zu können – eines der Highlights der von ihm begründeten Informationstheorie.
+
{{BlaueBox|TEXT=
 +
$\text{Shannon's Channel Coding Theorem: }$&nbsp;
 +
*For every transmission channel with channel capacity&nbsp; $C > 0$,&nbsp; there exists (at least) one&nbsp; $(k,\ n)$–block code,&nbsp; whose (block) error probability approaches zero&nbsp; as long as the code rate&nbsp; $R = k/n$&nbsp; is less than or equal to the channel capacity: &nbsp;
 +
:$$R ≤ C.$$
 +
*The prerequisite for this, however,&nbsp;  is that the following applies to the block length of this code: &nbsp; $n → ∞.$}}
  
{{Definition}}
 
'''Shannons Kanalcodierungstheorem''': Zu jedem Übertragungskanal mit der Kanalkapazität $C$ > 0 existiert (mindestens) ein $(k, n)$–Blockcode, dessen (Block–)Fehlerwahrscheinlichkeit gegen Null geht, so lange die Coderate $R$ = $k/n$ kleiner oder gleich der Kanalkapazität ist: $R ≤ C$. Voraussetzung hierfür ist allerdings, dass für die Blocklänge dieses Codes gilt: $n → ∞$.
 
  
{{end}}
+
The proof of this theorem,&nbsp; which is beyond the scope of our learning tutorial,&nbsp; can be found for example in&nbsp; [CT06]<ref name="CT06">Cover, T.M.; Thomas, J.A.:&nbsp; Elements of Information Theory.&nbsp; West Sussex: John Wiley & Sons, 2nd Edition, 2006.</ref>,&nbsp;  [Kra13]<ref name="Kra13">Kramer, G.:&nbsp; Information Theory.&nbsp; Lecture manuscript, Chair of Communications Engineering, Technische Universität München, 2013.</ref>&nbsp; and&nbsp; [Meck09]<ref name="Meck09">Mecking, M.:&nbsp; Information Theory.&nbsp; Lecture manuscript, Chair of Communications Engineering, Technische Universität München, 2009.</ref>.
  
 
+
As will be shown in&nbsp; [[Aufgaben:Exercise_3.13:_Code_Rate_and_Reliability|"Exercise 3.13"]],&nbsp; the reverse is also true.&nbsp; This proof can also be found in the literature references just mentioned.
Den Beweis dieses Theorems finden Sie zum Beispiel in <ref name="CT06">Cover, T.M.; Thomas, J.A.: ''Elements of Information Theory''. West Sussex: John Wiley & Sons, 2nd Edition, 2006.</ref>, <ref name="Kra13">Kramer, G.: ''Information Theory''. Vorlesungsmanuskript, Lehrstuhl für Nachrichtentechnik, Technische Universität München, 2013.</ref> und <ref name="Meck09">Mecking, M.: ''Information Theory''. Vorlesungsmanuskript, Lehrstuhl für Nachrichtentechnik, Technische Universität München, 2009.</ref>. Der Beweis würde den Rahmen unseres Lerntutorials sprengen.  
+
 
+
{{BlaueBox|TEXT=
Im Kapitel 4.3 wird im Zusammenhang mit dem wertkontinuierlichen AWGN–Kanalmodell ausgeführt, welche phänomenal große Bedeutung Shannons informationstheoretisches Theorem für die gesamte Informationstechnik besitzt, nicht nur für ausschließlich theoretisch Interessierte, sondern ebenso auch für Praktiker.
+
$\text{Reverse of Shannon's channel coding theorem: }$&nbsp;
Wie in Aufgabe A3.12 gezeigt werden soll, gilt auch der Umkehrschluss:
+
 
+
If the rate&nbsp;  $R$&nbsp; of the&nbsp; $(n,\ k)$–block code used is greater than the channel capacity&nbsp; $C$,&nbsp; then an arbitrarily small block error probability is not achievable.}}
 
 
Ist die Rate des verwendeten ( $n$, $k$ )–Blockcodes größer als die Kanalkapazität  ⇒  $\mathbf{R = k/n > C}$, so kann '''niemals eine beliebig kleine Blockfehlerwahrscheinlichkeit''' erreicht werden.
 
  
  
Auch diesen Beweis finden Sie zum Beispiel wieder in <ref name="CT06" />, <ref name="Kra13" /> und <ref name="Meck09" />.
+
In the chapter&nbsp;  [[Information_Theory/AWGN_Channel_Capacity_for_Discrete_Input#AWGN_model_for_discrete-time_band-limited_signals|"AWGN model for discrete-time band-limited signals"]]&nbsp; it is explained in connection with the continuous&nbsp; [[Channel_Coding/Channel_Models_and_Decision_Structures#AWGN_channel_at_binary_input|$\text{AWGN channel model}$]]&nbsp; &nbsp; what phenomenally great significance Shannon's theorem has for the entire field of information technology,&nbsp; not only for those interested exclusively in theory,&nbsp; but also for practitioners.
  
 
   
 
   
==Kanalkapazität eines Binärkanals== 
+
==Channel capacity of a binary channel== 
 
+
<br>
[[File:P_ID2786__Inf_T_3_3_S3a.png|Allgemeines Modell des Binärkanals]]
+
[[File:P_ID2786__Inf_T_3_3_S3a.png|right|frame|General model of the binary channel]]
Die Transinformation des allgemeinen (unsymmetrischen) Binärkanals gemäß der nebenstehenden Grafik wurde auf der vorletzten Seite berechnet. Bei diesem Modell werden die Eingangssymbole „0” und „1” unterschiedlich stark verfälscht:
+
The mutual information of the general&nbsp; (asymmetrical)&nbsp; binary channel according to this sketch was calculated in&nbsp; [[Information_Theory/Anwendung_auf_die_Digitalsignalübertragung#Transinformationsberechnung_f.C3.BCr_den_Bin.C3.A4rkanal|$\text{Example 2}$]].&nbsp;  In this model, the input symbols&nbsp; $0$&nbsp; and&nbsp; $1$&nbsp; are distorted to different degrees:
 
   
 
   
$$P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{-0.01cm}X}(Y\hspace{-0.01cm} |\hspace{-0.01cm} X) =  
+
:$$P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{-0.01cm}X}(Y\hspace{-0.01cm} |\hspace{-0.01cm} X) =  
 
\begin{pmatrix}  
 
\begin{pmatrix}  
 
1 - \varepsilon_0  & \varepsilon_0\\
 
1 - \varepsilon_0  & \varepsilon_0\\
Line 275: Line 259:
 
\end{pmatrix}  \hspace{0.05cm}.$$
 
\end{pmatrix}  \hspace{0.05cm}.$$
  
Die Transinformation lässt sich mit $P_X(X)$ = $(p_0, p_1)$ in folgender Form kompakt darstellen:
+
The mutual information can be represented with the probability mass function&nbsp; $P_X(X) = (p_0,\ p_1)$&nbsp; as follows:
 
   
 
   
$$\begin{align*}I(X  ;Y) =  \sum_{\mu = 1}^{2} \hspace{0.1cm}\sum_{\kappa = 1}^{2} \hspace{0.2cm}
+
:$$I(X  ;Y) =  \sum_{\mu = 1}^{2} \hspace{0.1cm}\sum_{\kappa = 1}^{2} \hspace{0.2cm}
 
  {\rm Pr} (\hspace{0.05cm}y_{\kappa}\hspace{0.03cm} |\hspace{0.05cm} x_{\mu}) \cdot  
 
  {\rm Pr} (\hspace{0.05cm}y_{\kappa}\hspace{0.03cm} |\hspace{0.05cm} x_{\mu}) \cdot  
 
{\rm Pr} (\hspace{0.05cm}x_{\mu}\hspace{0.05cm})\cdot {\rm log}_2 \hspace{0.1cm} \frac{{\rm Pr}
 
{\rm Pr} (\hspace{0.05cm}x_{\mu}\hspace{0.05cm})\cdot {\rm log}_2 \hspace{0.1cm} \frac{{\rm Pr}
 
(\hspace{0.05cm}y_{\kappa}\hspace{0.03cm} |\hspace{0.05cm} x_{\mu})}{{\rm Pr}
 
(\hspace{0.05cm}y_{\kappa}\hspace{0.03cm} |\hspace{0.05cm} x_{\mu})}{{\rm Pr}
(\hspace{0.05cm}y_{\kappa})} = \\
+
(\hspace{0.05cm}y_{\kappa})} $$
& = \hspace{-0.15cm}  (1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0) \cdot p_0 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0}{(1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0) \cdot p_0 + \varepsilon_1 \cdot p_1} +
+
:$$\begin{align*}\Rightarrow \hspace{0.3cm}  I(X  ;Y) &=   \hspace{-0.01cm}  (1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0) \cdot p_0 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0}{(1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0) \cdot p_0 + \varepsilon_1 \cdot p_1} +
\varepsilon_0 \cdot p_0 \cdot {\rm log}_2 \hspace{0.1cm} \frac{\varepsilon_0}{(1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0) \cdot p_0 + \varepsilon_1 \cdot p_1} + \\
+
\varepsilon_0 \cdot p_0 \cdot {\rm log}_2 \hspace{0.1cm} \frac{\varepsilon_0}{(1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0) \cdot p_0 + \varepsilon_1 \cdot p_1} \ + \\
& +  \hspace{-0.15cm} \varepsilon_1 \cdot p_1 \cdot {\rm log}_2 \hspace{0.1cm} \frac{\varepsilon_1}{\varepsilon_0 \cdot p_0 + (1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1) \cdot p_1} +  (1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1) \cdot p_1 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1}{\varepsilon_0 \cdot p_0 + (1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1) \cdot p_1}
+
& +  \hspace{-0.01cm} \varepsilon_1 \cdot p_1 \cdot {\rm log}_2 \hspace{0.1cm} \frac{\varepsilon_1}{\varepsilon_0 \cdot p_0 + (1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1) \cdot p_1} +  (1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1) \cdot p_1 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1}{\varepsilon_0 \cdot p_0 + (1  \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1) \cdot p_1}
 
  \hspace{0.05cm}.\end{align*}$$
 
  \hspace{0.05cm}.\end{align*}$$
  
[[File:P_ID2788__Inf_T_3_3_S4a.png|Ergebnisse für den <i>Binary Channel</i>]]
+
[[File:EN_Inf_T_3_3_S4a.png|right|frame|Mutual information for the <br>"asymmetrical binary channel"]]
 +
{{GraueBox|TEXT=
 +
$\text{Example 3}$:&nbsp;
 +
In the following we set&nbsp; $ε_0 = 0.01$&nbsp; and&nbsp; $ε_1 = 0.2$.
  
Im Folgenden setzen wir $ε_0$ = 0.01 und $ε_1$ = 0.2. In der vierten Spalte der nebenstehenden Tabelle (grün hinterlegt) ist die Transinformation $I(X; Y)$ dieses unsymmetrischen Binärkanals abhängig von der Quellensymbolwahrscheinlichkeit $p_0$ = Pr( $X$ = 0 ) angegeben. Man erkennt:
+
Column 4 of the adjacent table shows&nbsp; (highlighted in green)&nbsp; the mutual information&nbsp; $I(X; Y)$&nbsp; of this asymmetrical binary channel depending on the source symbol probability&nbsp; $p_0 = {\rm Pr}(X = 0)$&nbsp; .&nbsp; One can see:
*Die Transinformation $I(X; Y)$ hängt von den Symbolwahrscheinlichkeiten $p_0$ und $p_1$ = 1 – $p_0$ ab.
+
*The mutual information depends on the symbol probabilities&nbsp; $p_0$&nbsp; and&nbsp; $p_1 = 1 - p_0$.
*Der Maximalwert der Transinformation ergibt sich für $p_0$ ≈ 0.55  ⇒  $p_1$ ≈ 0.45.
+
*Here the maximum value of&nbsp; $I(X; Y)$&nbsp;  results in &nbsp;$p_0 ≈ 0.55$&nbsp; &nbsp; ⇒  &nbsp; &nbsp;$p_1 ≈ 0.45$.
*Das Optimierungsergebnis $p_0 > p_1$ folgt aus der Relation $ε_0 < ε_1$ (die „0” wird weniger verfälscht).
+
*The result&nbsp; $p_0 > p_1$&nbsp; follows from the relation&nbsp; $ε_0 < ε_1$&nbsp; (the&nbsp; "zero"&nbsp; is less distorted).
*Die Kanalkapazität ist somit für $ε_0$ = 0.01, $ε_1$ = 0.2 gleich $C$ = 0.5779 bit/Kanalzugriff.
+
*The capacity of this channel is&nbsp; $C = 0.5779 \ \rm bit/use$.}}
In obiger Gleichung ist als Sonderfall auch der Binary Symmetric Channel (BSC) mit dem Parameter $ = $ε_0$ = $ε_1$ mitenthalten. In Aufgabe A3.9 wird die Transinformation des BSC–Kanals für $ = 0.1, $p_0$ = 0.2 berechnet und in Aufgabe Z3.9 seine Kanalkapazität wie folgt angegeben:
+
<br clear=all>
 +
In the above equation, the&nbsp; [[Channel_Coding/Channel_Models_and_Decision_Structures#Binary_Symmetric_Channel_.E2.80.93_BSC|$\text{Binary Symmetric Channel}$]]&nbsp; $\rm (BSC)$&nbsp; with parameters&nbsp; $ε = ε_0 = ε_1$&nbsp; is also included as a special case.&nbsp; Hints:
 +
*In&nbsp; [[Aufgaben:Exercise_3.10:_Mutual_Information_at_the_BSC|"Exercise 3.10"]]&nbsp; the mutual information of the BSC is calculated for the system parameters&nbsp; $ε = 0.1$ &nbsp;and&nbsp; $p_0 = 0.2$&nbsp; .
 +
*In&nbsp;  [[Aufgaben:Exercise_3.10Z:_BSC_Channel_Capacity|"Exercise 3.10Z"]]&nbsp; its channel capacity is given as follows:
 
    
 
    
$$C_{\rm BSC} = 1 - H_{\rm bin} (\varepsilon) \hspace{0.05cm}.$$
+
:$$C_{\rm BSC} = 1 - H_{\rm bin} (\varepsilon) \hspace{0.05cm}.$$
  
==Eigenschaften symmetrischer Kanäle ==  
+
==Properties of symmetrical channels ==  
 +
<br>
 +
The capacity calculation of the (general)&nbsp; [[Information_Theory/Anwendung_auf_die_Digitalsignalübertragung#Information-theoretical_model_of_digital_signal_transmission|$\text{discrete memoryless channel}$]]&nbsp; $\rm (DMC)$&nbsp; is often complex.&nbsp; It simplifies decisively if symmetries of the channel are exploited. &nbsp;
  
Die Kapazitätsberechnung des (allgemeinen) diskreten gedächtnislosen Kanals ist oftmals aufwändig. Sie vereinfacht sich entscheidend, wenn Symmetrien des Kanals ausgenutzt werden. Die Grafik zeigt zwei Beispiele.
+
[[File:EN_Inf_T_3_3_S6a_vers2.png|right|frame|Examples of symmetrical channels]]
  
[[File:P_ID2793__Inf_T_3_3_S6a.png|Beispiele symmetrischer Kanäle]]
+
<br>The diagram shows two examples:
  
*Beim ''gleichmäßig dispersiven'' Kanal (englisch: ''Uniformly Dispersive Channel'') ergibt sich für alle Quellensymbole $x ∈ X$ die genau gleiche Menge an Übergangswahrscheinlichkeiten ⇒  $\{P_Y|X(y_κ|x)\}$ mit 1 ≤ $κ$ $|Y|$. In der linken Grafik ist dies durch die Werte $q$, $r$ und $s$ mit $q + r + s$ = 1 angedeutet.
+
*In the case of the&nbsp; <u>uniformly dispersive channel</u>&nbsp; all source symbols&nbsp; $x ∈ X$&nbsp; result in exactly the same set of transition probabilities &nbsp; &nbsp; $\{P_{Y\hspace{0.03cm}|\hspace{0.01cm}X}(y_κ\hspace{0.05cm}|\hspace{0.05cm}x)\}$&nbsp; with &nbsp;$1 ≤ κ ≤ |Y|$.&nbsp; Here &nbsp;$q + r + s = 1$&nbsp; must always apply here&nbsp; (see left graph).
*Beim gleichmäßig fokussierenden Kanal (englisch: ''Uniformely Focusing Channel'') ergibt sich für alle Sinkensymbole $y ∈ Y$ die gleiche Menge an Übergangswahrscheinlichkeiten  ⇒ $\{P_Y|X(y|x_μ)\}$ mit 1 ≤ $μ$ ≤ $|X|$. Hier muss nicht notwendigerweise $t + u + v$ = 1 gelten (siehe rechte Grafik).
 
  
 +
*In the case of the&nbsp; <u>uniformly focusing channel</u>&nbsp;, the same set of transition probabilities&nbsp; ⇒  &nbsp; $\{P_{Y\hspace{0.03cm}|\hspace{0.01cm}X}(y\hspace{0.05cm}|\hspace{0.05cm}x_μ)\}$&nbsp; with &nbsp;$1 ≤ μ ≤ |X|$ results for all sink symbols&nbsp; $y ∈ Y$&nbsp;. &nbsp; Here, &nbsp;$t + u + v = 1$&nbsp; need <u>not</u> necessarily hold&nbsp; (see right graph).
 +
<br clear=all>
 +
{{BlaueBox|TEXT=
 +
$\text{Definition:}$&nbsp; If a discrete memoryless channel is both uniformly dispersive and uniformly focusing,&nbsp; it is a&nbsp; &raquo;'''strictly symmetric channel'''&laquo;.&nbsp;
 +
*With an equally distributed source alphabet, this channel has the capacity
 +
:$$C = {\rm log}_2 \hspace{0.1cm} \vert Y \vert  + \sum_{y \hspace{0.05cm}\in\hspace{0.05cm} Y} \hspace{0.1cm} P_{\hspace{0.03cm}Y \vert \hspace{0.01cm} X}(y\hspace{0.05cm} \vert \hspace{0.05cm}x) \cdot
 +
{\rm log}_2 \hspace{0.1cm}P_{\hspace{0.01cm}Y \vert  \hspace{0.01cm} X}(y\hspace{0.05cm}\vert\hspace{0.05cm} x)
 +
\hspace{0.05cm}.$$
 +
*Any&nbsp; $x ∈ X$&nbsp; can be used for this equation.}}
  
{{Definition}}
 
Ist ein diskreter gedächtnisloser Kanal sowohl gleichmäßig dispersiv als auch gleichmäßig fokussierend, so liegt ein '''streng symmetrischer Kanal''' (englisch: ''Strongly Symmetric Channel'') vor. Bei gleichverteiltem Quellenalphabet besitzt dieser die Kapazität
 
 
$$C = {\rm log}_2 \hspace{0.1cm} |Y| + \sum_{y \hspace{0.05cm}\in\hspace{0.05cm} Y} \hspace{0.1cm} P_{\hspace{0.01cm}Y \mid \hspace{0.01cm} X}(y|x) \cdot
 
{\rm log}_2 \hspace{0.1cm}P_{\hspace{0.01cm}Y \mid \hspace{0.01cm} X}(y|x)
 
\hspace{0.05cm}.$$
 
  
Für diese Gleichung kann jedes beliebige $x ∈ X$ herangezogen werden.
+
This definition will now be clarified by an example.
 +
 
 +
{{GraueBox|TEXT=
 +
$\text{Example 4}$:&nbsp; In the channel under consideration, there are connections between all&nbsp; $ \vert X \vert  = 3$&nbsp; inputs and all&nbsp; $ \vert Y \vert  = 3$&nbsp; outputs:
 +
[[File:P_ID2794__Inf_T_3_3_S6b.png|right|frame|Strongly symmetrical channel&nbsp; $\vert X \vert = \vert Y \vert= 3$]]
 +
*A red connection stands for&nbsp; $P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ \hspace{0.05cm} \vert \hspace{0.05cm} x_μ) = 0.7$.
 +
*A blue connection stands for&nbsp; $P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ \hspace{0.05cm}\vert \hspace{0.05cm} x_μ) = 0.2$.
 +
*A green connection stands for&nbsp; $P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ \hspace{0.05cm}\vert\hspace{0.05cm} x_μ) = 0.1$.
  
{{end}}
 
  
 +
According to the above equation, the following applies to the channel capacity:
 +
 +
:$$C = {\rm log}_2 \hspace{0.1cm} (3) + 0.7 \cdot {\rm log}_2 \hspace{0.1cm} (0.7)
 +
+ 0.2 \cdot {\rm log}_2 \hspace{0.1cm} (0.2) + 0.1 \cdot {\rm log}_2 \hspace{0.1cm} (0.1) = 0.4282 \,\,{\rm bit} \hspace{0.05cm}.$$
  
Diese Definition soll durch ein Beispiel verdeutlicht werden.
+
Notes:
 +
*The addition of&nbsp; "the same set of transition probabilities”&nbsp; in the above definitions does not mean that it must apply:
 +
:$$P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ  \hspace{0.05cm}\vert\hspace{0.05cm} x_1) = P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ  \hspace{0.05cm}\vert\hspace{0.05cm} x_2) = P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ \hspace{0.05cm}\vert\hspace{0.05cm} x_3).$$
 +
*Rather, here a red, a blue and a green arrow leaves from each input and a red, a blue and a green arrow arrives at each output.
 +
*The respective sequences permute: &nbsp; R – G – B, &nbsp; &nbsp;  B – R – G, &nbsp; &nbsp; G – B – R.}}
  
{{Beispiel}}
 
[[File:P_ID2794__Inf_T_3_3_S6b.png|Streng symmetrischer Kanal mit |<i>X</i>| = |<i>Y</i>| = 3]]
 
Beim betrachteten Kanal bestehen Verbindungen zwischen allen $|X|$ = 3 Eingängen und allen $|Y|$ = 3 Ausgängen:
 
*Eine rote Verbindung steht für $P_{Y|X}(y_κ|x_μ)$ = 0.7.
 
*Eine blaue Verbindung steht für $P_{Y|X}(y_κ|x_μ)$ = 0.2.
 
*Eine grüne Verbindung steht für $P_{Y|X}(y_κ|x_μ)$ = 0.1.
 
  
 +
An example of a strictly symmetrical channel is the&nbsp; [[Channel_Coding/Channel_Models_and_Decision_Structures#Binary_Symmetric_Channel_.E2.80.93_BSC|$\text{Binary Symmetric Channel}$]]&nbsp; $\rm (BSC)$.&nbsp; In contrast, the&nbsp; [[Channel_Coding/Channel_Models_and_Decision_Structures#Binary_Erasure_Channel_.E2.80.93_BEC|$\text{Binary Erasure Channel}$]]&nbsp; $\rm (BEC)$&nbsp;  is not strictly symmetric,&nbsp; because,
 +
*although it is uniformly dispersive,
 +
*but it is not uniformly focusing.
  
Nach obiger Gleichung gilt dann für die Kanalkapazität:
 
 
$$C = {\rm log}_2 \hspace{0.1cm} (3) + 0.7 \cdot {\rm log}_2 \hspace{0.1cm} (0.7)
 
+ 0.2 \cdot {\rm log}_2 \hspace{0.1cm} (0.2) + 0.1 \cdot {\rm log}_2 \hspace{0.1cm} (0.1) = 0.4282 \,\,{\rm bit} \hspace{0.05cm}.$$
 
  
''Hinweis'': Der Zusatz „die gleiche Menge an Übergangswahrscheinlichkeiten” bedeutet nicht, dass $P_Y|X(y_κ|x_1)$ = $P_Y|X(y_κ|x_2)$ = $P_Y|X(y_κ|x_3)$ gelten muss. Vielmehr geht in diesem Beispiel von jedem Eingang ein roter, ein blauer und ein grüner Pfeil ab und an jeden Ausgang kommt ein roter, ein blauer und ein grüner Pfeil an. Die jeweiligen Reihenfolgen permutieren. R – G – B,  B – R – G,  G – B – R.
+
The following definition is less restrictive than the previous one of a strictly symmetric channel.
  
{{end}}
+
{{BlaueBox|TEXT=
 +
$\text{Definition:}$&nbsp; A&nbsp; &raquo;'''symmetric channel'''&laquo;&nbsp; exists,
 +
*if it can be divided into several&nbsp; $($generally $L)$&nbsp; strongly symmetrical sub-channels,
 +
*by splitting the output alphabet&nbsp; $Y$&nbsp; into&nbsp; $L$&nbsp; subsets&nbsp; $Y_1$, ... , $Y_L$&nbsp;.
  
  
Ein Beispiel für einen streng symmetrischen Kanal ist der Binary Symmetric Channel (BSC). Dagegen ist der Binary Erasure Channel (BEC) nicht streng symmetrisch, da er
+
Such a&nbsp; "symmetric channel"&nbsp; has the following capacity:
*zwar gleichmäßig dispersiv ist,
+
*aber nicht gleichmäßig fokussierend.
+
:$$C = \sum_{l \hspace{0.05cm}=\hspace{0.05cm} 1}^{L} \hspace{0.1cm} p_{\hspace{0.03cm}l} \cdot C_{\hspace{0.03cm}l} \hspace{0.05cm}.$$
  
Nachfolgende Definition ist weniger restriktiv als die vorherige des streng symmetrischen Kanals.
+
The following designations are used here:
 +
* $p_{\hspace{0.03cm}l}$&nbsp;indicates the probability that the&nbsp; $l$–th  sub-channel is selected.
 +
* $C_{\hspace{0.03cm}l}$&nbsp; is the channel capacity of this&nbsp; $l$–th sub-channel.}}
  
  
{{Definition}}
+
[[File:EN_Inf_T_3_3_S6c_v2.png|right|frame|Symmetrical channel consisting of two strongly symmetrical <br>sub-channels&nbsp;  $\rm A$&nbsp; and&nbsp; $\rm B$]]
Ein '''symmetrischer Kanal''' (englisch: ''Symmetric Channel'') liegt vor, wenn er in mehrere (allgemein $L$) streng symmetrische Teilkanäle aufgeteilt werden kann, indem das Ausgangsalphabet $Y$ in $L$ Teilmengen $Y_1$, ..., $Y_L$ aufgespalten wird. Ein solcher symmetrischer Kanal besitzt folgende Kapazität:
+
The diagram illustrates this definition for&nbsp; $L = 2$&nbsp; with the sub-channels&nbsp; $\rm A$&nbsp; and&nbsp; $\rm B$.
 +
*The differently drawn transitions&nbsp; (dashed or dotted)&nbsp; show that the two sub-channels can be different,&nbsp; so that&nbsp; $C_{\rm A} ≠ C_{\rm B}$&nbsp; will generally apply.
 +
*For the capacity of the total channel one thus obtains in general:
 
   
 
   
$$C = \sum_{l \hspace{0.05cm}=\hspace{0.05cm} 1}^{L} \hspace{0.1cm} p_l \cdot C_l \hspace{0.05cm}.$$
+
:$$C = p_{\rm A} \cdot C_{\rm A} +  p_{\rm B} \cdot C_{\rm B} \hspace{0.05cm}.$$
  
Hierbei sind folgende Bezeichnungen verwendet:
+
*No statement is made here about the structure of the two sub-channels.
* $p_l$ gibt die Wahrscheinlichkeit an, dass der $l$–te Teilkanal ausgewählt wird,
 
* $C_l$ ist die Kanalkapazität dieses $l$–ten Teilkanals.
 
  
{{end}}
 
  
Die Grafik verdeutlicht diese Definition für $L$ = 2, wobei die Teilkanäle mit A und B bezeichnet sind. An den unterschiedlich gezeichneten Übergängen (gestrichelt oder gepunktet) erkennt man, dass die zwei Teilkanäle durchaus verschieden sind, so dass $C_A$ ≠ $C_B$ gelten wird.
+
The following example will show that the&nbsp; "Binary Erasure Channel"&nbsp; $\rm (BEC)$&nbsp; can also be described in principle by this diagram.  &nbsp; However, the two output symbols&nbsp; $y_3$&nbsp; and&nbsp; $y_4$&nbsp; must then be combined into a single symbol.
 +
<br clear=all>
 +
[[File:EN_Inf_T_3_3_S6d.png|right|frame|$\rm BEC$&nbsp; in two different representations]]
 +
{{GraueBox|TEXT=
 +
$\text{Example 5}$:&nbsp; The left figure shows the usual representation of the&nbsp; [[Channel_Coding/Channel_Models_and_Decision_Structures#Binary_Erasure_Channel_.E2.80.93_BEC|$\text{Binary Erasure Channel}$]]&nbsp; $\rm (BEC)$&nbsp; with input&nbsp; $X = \{0,\ 1\}$&nbsp; and output&nbsp; $Y = \{0,\ 1,\ \text{E} \}$.  
  
[[File:P_ID2795__Inf_T_3_3_S6c_neu.png|Symmetrischer Kanal, bestehend aus zwei streng symmetrischen Teilkanälen A und B]]
+
If one divides this according to the right grafic into
 +
*an&nbsp; "ideal channel"&nbsp; $(y = x)$&nbsp; for
 +
:$$y ∈ Y_{\rm A} = \{0, 1\} \ \ ⇒  \ \ C_{\rm A} = 1 \ \rm bit/use,$$
 +
*an&nbsp; "erasure channel"&nbsp; $(y = {\rm E})$&nbsp; for
 +
:$$y ∈ Y_{\rm B} = \{\rm E \} \ \ ⇒  \ \ C_{\rm B} = 0,$$
  
Für die Kapazität des Gesamtkanals erhält man somit allgemein:
+
then we get with the sub-channel weights&nbsp; $p_{\rm A} = 1 – λ$&nbsp; and&nbsp; $p_{\rm B} = λ$:
 
   
 
   
$$C = p_{\rm A} \cdot C_{\rm A} +  p_{\rm B} \cdot C_{\rm B}  \hspace{0.05cm}.$$
+
:$$C_{\rm BEC} = p_{\rm A} \cdot C_{\rm A} = 1 - \lambda \hspace{0.05cm}.$$
 +
 
 +
Both channels are strongly symmetrical. &nbsp; The following applies equally for the (ideal) channel&nbsp; $\rm A$&nbsp;
 +
*for&nbsp; $X = 0$&nbsp; and&nbsp; $X = 1$: &nbsp; &nbsp;  $\text{Pr}(Y = 0 \hspace{0.05cm}\vert \hspace{0.05cm} X) = \text{Pr}(Y = 1 \hspace{0.05cm} \vert\hspace{0.05cm} X) = 1 - λ$  &nbsp;  ⇒  &nbsp;  uniformly dispersive,
 +
*for&nbsp; $Y = 0$ &nbsp;and&nbsp;&nbsp; $Y = 1$: &nbsp;  &nbsp;  $\text{Pr}(Y \hspace{0.05cm} \vert \hspace{0.05cm} X= 0) = Pr(Y \hspace{0.05cm}\vert\hspace{0.05cm} X = 1) = 1 - λ$ &nbsp; ⇒  &nbsp;  uniformly focusing.
  
Über die Struktur der beiden Teilkanäle wird hier keine Aussage gemacht. Im Beispiel auf der nächsten Seite wird sich zeigen, dass auch der BEC durch diese Grafik grundsätzlich beschreibbar ist. Allerdings müssen dann die zwei Ausgangssysmbole $y_3$ und $y_4$ zu einem einzigen Symbol zusammengefasst werden.
 
  
{{Beispiel}}
+
The same applies to the erasure channel&nbsp; $\rm B$.}}
Die linke Grafik zeigt den Binary Erasure Channel (BEC) mit Eingang $X$ = {0, 1} und Ausgang $Y$ = {0, 1, $E$}, wie er meistens gezeichnet wird. Teilt man diesen entsprechend der rechten Grafik auf in
 
einen idealen Kanal $(y = x)$ mit $y ∈ Y_A$ = {0, 1} ⇒  $C_A$ = 1 bit,
 
einen Auslöschungskanal mit $y ∈ Y_B$ = $\{E \}$  ⇒  $C_B$ = 0,
 
so ergibt sich mit den Teilkanalgewichtungen $p_A$ = 1 – $λ$ und $p_B$ = $λ$ für die Kanalkapazität:
 
 
$$C_{\rm BEC} = p_{\rm A} \cdot C_{\rm A} = 1 - \lambda \hspace{0.05cm}.$$
 
  
[[File:P_ID2796__Inf_T_3_3_S6d.png|BEC in zwei verschiedenen Darstellungen]]
 
  
Beide Kanäle sind streng symmetrisch. Für den (idealen) Kanal A gilt gleichermaßen
+
In&nbsp; [[Aufgaben:Exercise_3.12:_Strictly_Symmetrical_Channels|"Exercise 3.12"]]&nbsp; it will be shown that the capacity of the&nbsp; [[Channel_Coding/Channel_Models_and_Decision_Structures#Binary_Symmetric_Error_.26_Erasure_Channel_.E2.80.93_BSEC|$\text{Binary Symmetric Error & Erasure Channel}$]]&nbsp; $\rm (BSEC)$&nbsp; model can be calculated in the same way.&nbsp; One obtains:
*für $X = 0$ und $X = 1$:   $\text{Pr}(Y = 0|X) = \text{Pr}(Y = 1|X) = 1 – λ$  ⇒  gleichmäßig dispersiv,
 
*für $Y = 0$ und $Y = 1$:    $\text{Pr}(Y|X = 0) = Pr(Y|X = 1) = 1 – λ$ ⇒  gleichmäßig fokussierend.
 
  
Entsprechendes gilt für den Auslöschungskanal B.
+
:$$C_{\rm BSEC}  = (1- \lambda) \cdot \left [ 1 - H_{\rm bin}(\frac{\varepsilon}{1- \lambda}) \right ]$$
  
{{end}}
+
*with the crossover probability&nbsp; $ε$
 +
*and the erasure probability&nbsp; $λ$.
  
  
In Aufgabe A3.11 wird sich zeigen, dass die Kapazität des Kanalmodells Binary Symmetric Error & Erasure Channel (BSEC) in gleicher Weise berechnet werden kann. Mit
 
*der Verfälschungswahrscheinlichkeit $ε$ und
 
*der Auslöschungswahrscheinlichkeit $λ$
 
  
erhält man in diesem Fall:
+
==Some basics of channel coding ==
 +
<br>
 +
In order to interpret the channel coding theorem correctly, some basics of&nbsp; &raquo;'''channel coding'''&laquo;.&nbsp;  This extremely important area of Communications Engineering is covered in our learning tutorial&nbsp; $\rm LNTwww$&nbsp; in a separate book called&nbsp;  [[Channel_Coding|"Channel Coding"]].
  
$$C_{\rm BSEC}  = (1- \lambda) \cdot \left [ 1 - H_{\rm bin}(\frac{\varepsilon}{1- \lambda}) \right ]\hspace{0.05cm}.$$
+
[[File:EN_Inf_T_3_3_S7a.png|center|frame|Model for binary&ndash;coded communication]]
==Einige Grundlagen der Kanalcodierung ==
 
  
Um das Kanalcodierungstheorem richtig interpretieren zu können, sind einige Grundlagen der Kanalcodierung (englisch: ''Channel Coding'') erforderlich. Dieses äußerst wichtige Gebiet der Nachrichtentechnik wird in einem eigenen Buch [[Kanalcodierung]] behandelt. Die nachfolgende Beschreibung bezieht sich auf das stark vereinfachte Modell für binäre Blockcodes:
+
The following description refers to the highly simplified model for&nbsp; [[Channel_Coding/Allgemeine_Beschreibung_linearer_Blockcodes|$\text{binary block codes}$]]:
 +
*The infinitely long source symbol sequence&nbsp; $\underline{u}$&nbsp; (not shown here)&nbsp; is divided into blocks of&nbsp; $k$&nbsp; bits.&nbsp; We denote the information block with the serial number&nbsp; $j$&nbsp; by&nbsp; $\underline{u}_j^{(k)}$.
 +
*Each information block&nbsp; $\underline{u}_j^{(k)}$&nbsp; is converted into a code word&nbsp; $\underline{x}_j^{(n)}$&nbsp; by the channel encoder with a yellow background, where&nbsp; $n > k$&nbsp; is to apply.&nbsp; The ratio&nbsp; $R = k/n$&nbsp; is called the&nbsp; &raquo;'''code rate'''&laquo;.
 +
*The&nbsp; "Discrete Memoryless Channel"&nbsp; $\rm (DMC)$&nbsp;  is taken into account by transition probabilities&nbsp; $P_{Y\hspace{0.03cm}|\hspace{0.03cm}X}(⋅)$&nbsp;.&nbsp; This block with a green background causes errors at the bit level.&nbsp; The following can therefore apply:  &nbsp; $y_{j, \hspace{0.03cm}i} ≠ x_{j,\hspace{0.03cm} i}$.
 +
*Thus the received blocks&nbsp; $\underline{y}_j^{(n)}$&nbsp; consisting of &nbsp; $n$&nbsp; bits can also differ from the code words&nbsp; $\underline{x}_j^{(n)}$ .&nbsp; Likewise, the following generally applies to the blocks after the decoder:&nbsp;
 +
:$$\underline{v}_j^{(k)} ≠ \underline{u}_j^{(k)}.$$
  
[[File:P_ID2797__Inf_T_3_3_S7a.png|Modell für die codierte Informationsübertragung]]
 
  
Zu diesem Blockschaltbild ist anzumerken:
+
{{GraueBox|TEXT=
*Die unendlich lange Quellensymbolfolge $\underline{u}$ (hier nicht dargestellt) wird in Blöcke zu jeweils $k$ bit unterteilt. Wir bezeichnen den Informationsblock mit der laufenden Nummerierung $j$ mit $\underline{u}_j^{(k)}$.
+
$\text{Example 6}$:&nbsp;
*Jeder Informationsblock $j$ mit $\underline{u}_j^{(k)}$ wird durch den gelb hinterrlegten Kanalcoder in ein Codewort $\underline{x}_j^{(n)}$ umgesetzt, wobei $n > k$ gelten soll. Das Verhältnis $R = k/n$ bezeichnet man als die Coderate.
+
The diagram is intended to illustrate the nomenclature used here using the example of&nbsp;  $k = 3$ &nbsp;and&nbsp; $n = 4$.&nbsp; The first eight blocks of the information sequence&nbsp; $\underline{u}$&nbsp; and the&nbsp; encoded sequence $\underline{x}$ are shown.  
*Der Discrete Memoryless Channel (DMC) wird durch die Übergangswahrscheinlichkeit $P_{Y|X}(⋅)$ berücksichtigt. Dieser grün hinterlegte Block bewirkt Fehler auf Bitebene  ⇒  $y_{j, i} ≠ x_{j, i}$.
 
*Damit unterscheiden sich auch die aus $n$ Bit bestehenden Empfangsblöcke $\underline{y}_j^{(n)}$ von den Codeworten $\underline{x}_j^{(n)}$. Ebenso gilt im allgemeinen für die Blöcke nach dem Deoder: $\underline{v}_j^{(k)} ≠ \underline{u}_j^{(k)}$.
 
  
[[File:P_ID2798__Inf_T_3_3_S7b_neu.png|Zur Bitbezeichnung von Informationsblock und Codewort]]
+
[[File:EN_Inf_T_3_3_S7b_vers2.png|right|frame|Bit designation of information block and code word]]
 +
One can see the following assignment between the blocked and the unblocked description:
 +
*Bit 3 of the 1st information block &nbsp; ⇒ &nbsp; $u_{1,\hspace{0.08cm} 3}$&nbsp; corresponds to the symbol&nbsp; $u_3$&nbsp; in unblocked representation.
 +
*Bit 1 of the 2nd information block &nbsp; ⇒ &nbsp; $u_{2, \hspace{0.08cm}1}$&nbsp; corresponds to the symbol&nbsp; $u_4$&nbsp; in unblocked representation.
 +
*Bit 2 of the 6th information block &nbsp; ⇒ &nbsp; $u_{6, \hspace{0.08cm}2}$&nbsp; corresponds to the symbol&nbsp; $u_{17}$&nbsp; in unblocked representation.
 +
*Bit 4 of the 1st code word &nbsp; ⇒ &nbsp; $x_{1, \hspace{0.08cm}4}$&nbsp; corresponds to the symbol&nbsp; $x_4$&nbsp; in unblocked representation.
 +
*Bit 1 of the 2nd code word &nbsp; ⇒ &nbsp; $x_{2, \hspace{0.08cm}1}$&nbsp; corresponds to the symbol&nbsp; $x_5$&nbsp; in unblocked representation.
 +
*Bit 2 of the 6th code word &nbsp; ⇒ &nbsp; $x_{6, \hspace{0.08cm}2}$&nbsp; corresponds to the symbol&nbsp; $x_{22}$&nbsp; in unblocked representation.}}
  
Die Grafik soll die hier verwendete Nomenklatur am Beispiel $k$ = 3, $n$ = 4 verdeutlichen. Dargestellt sind die jeweils ersten acht Blöcke der Informationssequenz $\underline{u}$ und der Codesequenz $\underline{x}$. Man erkennt folgende Zuordnung zwischen der geblockten und der ungeblockten Beschreibung:
+
==Relationship between block errors and bit errors==
*Bit 3 des 1. Info–Blocks  ⇒  $u_{1, 3}$ entspricht dem Symbol u3 in ungeblockter Darstellung.
+
<br>
*Bit 1 des 2. Info–Blocks  ⇒  $u_{2, 1}$ entspricht dem Symbol $u_4$ in ungeblockter Darstellung.
+
To interpret the channel coding theorem, we still need various definitions for error probabilities.&nbsp; Various descriptive variables can be derived from the above system model:
*Bit 2 des 6. Info–Blocks  ⇒  $u_{6, 2}$ entspricht dem Symbol $u_{17}$ in ungeblockter Darstellung.
 
*Bit 4 des 1. Codewortes  ⇒  $x_{1, 4}$ entspricht dem Symbol $x_4$ in ungeblockter Darstellung.
 
*Bit 1 des 2. Codewortes  ⇒  $x_{2, 1}$ entspricht dem Symbol $x_5$ in ungeblockter Darstellung.
 
*Bit 2 des 6. Codewortes  ⇒  $x_{6, 2}$ entspricht dem Symbol $x_{22}$ in ungeblockter Darstellung.
 
  
Zur Interpretation des Kanalcodierungstheorems benötigen wir noch verschiedene Definitionen für „Fehlerwahrscheinlichkeiten”. Aus dem Systemmodell lassen sich folgende Größen ableiten:
+
{{BlaueBox|TEXT=
*Die '''Kanalfehlerwahrscheinlichkeit''' ergibt sich beim vorliegenden Kanalmodell zu
+
$\text{Definitions:}$
 +
*In the present channel model, the&nbsp; $\text{channel error probability}$&nbsp; is given by
 
   
 
   
$${\rm Pr(Kanalfehler)} = {\rm Pr} \left ({y}_{j,\hspace{0.05cm} i} \ne {x}_{j,\hspace{0.05cm} i}
+
:$$\text{Pr(channel error)} = {\rm Pr} \left ({y}_{j,\hspace{0.05cm} i} \ne {x}_{j,\hspace{0.05cm} i}
 
\right )\hspace{0.05cm}.$$
 
\right )\hspace{0.05cm}.$$
  
Beispielsweise ist beim BSC–Modell Pr(Kanalfehler) = $ε$  für alle $j$ = 1, 2, ... und 1 ≤ $i$ $n$.
+
:For example, in the BSC model&nbsp; $\text{Pr(channel error)} = ε$&nbsp; für alle&nbsp; $j = 1, 2$, ... &nbsp;and&nbsp; $1 ≤ i ≤ n$.
*Die '''Blockfehlerwahrscheinlichkeit''' bezieht sich auf die zugeordneten Informationsblöcke am Codereingang ⇒  $\underline{u}_j^{(k)}$ und am Decoderausgang $\underline{v}_j^{(k)}$, jeweils in Blöcken zu $k$ Bit:
+
*The&nbsp; $\text{block error probability}$&nbsp; refers to the allocated information blocks at the encoder input &nbsp; ⇒  &nbsp; $\underline{u}_j^{(k)}$&nbsp; and the decoder output  &nbsp;   &nbsp; $\underline{v}_j^{(k)}$,&nbsp; each in blocks of&nbsp; $k$&nbsp; bits:
 
   
 
   
$${\rm Pr(Blockfehler)} = {\rm Pr} \left (\underline{\upsilon}_j^{(k)} \ne \underline{u}_j^{(k)}
+
:$$\text{Pr(block error)} = {\rm Pr} \left (\underline{\upsilon}_j^{(k)} \ne \underline{u}_j^{(k)}
 
\right )\hspace{0.05cm}.$$
 
\right )\hspace{0.05cm}.$$
  
*Die '''Bitfehlerwahrscheinlichkeit''' bezieht sich ebenfalls auf den Eingang und den Ausgang des betrachteten Codiersystems, allerdings auf Bitebene:
+
*The&nbsp; $\text{bit error probability}$&nbsp; also refers to the input and the output of the entire coding system under consideration, but at the bit level:
 
   
 
   
$${\rm Pr(Bitfehler)} = {\rm Pr} \left ({\upsilon}_{j,\hspace{0.05cm} i} \ne {u}_{j,\hspace{0.05cm} i}
+
:$$\text{Pr(bit error)} = {\rm Pr} \left ({\upsilon}_{j,\hspace{0.05cm} i} \ne {u}_{j,\hspace{0.05cm} i}
 
\right )\hspace{0.05cm}.$$
 
\right )\hspace{0.05cm}.$$
  
Hierbei ist vereinfachend vorausgesetzt, dass alle $k$ Bit $u_{j,i}$ des Informationsblockes $j$ mit gleicher Wahrscheinlichkeit verfälscht werden (1 ≤ $i$ $k$). Andernfalls müsste über die $k$ Bit gemittelt werden.
+
:For simplicity, it is assumed here that all&nbsp; $k$&nbsp; bits&nbsp; $u_{j,\hspace{0.08cm}i}$&nbsp;  $(1 ≤ i ≤ k)$&nbsp; of the information block&nbsp; $j$&nbsp; are falsified with equal probability.  
 +
:Otherwise, the&nbsp; $k$&nbsp; bits would have to be averaged.}}
  
  
Zwischen Blockfehler– und Bitfehlerwahrscheinlichkeit besteht allgemein der Zusammenhang:
+
There is a general relationship between the block error probability and the bit error probability:
 
   
 
   
$${1}/{k} \cdot {\rm Pr(Blockfehler)} \le {\rm Pr(Bitfehler)} \le {\rm Pr(Blockfehler)}  
+
:$${1}/{k} \cdot \text{Pr(block error)} \le \text{Pr(bit error)} \le \text{Pr(block error)}  
 
  \hspace{0.05cm}.$$
 
  \hspace{0.05cm}.$$
  
*Die untere Schranke ergibt sich, wenn bei allen fehlerhaften Blöcken alle Bit falsch sind.
+
*The lower bound results when all bits are wrong in all faulty blocks.
*Gibt es in jedem fehlerhaften Block genau nur einen einzigen Bitfehler, dann ist die Bitfehlerwahrscheinlichkeit Pr(Bitfehler) identisch mit der Blockfehlerwahrscheinlichkeit Pr(Blockfehler).
+
*If there is exactly only one bit error in each faulty block, then the bit error probability  is identical to the block error probability:
 +
:$$ \text{Pr(bit error)} \equiv \text{Pr(block error)}  \hspace{0.05cm}.$$
 +
 
 +
[[File:EN_Inf_T_3_3_S7c.png|frame|Definition of different error probabilities]]
 +
{{GraueBox|TEXT=
 +
$\text{Example 7:}$&nbsp; The upper graph shows the first eight  received blocks&nbsp; $\underline{y}_j^{(n)}$&nbsp; with&nbsp; $n = 4$&nbsp; bits each.&nbsp;  Here channel errors are shaded green.  
  
 +
Below, the initial sequence&nbsp; $\underline{v}$&nbsp; after decoding is sketched, divided into blocks&nbsp; $\underline{v}_j^{(k)}$&nbsp; with&nbsp; $k = 3$&nbsp; bits each. Note:
  
{{Beispiel}}
+
*Bit errors are shaded red in the lower diagram.
Die Grafik zeigt oben die ersten acht Empfangsblöcke $\underline{y}_j^{(n)}$ mit $n$ = 4. Kanalfehler sind grün schraffiert. Unten ist die Ausgangsfolge $\underline{v}$ skizziert, unterteilt in Blöcke $\underline{v}_j^{(k)}$ zu je $k$ = 3 Bit:
+
*Block errors can be recognized by the blue frame.
*Bitfehler sind im unteren Diagramm rot schraffiert.
 
*Blockfehler erkennt man an der blauen Umrahmung.
 
  
[[File:P_ID2823__Inf_T_3_3_S7c_neu.png|Zur Definition verschiedener Fehlerwahrscheinlichkeiten]]
 
  
Hierzu einige (aufgrund der kurzen Folge) vage Angaben zu den Fehlerwahrscheinlichkeiten:
+
For this, some&nbsp; (due to the short sequence)&nbsp; only very vague information about the error probabilities:
*Die Hälfte der Empfangsbits sind grün schraffiert. Daraus folgt:
+
*Half of the received bits are shaded green.&nbsp; From this follows&nbsp;
   
+
:$$\text{Pr(channel error)} = 16/32 = 1/2.$$
$${\rm Pr(Kanalfehler)} = 16/32 = 1/2.$$
 
  
*Die Bitfehlerwahrscheinlichkeit lautet mit der beispielhaften Codierung & Decodierung:
+
*The bit error probability with the exemplary encoding and decoding is:  &nbsp;
+
:$$\text{Pr(bit error)} = 8/24 = 1/3.$$
$${\rm Pr(Bitfehler)} = 8/24 = 1/3.$$
 
  
*Dagegen würde bei uncodierter Übertragung gelten:
+
*In contrast, with uncoded transmission would be: &nbsp;
+
:$$\text{Pr(bit error)} = \text{Pr(channel error)}  = 1/2.$$
$${\rm Pr(Bitfehler)} = {\rm Pr(Kanalfehler)}  = 1/2.$$
 
  
*Die Hälfte der decodierten Blöcke sind blau umrandet. Daraus folgt:
+
*Half of the decoded blocks are outlined in blue. From this follows:   &nbsp;
+
:$$\text{Pr(block error)} = 4/8 = 1/2.$$
$${\rm Pr(Blockfehler)} = 4/8 = 1/2.$$
 
  
Mit Pr(Blockfehler) = 1/2 und k = 3 liegt die Bitfehlerwahrscheinlichkeit in folgendem Bereich:
+
*With &nbsp;$\text{Pr(block error)}= 1/2$&nbsp; and &nbsp;$k = 3$&nbsp; the bit error probability is in the following range:   &nbsp;
+
:$$1/6  \le \text{Pr(bit error)} \le 1/2  
$$1/6  \le {\rm Pr(Bitfehler)} \le 1/2  
 
 
  \hspace{0.05cm}.$$
 
  \hspace{0.05cm}.$$
  
*Die obere Schranke ergibt sich, wenn in jedem der vier verfälschten Blöcke alle Bit falsch sind: Pr(Bitfehler) = 12/24 = 1/2.
+
#The upper bound with respect to bit errors is obtained when all bits in each of the four falsified blocks are wrong:   &nbsp;  $\text{Pr(bit error)} = 12/24 = 1/2.$
*Die untere Schranke beschreibt den Fall, dass in jedem der vier verfälschten Blöcke jeweils nur ein Bit falsch ist:    Pr(Bitfehler) = 4/24 = 1/6.
+
#The lower bound indicates that only one bit is wrong in each of the four falsified blocks: &nbsp;   $\text{Pr(bit error)} = 4/24 = 1/6$.}}
 +
 
  
{{end}}
+
==Rate, channel capacity and bit error probability== 
 +
<br>
 +
{{BlaueBox|TEXT=
 +
$\text{Basically:}$&nbsp;
 +
*Channel coding increases the reliability of data transmission from the source to the sink.
 +
*If the code rate&nbsp; $R = k/n$&nbsp; is reduced and the added redundancy&nbsp; $(1 - R)$&nbsp; is increased, the data reliability is generally improved and the bit error probability is reduced, which we will refer to as&nbsp; $p_{\rm B}$&nbsp; in the following:
 +
 +
:$$p_{\rm B} = \text{Pr(bit error)} = {\rm Pr} \left ({\upsilon}_{j,\hspace{0.05cm} i} \ne {u}_{j,\hspace{0.05cm} i}
 +
\right )\hspace{0.05cm}.$$}}
  
  
==Rate, Kanalkapazität und Bitfehlerwahrscheinlichkeit==
+
The following theorem is based on the&nbsp; "Data Processing Theorem"&nbsp; and&nbsp; "Fano's Lemma".&nbsp; The derivation can be found in standard works on information theory, for example in&nbsp; [CT06]<ref name="CT06" />:
  
Durch Kanalcodierung wird die Zuverlässigkeit (englisch: ''Reliability'') der Datenübertragung von der Quelle zur Sinke erhöht. Vermindert man die Coderate $R = k/n$ und erhöht so die hinzugefügte Redundanz (1 – $R$), so wird im allgemeinen die Datensicherheit verbessert und damit die Bitfehlerwahrscheinlichkeit herabgesetzt, die wir im Weiteren kurz $p_B$ nennen:
+
{{BlaueBox|TEXT=
 +
$\text{Inversion of Shannon's Channel Coding Theorem:}$&nbsp;
 
   
 
   
$$p_{\rm B} = {\rm Pr(Bitfehler)} = {\rm Pr} \left ({\upsilon}_{j,\hspace{0.05cm} i} \ne {u}_{j,\hspace{0.05cm} i}
+
If one uses a channel with too small a capacity&nbsp; $R$&nbsp; for data transmission at rate&nbsp; $C < R$, the bit error probability&nbsp; $p_{\rm B}$&nbsp;  cannot fall below a lower bound even with the best possible channel coding:
\right )\hspace{0.05cm}.$$
 
 
 
Das folgende Theorem basiert auf dem Data Processing Theorem und ''Fano's Lemma''. Die Herleitung kann in den Standardwerken zur Informationstheorie nachgelesen werden, zum Beispiel in <ref>Cover, T.M.; Thomas, J.A.: ''Elements of Information Theory''. West Sussex: John Wiley & Sons, 2nd Edition, 2006.</ref>:
 
 
 
{{Definition}}
 
'''Umkehrung des Shannonschen Kanalcodierungstheorems''':
 
Benutzt man zur Datenübertragung mit Rate $R$ einen Kanal mit unzureichender Kanalkapazität $C < R$, so kann auch bei bestmöglicher Kanalcodierung die Bitfehlerwahrscheinlichkeit $p_B$ eine untere Schranke nicht unterschreiten:
 
 
   
 
   
$$p_{\rm B} \ge H_{\rm bin}^{-1} \cdot \left ( 1 - {C}/{R}\right ) > 0\hspace{0.05cm}.$$
+
:$$p_{\rm B} \ge H_{\rm bin}^{-1} \cdot \left ( 1 - {C}/{R}\right ) > 0\hspace{0.05cm}.$$
  
$H_{\rm bin}(⋅)$ bezeichnet hierbei die binäre Entropiefunktion.
+
Here&nbsp; $H_{\rm bin}(⋅)$&nbsp; denotes the&nbsp; [[Information_Theory/Discrete_Memoryless_Sources#Binary_entropy_function|$\text{binary entropy function}$]].}}
{{end}}
 
  
  
Da die Wahrscheinlichkeit der Blockfehler nie kleiner sein kann als die der Bitfehler, ist für $R > C$ auch die Blockfehlerwahrscheinlichkeit „0” nicht möglich. Aus dem angegebenen Bereich für die Bitfehler,
+
Since the probability of the block errors can never be smaller than that of the bit errors,&nbsp; the block error probability "zero" is also not possible for&nbsp; $R > C$&nbsp;.&nbsp; <br>From the given bounds for the bit errors,
 
   
 
   
$${1}/{k} \cdot {\rm Pr}({\rm Blockfehler}) \le  {\rm Pr}({\rm Bitfehler}) \le  {\rm Pr}({\rm Blockfehler})\hspace{0.05cm},$$
+
:$$ {1}/{k} \cdot {\rm Pr}({\rm block\ error}) \le  {\rm Pr}({\rm bit\ error}) \le  {\rm Pr}({\rm block\ error})\hspace{0.05cm},$$
  
lässt sich auch ein Bereich für die Blockfehlerwahrscheinlichkeit angeben:
+
a range for the block error probability can also be given:
 
   
 
   
$$ {\rm Pr}({\rm Bitfehler})  \le  {\rm Pr}({\rm Blockfehler}) \le  k \cdot  {\rm Pr}({\rm Bitfehler})\hspace{0.05cm}.$$
+
:$$ {\rm Pr}({\rm bit\ error})  \le  {\rm Pr}({\rm block\ error}) \le  k \cdot  {\rm Pr}({\rm bit\ error})\hspace{0.05cm}.$$
  
{{Beispiel}}
+
{{GraueBox|TEXT=
Verwendet man einen Kanal mit der Kapazität $C$ = 1/3 (bit) zur Datenübertragung mit der Coderate $R$ < 1/3, so ist prinzipiell die Bitfehlerwahrscheinlichkeit $p_B$ = 0 möglich.
+
$\text{Example 8:}$&nbsp; For a channel with capacity&nbsp; $C = 1/3$&nbsp; (bit), error-free data transmission &nbsp; $(p_{\rm B} = 0)$&nbsp; with code rate&nbsp; $R < 1/3$&nbsp; is possible in principle.
*Allerdings ist aus dem Kanalcodierungstheorem der spezielle ( $k$, $n$ )–Blockcode nicht bekannt, der dieses Wunschergebnis ermöglicht. Shannon macht hierzu keine Aussagen.
+
*However, from the channel coding theorem the special&nbsp; $(k$,&nbsp;  $n)$–block code is not known which makes this desired result possible. &nbsp;  Shannon also makes no statements on this.
*Bekannt ist nur, dass ein solcher bestmöglicher Code mit unendlich langen Blöcken arbeitet. Bei gegebener Coderate $R$ = $k/n$ gilt somit sowohl $k → ∞$ als auch $n → ∞$.
+
*All that is known is that such a best possible code works with blocks of infinite length.&nbsp; For a given code rate&nbsp; $R = k/n$&nbsp; both&nbsp; $k → ∞$&nbsp; and&nbsp; $n → ∞$ thus apply.
*Deshalb ist die Aussage „Die Bitfehlerwahrscheinlichkeit ist 0” nicht identisch mit „Es treten keine Bitfehler auf”: Auch bei endlich vielen Bitfehlern und $k → ∞$ gilt $p_B$ = 0.
+
*The statement&nbsp; "The bit error probability is zero"&nbsp; is not the same as&nbsp; "No bit errors occur": &nbsp; Even with a finite number of bit errors and&nbsp; $k → ∞$&nbsp; &rArr;  &nbsp; $p_{\rm B} = 0$.
  
  
Mit der Coderate $R$ = 1 (uncodierte Übertragung) erhält man:
+
With the code rate&nbsp; $R = 1 > C$&nbsp; (uncoded transmission) one obtains:
 
   
 
   
$$p_{\rm B} \ge H_{\rm bin}^{-1} \cdot \left ( 1 - \frac{1/3}{1.0}\right )  
+
:$$p_{\rm B} \ge H_{\rm bin}^{-1} \cdot \left ( 1 - \frac{1/3}{1}\right )  
 
= H_{\rm bin}^{-1}(2/3) \approx 0.174  
 
= H_{\rm bin}^{-1}(2/3) \approx 0.174  
 
> 0\hspace{0.05cm}.$$
 
> 0\hspace{0.05cm}.$$
  
Mit der Coderate $R$ = 1/2 > $C$ ist die Bitfehlerwahrscheinlichkeit zwar kleiner, aber nicht 0:
+
With the code rate&nbsp; $R = 1/2 > C$&nbsp;, the bit error probability is smaller&nbsp; but also different from zero:
 
   
 
   
$$p_{\rm B} \ge H_{\rm bin}^{-1} \cdot \left ( 1 - \frac{1/3}{1/2}\right )  
+
:$$p_{\rm B} \ge H_{\rm bin}^{-1} \cdot \left ( 1 - \frac{1/3}{1/2}\right )  
 
= H_{\rm bin}^{-1}(1/3) \approx 0.062  
 
= H_{\rm bin}^{-1}(1/3) \approx 0.062  
> 0\hspace{0.05cm}.$$
+
> 0\hspace{0.05cm}.$$}}
 +
 
 +
 +
==Exercises for the chapter==
 +
<br>
 +
[[Aufgaben:Exercise_3.10:_Mutual_Information_at_the_BSC|Exercise 3.10: Mutual Information at the BSC]]
 +
 
 +
[[Aufgaben:Exercise_3.10Z:_BSC_Channel_Capacity|Exercise 3.10Z: BSC Channel Capacity]]
 +
 
 +
[[Aufgaben:Exercise_3.11:_Erasure_Channel|Exercise 3.11: Erasure Channel]]
 +
 
 +
[[Aufgaben:Exercise_3.11Z:_Extremely_Asymmetrical_Channel|Exercise 3.11Z: Extremely Asymmetrical Channel]]
 +
 
 +
[[Aufgaben:Exercise_3.12:_Strictly_Symmetrical_Channels|Exercise 3.12: Strictly Symmetrical Channels]]
 +
 
 +
[[Aufgaben:Exercise_3.13:_Code_Rate_and_Reliability|Exercise 3.13: Code Rate and Reliability]]
 +
 
 +
[[Aufgaben:Exercise_3.14:_Channel_Coding_Theorem|Exercise 3.14: Channel Coding Theorem]]
 +
 
 +
[[Aufgaben:Exercise_3.15:_Data_Processing_Theorem|Exercise 3.15: Data Processing Theorem]]
  
Aufgabenhinweis: A3.12: Coderate und Zuverlässigkeit – A3.13: Kanalcodierungstheorem
 
  
{{end}}
+
 
==Aufgaben zu Kapitel 3.3 ==  
+
==References==
==Quellenverzeichnis==
 
 
<references/>
 
<references/>
 
{{Display}}
 
{{Display}}

Latest revision as of 15:10, 28 February 2023

Information-theoretical model of digital signal transmission


The entropies defined so far in general terms are now applied to digital signal transmission, whereby we assume a discrete memoryless channel  $\rm (DMC)$  according to the graphic:

Digital signal transmission model under consideration
  • The set of source symbols is characterized by the discrete random variable  $X$ , where  $|X|$  indicates the source symbol set size:
$$X = \{\hspace{0.05cm}x_1\hspace{0.05cm}, \hspace{0.15cm} x_2\hspace{0.05cm},\hspace{0.15cm} \text{...}\hspace{0.1cm} ,\hspace{0.15cm} x_{\mu}\hspace{0.05cm}, \hspace{0.15cm}\text{...}\hspace{0.1cm} , \hspace{0.15cm} x_{|X|}\hspace{0.05cm}\}\hspace{0.05cm}.$$
  • Correspondingly,  $Y$  characterizes the set of sink symbols with the symbol set size  $|Y|$:
$$Y = \{\hspace{0.05cm}y_1\hspace{0.05cm}, \hspace{0.15cm} y_2\hspace{0.05cm},\hspace{0.15cm} \text{...}\hspace{0.1cm} ,\hspace{0.15cm} y_{\kappa}\hspace{0.05cm}, \hspace{0.15cm}\text{...}\hspace{0.1cm} , \hspace{0.15cm} Y_{|Y|}\hspace{0.05cm}\}\hspace{0.05cm}.$$
$$X = \{0,\ 1\},\hspace{0.5cm} Y = \{0,\ 1,\ \ \text{E}\}\ ⇒ \ |X| = 2, \ |Y| = 3.$$
  • The sink symbol  $\rm E$  indicates an  "erasure".  The event  $Y=\text{E}$  indicates that a decision for  $0$  or for  $1$  would be too uncertain.
  • The symbol probabilities of the source and sink are accounted for in the graph by the probability mass functions  $P_X(X)$  and  $P_Y(Y)$:
$$P_X(x_{\mu}) = {\rm Pr}( X = x_{\mu})\hspace{0.05cm}, \hspace{0.3cm} P_Y(y_{\kappa}) = {\rm Pr}( Y = y_{\kappa})\hspace{0.05cm}.$$
  • Let it hold:  The prtobability mass functions  $P_X(X)$  and  $P_Y(Y)$  contain no zeros   ⇒   $\text{supp}(P_X) = P_X$  and  $\text{supp}(P_Y) = P_Y$.  This prerequisite facilitates the description without loss of generality.
  • All transition probabilities of the discrete memoryless channel   $\rm (DMC)$  are captured by the  conditional probability function  $P_{Y|X}(Y|X)$. . With  $x_μ ∈ X$  and  $y_κ ∈ Y$,  the following definition applies to this:
$$P_{Y\hspace{0.01cm}|\hspace{0.01cm}X}(y_{\kappa}\hspace{0.01cm} |\hspace{0.01cm} x_{\mu}) = {\rm Pr}(Y\hspace{-0.1cm} = y_{\kappa}\hspace{0.03cm} | \hspace{0.03cm}X \hspace{-0.1cm}= x_{\mu})\hspace{0.05cm}.$$


The green block in the graph marks  $P_{Y|X}(⋅)$  with  $|X|$  inputs and  $|Y|$  outputs.  Blue connections mark transition probabilities  $\text{Pr}(y_i | x_μ)$  starting from  $x_μ$  with  $1 ≤ i ≤ |Y|$,  while all red connections end at  $y_κ$:    $\text{Pr}(y_κ | x_i)$  with  $1 ≤ i ≤ |X|$.

Before we give the entropies for the individual probability functions, viz.

$$P_X(X) ⇒ H(X),\hspace{0.5cm} P_Y(Y) ⇒ H(Y), \hspace{0.5cm} P_{XY}(X) ⇒ H(XY), \hspace{0.5cm} P_{Y|X}(Y|X) ⇒ H(Y|X),\hspace{0.5cm} P_{X|Y}(X|Y) ⇒ H(X|Y),$$

the above statements are to be illustrated by an example.

Channel model  "Binary Erasure Channel"  $\rm (BEC)$

$\text{Example 1}$:  In the book  "Channel Coding"  we also deal with the  $\text{Binary Erasure Channel}$  $\rm (BEC)$, which is sketched in a somewhat modified form on the right.   The following prerequisites apply:

  • Let the input alphabet be binary  ⇒   $X = \{0,\ 1 \}$   ⇒   $\vert X\vert = 2$  while three values are possible at the output   ⇒   $Y = \{0,\ 1,\ \text{E} \}$   ⇒   $\vert Y\vert = 3$.
  • The symbol  $\text{E}$  indicates the case that the receiver cannot decide for one of the binary symbols  $0$  or  $1$  due to too much channel interference.  "E"  stands for erasure.
  • With the  $\rm BEC$  according to the above sketch, both a transmitted  $0$  and a  $1$  are erased with probability  $λ$  while the probability of a correct transmission is  $1 – λ$  in each case.
  • In contrast, transmission errors are excluded by the BEC model  
    ⇒   the conditional probabilities  $\text{Pr}(Y = 1 \vert X = 0)$  and  $\text{Pr}(Y = 0 \vert X = 1)$  are both zero.


At the transmitter, the  "zeros"  and  "ones"  would not necessarily be equally probable.  Rather, we use the probability mass functions

$$\begin{align*}P_X(X) & = \big ( {\rm Pr}( X = 0)\hspace{0.05cm},\hspace{0.2cm} {\rm Pr}( X = 1) \big )\hspace{0.05cm},\\ P_Y(Y) & = \big ( {\rm Pr}( Y = 0)\hspace{0.05cm},\hspace{0.2cm} {\rm Pr}( Y = 1)\hspace{0.05cm},\hspace{0.2cm} {\rm Pr}( Y = {\rm E}) \big )\hspace{0.05cm}.\end{align*}$$

From the above model we then get:

$$\begin{align*}P_Y(0) & = {\rm Pr}( Y \hspace{-0.1cm} = 0) = P_X(0) \cdot ( 1 - \lambda)\hspace{0.05cm}, \\ P_Y(1) & = {\rm Pr}( Y \hspace{-0.1cm} = 1) = P_X(1) \cdot ( 1 - \lambda)\hspace{0.05cm}, \\ P_Y({\rm E}) & = {\rm Pr}( Y \hspace{-0.1cm} = {\rm E}) = P_X(0) \cdot \lambda \hspace{0.1cm}+\hspace{0.1cm} P_X(1) \cdot \lambda \hspace{0.05cm}.\end{align*}$$

If we now take  $P_X(X)$  and  $P_Y(Y)$  to be vectors, the result can be represented as follows:

$$P_{\hspace{0.05cm}Y}(Y) = P_X(X) \cdot P_{\hspace{0.05cm}Y\hspace{-0.01cm}\vert \hspace{-0.01cm}X}(Y\hspace{-0.01cm} \vert \hspace{-0.01cm} X) \hspace{0.05cm},$$

where the transition probabilities  $\text{Pr}(y_κ\vert x_μ)$  are accounted for by the following matrix:

$$P_{\hspace{0.05cm}Y\hspace{-0.01cm} \vert \hspace{-0.01cm}X}(Y\hspace{-0.01cm} \vert \hspace{-0.01cm} X) = \begin{pmatrix} 1 - \lambda & 0 & \lambda\\ 0 & 1 - \lambda & \lambda \end{pmatrix}\hspace{0.05cm}.$$

Note:

  • We have chosen this representation only to simplify the description.
  • $P_X(X)$  and  $P_Y(Y)$  are not vectors in the true sense and  $P_{Y \vert X}(Y\vert X)$  is not a matrix either.


Directional diagram for digital signal transmission


All entropies defined in the  "last chapter"  also apply to digital signal transmission.  However, it is expedient to choose the right-hand diagram instead of the diagram used so far, corresponding to the left-hand diagram, in which the direction from source  $X$  to sink  $Y$  is recognizable.

Two information-theoretical models for digital signal transmission

Let us now interpret the right graph starting from the general  $\text{DMC model}$:

  • The  »source entropy«  $H(X)$  denotes the average information content of the source symbol sequence.   With the symbol set size  $|X|$  applies:
$$H(X) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_X(X)}\right ] \hspace{0.1cm} = -{\rm E} \big [ {\rm log}_2 \hspace{0.1cm}{P_X(X)}\big ] \hspace{0.2cm} =\hspace{0.2cm} \sum_{\mu = 1}^{|X|} P_X(x_{\mu}) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_X(x_{\mu})} \hspace{0.05cm}.$$
  • The  »equivocation«  $H(X|Y)$  indicates the average information content that an observer who knows exactly about the sink  $Y$  gains by observing the source  $X$ :
$$H(X|Y) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}X\hspace{-0.01cm}|\hspace{-0.01cm}Y}(X\hspace{-0.01cm} |\hspace{0.03cm} Y)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{|X|} \sum_{\kappa = 1}^{|Y|} P_{XY}(x_{\mu},\hspace{0.05cm}y_{\kappa}) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}X\hspace{-0.01cm}|\hspace{0.03cm}Y} (\hspace{0.05cm}x_{\mu}\hspace{0.03cm} |\hspace{0.05cm} y_{\kappa})} \hspace{0.05cm}.$$
  • The equivocation is the portion of the source entropy  $H(X)$  that is lost due to channel interference  (for digital channel: transmission errors).  The  »mutual information«  $I(X; Y)$  remains, which reaches the sink:
$$I(X;Y) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_{XY}(X, Y)}{P_X(X) \cdot P_Y(Y)}\right ] \hspace{0.2cm} = H(X) - H(X|Y) \hspace{0.05cm}.$$
  • The  »irrelevance«  $H(Y|X)$  indicates the average information content that an observer who knows exactly about the source  $X$  gains by observing the sink  $Y$:
$$H(Y|X) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{-0.01cm}X}(Y\hspace{-0.01cm} |\hspace{0.03cm} X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{|X|} \sum_{\kappa = 1}^{|Y|} P_{XY}(x_{\mu},\hspace{0.05cm}y_{\kappa}) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{0.03cm}X} (\hspace{0.05cm}y_{\kappa}\hspace{0.03cm} |\hspace{0.05cm} x_{\mu})} \hspace{0.05cm}.$$
  • The  »sink entropy«  $H(Y)$, the mean information content of the sink.  $H(Y)$  is the sum of the useful mutual information  $I(X; Y)$  and the useless irrelevance  $H(Y|X)$, which comes exclusively from channel errors:
$$H(Y) = {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{P_Y(Y)}\right ] \hspace{0.1cm} = -{\rm E} \big [ {\rm log}_2 \hspace{0.1cm}{P_Y(Y)}\big ] \hspace{0.2cm} =I(X;Y) + H(Y|X) \hspace{0.05cm}.$$

Calculation of the mutual information for the binary channel


These definitions will now be illustrated by an example.   We deliberately avoid simplifying the calculations by exploiting symmetries.

General model of the binary channel

$\text{Example 2}$:  We consider the general binary channel without memory according to the sketch.  Let the falsification probabilities be:

$$\begin{align*}\varepsilon_0 & = {\rm Pr}(Y\hspace{-0.1cm} = 1\hspace{0.05cm}\vert X \hspace{-0.1cm}= 0) = 0.01\hspace{0.05cm},\\ \varepsilon_1 & = {\rm Pr}(Y\hspace{-0.1cm} = 0\hspace{0.05cm} \vert X \hspace{-0.1cm}= 1) = 0.2\hspace{0.05cm}\end{align*}$$
$$\Rightarrow \hspace{0.3cm} P_{\hspace{0.05cm}Y\hspace{-0.01cm} \vert \hspace{-0.01cm}X}(Y\hspace{-0.01cm} \vert \hspace{-0.01cm} X) = \begin{pmatrix} 1 - \varepsilon_0 & \varepsilon_0\\ \varepsilon_1 & 1 - \varepsilon_1 \end{pmatrix} = \begin{pmatrix} 0.99 & 0.01\\ 0.2 & 0.8 \end{pmatrix} \hspace{0.05cm}.$$

Furthermore, we assume source symbols that are not equally probable:

$$P_X(X) = \big ( p_0,\ p_1 \big )= \big ( 0.1,\ 0.9 \big ) \hspace{0.05cm}.$$

With the  $\text{binary entropy function}$  $H_{\rm bin}(p)$,  we thus obtain for the source entropy:

$$H(X) = H_{\rm bin} (0.1) = 0.4690 \hspace{0.12cm}{\rm bit} \hspace{0.05cm}.$$

For the probability mass function of the sink as well as for the sink entropy we thus obtain:

$$P_Y(Y) = \big [ {\rm Pr}( Y\hspace{-0.1cm} = 0)\hspace{0.05cm}, \ {\rm Pr}( Y \hspace{-0.1cm}= 1) \big ] = \big ( p_0\hspace{0.05cm},\ p_1 \big ) \cdot \begin{pmatrix} 1 - \varepsilon_0 & \varepsilon_0\\ \varepsilon_1 & 1 - \varepsilon_1 \end{pmatrix} $$
$$\begin{align*}\Rightarrow \hspace{0.3cm} {\rm Pr}( Y \hspace{-0.1cm}= 0)& = p_0 \cdot (1 - \varepsilon_0) + p_1 \cdot \varepsilon_1 = 0.1 \cdot 0.99 + 0.9 \cdot 0.2 = 0.279\hspace{0.05cm},\\ {\rm Pr}( Y \hspace{-0.1cm}= 1) & = 1 - {\rm Pr}( Y \hspace{-0.1cm}= 0) = 0.721\end{align*}$$
$$\Rightarrow \hspace{0.3cm} H(Y) = H_{\rm bin} (0.279) = 0.8541 \hspace{0.12cm}{\rm bit} \hspace{0.05cm}. $$

The joint probabilities  $p_{\mu \kappa} = \text{Pr}\big[(X = μ) ∩ (Y = κ)\big]$  between source and sink are:

$$\begin{align*}p_{00} & = p_0 \cdot (1 - \varepsilon_0) = 0.099\hspace{0.05cm},\hspace{0.5cm}p_{01}= p_0 \cdot \varepsilon_0 = 0.001\hspace{0.05cm},\\ p_{10} & = p_1 \cdot (1 - \varepsilon_1) = 0.180\hspace{0.05cm},\hspace{0.5cm}p_{11}= p_1 \cdot \varepsilon_1 = 0.720\hspace{0.05cm}.\end{align*}$$

From this one obtains for

  • the  »joint entropy«:
$$H(XY) = p_{00}\hspace{-0.05cm} \cdot \hspace{-0.05cm}{\rm log}_2 \hspace{0.05cm} \frac{1}{p_{00} \rm } + p_{01} \hspace{-0.05cm} \cdot \hspace{-0.05cm}{\rm log}_2 \hspace{0.05cm} \frac{1}{p_{01} \rm } + p_{10}\hspace{-0.05cm} \cdot \hspace{-0.05cm} {\rm log}_2 \hspace{0.05cm} \frac{1}{p_{10} \rm } + p_{11} \hspace{-0.05cm} \cdot \hspace{-0.05cm} {\rm log}_2\hspace{0.05cm} \frac{1}{p_{11}\rm } = 1.1268\,{\rm bit} \hspace{0.05cm},$$
  • the  »mutual information«:
$$I(X;Y) = H(X) + H(Y) - H(XY) = 0.4690 + 0.8541 - 1.1268 = 0.1963\hspace{0.12cm}{\rm bit} \hspace{0.05cm},$$
Information theoretic model of the binary channel under consideration
  • the  »equivocation«:
$$H(X \vert Y) \hspace{-0.01cm} =\hspace{-0.01cm} H(X) \hspace{-0.01cm} -\hspace{-0.01cm} I(X;Y) \hspace{-0.01cm} $$
$$\Rightarrow \hspace{0.3cm} H(X \vert Y) \hspace{-0.01cm} = \hspace{-0.01cm} 0.4690\hspace{-0.01cm} -\hspace{-0.01cm} 0.1963\hspace{-0.01cm} =\hspace{-0.01cm} 0.2727\hspace{0.12cm}{\rm bit} \hspace{0.05cm},$$
  • the »irrelevance«:
$$H(Y \vert X) = H(Y) - I(X;Y) $$
$$\Rightarrow \hspace{0.3cm} H(Y \vert X) = 0.8541 - 0.1963 = 0.6578\hspace{0.12cm}{\rm bit} \hspace{0.05cm}.$$

The results are summarized in the graph.


Notes:

  • The equivocation and irrelevance could also have been calculated directly (but with extra effort) from the corresponding probability functions.
  • For example, the irrelevance:
$$H(Y|X) = \hspace{-0.2cm} \sum_{(x, y) \hspace{0.05cm}\in \hspace{0.05cm}XY} \hspace{-0.2cm} P_{XY}(x,\hspace{0.05cm}y) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{0.03cm}X} (\hspace{0.05cm}y\hspace{0.03cm} |\hspace{0.05cm} x)}= p_{00} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{1\hspace{-0.08cm} - \hspace{-0.08cm}\varepsilon_0} + p_{01} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{\varepsilon_0} + p_{10} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{1\hspace{-0.08cm} - \hspace{-0.08cm}\varepsilon_1} + p_{11} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{\varepsilon_1} = 0.6578\,{\rm bit} \hspace{0.05cm}.$$

Definition and meaning of channel capacity


We further consider a discrete memoryless channel  $\rm (DMC)$  with a finite number of source symbols   ⇒   $|X|$  and also only finitely many sink symbols   ⇒   $|Y|$,  as shown in the first section of this chapter.

  • If one calculates the mutual information  $I(X, Y)$  as explained in  $\text{Example 2}$,  it also depends on the source statistic   ⇒   $P_X(X)$.
  • Ergo:   The mutual information  $I(X, Y)$  is not a pure channel characteristic.


$\text{Definition:}$  The  »channel capacity«  introduced by  $\text{Claude E. Shannon}$  according to his standard work  [Sha48][1] reads:

$$C = \max_{P_X(X)} \hspace{0.15cm} I(X;Y) \hspace{0.05cm}.$$

The additional unit  "bit/use"  is often added.  Since according to this definition the best possible source statistics are always the basis:

  • $C$  depends only on the channel properties   ⇒   $P_{Y \vert X}(Y \vert X)$,
  • but not on the source statistics   ⇒   $P_X(X)$. 


Shannon needed the quantity  $C$  to formulate the  "Channel Coding Theorem" – one of the highlights of the information theory he founded.

$\text{Shannon's Channel Coding Theorem: }$ 

  • For every transmission channel with channel capacity  $C > 0$,  there exists (at least) one  $(k,\ n)$–block code,  whose (block) error probability approaches zero  as long as the code rate  $R = k/n$  is less than or equal to the channel capacity:  
$$R ≤ C.$$
  • The prerequisite for this, however,  is that the following applies to the block length of this code:   $n → ∞.$


The proof of this theorem,  which is beyond the scope of our learning tutorial,  can be found for example in  [CT06][2],  [Kra13][3]  and  [Meck09][4].

As will be shown in  "Exercise 3.13",  the reverse is also true.  This proof can also be found in the literature references just mentioned.

$\text{Reverse of Shannon's channel coding theorem: }$ 

If the rate  $R$  of the  $(n,\ k)$–block code used is greater than the channel capacity  $C$,  then an arbitrarily small block error probability is not achievable.


In the chapter  "AWGN model for discrete-time band-limited signals"  it is explained in connection with the continuous  $\text{AWGN channel model}$    what phenomenally great significance Shannon's theorem has for the entire field of information technology,  not only for those interested exclusively in theory,  but also for practitioners.


Channel capacity of a binary channel


General model of the binary channel

The mutual information of the general  (asymmetrical)  binary channel according to this sketch was calculated in  $\text{Example 2}$.  In this model, the input symbols  $0$  and  $1$  are distorted to different degrees:

$$P_{\hspace{0.05cm}Y\hspace{-0.01cm}|\hspace{-0.01cm}X}(Y\hspace{-0.01cm} |\hspace{-0.01cm} X) = \begin{pmatrix} 1 - \varepsilon_0 & \varepsilon_0\\ \varepsilon_1 & 1 - \varepsilon_1 \end{pmatrix} \hspace{0.05cm}.$$

The mutual information can be represented with the probability mass function  $P_X(X) = (p_0,\ p_1)$  as follows:

$$I(X ;Y) = \sum_{\mu = 1}^{2} \hspace{0.1cm}\sum_{\kappa = 1}^{2} \hspace{0.2cm} {\rm Pr} (\hspace{0.05cm}y_{\kappa}\hspace{0.03cm} |\hspace{0.05cm} x_{\mu}) \cdot {\rm Pr} (\hspace{0.05cm}x_{\mu}\hspace{0.05cm})\cdot {\rm log}_2 \hspace{0.1cm} \frac{{\rm Pr} (\hspace{0.05cm}y_{\kappa}\hspace{0.03cm} |\hspace{0.05cm} x_{\mu})}{{\rm Pr} (\hspace{0.05cm}y_{\kappa})} $$
$$\begin{align*}\Rightarrow \hspace{0.3cm} I(X ;Y) &= \hspace{-0.01cm} (1 \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0) \cdot p_0 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1 \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0}{(1 \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0) \cdot p_0 + \varepsilon_1 \cdot p_1} + \varepsilon_0 \cdot p_0 \cdot {\rm log}_2 \hspace{0.1cm} \frac{\varepsilon_0}{(1 \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_0) \cdot p_0 + \varepsilon_1 \cdot p_1} \ + \\ & + \hspace{-0.01cm} \varepsilon_1 \cdot p_1 \cdot {\rm log}_2 \hspace{0.1cm} \frac{\varepsilon_1}{\varepsilon_0 \cdot p_0 + (1 \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1) \cdot p_1} + (1 \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1) \cdot p_1 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1 \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1}{\varepsilon_0 \cdot p_0 + (1 \hspace{-0.08cm}- \hspace{-0.08cm}\varepsilon_1) \cdot p_1} \hspace{0.05cm}.\end{align*}$$
Mutual information for the
"asymmetrical binary channel"

$\text{Example 3}$:  In the following we set  $ε_0 = 0.01$  and  $ε_1 = 0.2$.

Column 4 of the adjacent table shows  (highlighted in green)  the mutual information  $I(X; Y)$  of this asymmetrical binary channel depending on the source symbol probability  $p_0 = {\rm Pr}(X = 0)$  .  One can see:

  • The mutual information depends on the symbol probabilities  $p_0$  and  $p_1 = 1 - p_0$.
  • Here the maximum value of  $I(X; Y)$  results in  $p_0 ≈ 0.55$    ⇒    $p_1 ≈ 0.45$.
  • The result  $p_0 > p_1$  follows from the relation  $ε_0 < ε_1$  (the  "zero"  is less distorted).
  • The capacity of this channel is  $C = 0.5779 \ \rm bit/use$.


In the above equation, the  $\text{Binary Symmetric Channel}$  $\rm (BSC)$  with parameters  $ε = ε_0 = ε_1$  is also included as a special case.  Hints:

  • In  "Exercise 3.10"  the mutual information of the BSC is calculated for the system parameters  $ε = 0.1$  and  $p_0 = 0.2$  .
  • In  "Exercise 3.10Z"  its channel capacity is given as follows:
$$C_{\rm BSC} = 1 - H_{\rm bin} (\varepsilon) \hspace{0.05cm}.$$

Properties of symmetrical channels


The capacity calculation of the (general)  $\text{discrete memoryless channel}$  $\rm (DMC)$  is often complex.  It simplifies decisively if symmetries of the channel are exploited.  

Examples of symmetrical channels


The diagram shows two examples:

  • In the case of the  uniformly dispersive channel  all source symbols  $x ∈ X$  result in exactly the same set of transition probabilities   ⇒   $\{P_{Y\hspace{0.03cm}|\hspace{0.01cm}X}(y_κ\hspace{0.05cm}|\hspace{0.05cm}x)\}$  with  $1 ≤ κ ≤ |Y|$.  Here  $q + r + s = 1$  must always apply here  (see left graph).
  • In the case of the  uniformly focusing channel , the same set of transition probabilities  ⇒   $\{P_{Y\hspace{0.03cm}|\hspace{0.01cm}X}(y\hspace{0.05cm}|\hspace{0.05cm}x_μ)\}$  with  $1 ≤ μ ≤ |X|$ results for all sink symbols  $y ∈ Y$ .   Here,  $t + u + v = 1$  need not necessarily hold  (see right graph).


$\text{Definition:}$  If a discrete memoryless channel is both uniformly dispersive and uniformly focusing,  it is a  »strictly symmetric channel«. 

  • With an equally distributed source alphabet, this channel has the capacity
$$C = {\rm log}_2 \hspace{0.1cm} \vert Y \vert + \sum_{y \hspace{0.05cm}\in\hspace{0.05cm} Y} \hspace{0.1cm} P_{\hspace{0.03cm}Y \vert \hspace{0.01cm} X}(y\hspace{0.05cm} \vert \hspace{0.05cm}x) \cdot {\rm log}_2 \hspace{0.1cm}P_{\hspace{0.01cm}Y \vert \hspace{0.01cm} X}(y\hspace{0.05cm}\vert\hspace{0.05cm} x) \hspace{0.05cm}.$$
  • Any  $x ∈ X$  can be used for this equation.


This definition will now be clarified by an example.

$\text{Example 4}$:  In the channel under consideration, there are connections between all  $ \vert X \vert = 3$  inputs and all  $ \vert Y \vert = 3$  outputs:

Strongly symmetrical channel  $\vert X \vert = \vert Y \vert= 3$
  • A red connection stands for  $P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ \hspace{0.05cm} \vert \hspace{0.05cm} x_μ) = 0.7$.
  • A blue connection stands for  $P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ \hspace{0.05cm}\vert \hspace{0.05cm} x_μ) = 0.2$.
  • A green connection stands for  $P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ \hspace{0.05cm}\vert\hspace{0.05cm} x_μ) = 0.1$.


According to the above equation, the following applies to the channel capacity:

$$C = {\rm log}_2 \hspace{0.1cm} (3) + 0.7 \cdot {\rm log}_2 \hspace{0.1cm} (0.7) + 0.2 \cdot {\rm log}_2 \hspace{0.1cm} (0.2) + 0.1 \cdot {\rm log}_2 \hspace{0.1cm} (0.1) = 0.4282 \,\,{\rm bit} \hspace{0.05cm}.$$

Notes:

  • The addition of  "the same set of transition probabilities”  in the above definitions does not mean that it must apply:
$$P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ \hspace{0.05cm}\vert\hspace{0.05cm} x_1) = P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ \hspace{0.05cm}\vert\hspace{0.05cm} x_2) = P_{Y \hspace{0.03cm}\vert\hspace{0.01cm} X}(y_κ \hspace{0.05cm}\vert\hspace{0.05cm} x_3).$$
  • Rather, here a red, a blue and a green arrow leaves from each input and a red, a blue and a green arrow arrives at each output.
  • The respective sequences permute:   R – G – B,     B – R – G,     G – B – R.


An example of a strictly symmetrical channel is the  $\text{Binary Symmetric Channel}$  $\rm (BSC)$.  In contrast, the  $\text{Binary Erasure Channel}$  $\rm (BEC)$  is not strictly symmetric,  because,

  • although it is uniformly dispersive,
  • but it is not uniformly focusing.


The following definition is less restrictive than the previous one of a strictly symmetric channel.

$\text{Definition:}$  A  »symmetric channel«  exists,

  • if it can be divided into several  $($generally $L)$  strongly symmetrical sub-channels,
  • by splitting the output alphabet  $Y$  into  $L$  subsets  $Y_1$, ... , $Y_L$ .


Such a  "symmetric channel"  has the following capacity:

$$C = \sum_{l \hspace{0.05cm}=\hspace{0.05cm} 1}^{L} \hspace{0.1cm} p_{\hspace{0.03cm}l} \cdot C_{\hspace{0.03cm}l} \hspace{0.05cm}.$$

The following designations are used here:

  • $p_{\hspace{0.03cm}l}$ indicates the probability that the  $l$–th sub-channel is selected.
  • $C_{\hspace{0.03cm}l}$  is the channel capacity of this  $l$–th sub-channel.


Symmetrical channel consisting of two strongly symmetrical
sub-channels  $\rm A$  and  $\rm B$

The diagram illustrates this definition for  $L = 2$  with the sub-channels  $\rm A$  and  $\rm B$.

  • The differently drawn transitions  (dashed or dotted)  show that the two sub-channels can be different,  so that  $C_{\rm A} ≠ C_{\rm B}$  will generally apply.
  • For the capacity of the total channel one thus obtains in general:
$$C = p_{\rm A} \cdot C_{\rm A} + p_{\rm B} \cdot C_{\rm B} \hspace{0.05cm}.$$
  • No statement is made here about the structure of the two sub-channels.


The following example will show that the  "Binary Erasure Channel"  $\rm (BEC)$  can also be described in principle by this diagram.   However, the two output symbols  $y_3$  and  $y_4$  must then be combined into a single symbol.

$\rm BEC$  in two different representations

$\text{Example 5}$:  The left figure shows the usual representation of the  $\text{Binary Erasure Channel}$  $\rm (BEC)$  with input  $X = \{0,\ 1\}$  and output  $Y = \{0,\ 1,\ \text{E} \}$.

If one divides this according to the right grafic into

  • an  "ideal channel"  $(y = x)$  for
$$y ∈ Y_{\rm A} = \{0, 1\} \ \ ⇒ \ \ C_{\rm A} = 1 \ \rm bit/use,$$
  • an  "erasure channel"  $(y = {\rm E})$  for
$$y ∈ Y_{\rm B} = \{\rm E \} \ \ ⇒ \ \ C_{\rm B} = 0,$$

then we get with the sub-channel weights  $p_{\rm A} = 1 – λ$  and  $p_{\rm B} = λ$:

$$C_{\rm BEC} = p_{\rm A} \cdot C_{\rm A} = 1 - \lambda \hspace{0.05cm}.$$

Both channels are strongly symmetrical.   The following applies equally for the (ideal) channel  $\rm A$ 

  • for  $X = 0$  and  $X = 1$:     $\text{Pr}(Y = 0 \hspace{0.05cm}\vert \hspace{0.05cm} X) = \text{Pr}(Y = 1 \hspace{0.05cm} \vert\hspace{0.05cm} X) = 1 - λ$   ⇒   uniformly dispersive,
  • for  $Y = 0$  and   $Y = 1$:     $\text{Pr}(Y \hspace{0.05cm} \vert \hspace{0.05cm} X= 0) = Pr(Y \hspace{0.05cm}\vert\hspace{0.05cm} X = 1) = 1 - λ$   ⇒   uniformly focusing.


The same applies to the erasure channel  $\rm B$.


In  "Exercise 3.12"  it will be shown that the capacity of the  $\text{Binary Symmetric Error & Erasure Channel}$  $\rm (BSEC)$  model can be calculated in the same way.  One obtains:

$$C_{\rm BSEC} = (1- \lambda) \cdot \left [ 1 - H_{\rm bin}(\frac{\varepsilon}{1- \lambda}) \right ]$$
  • with the crossover probability  $ε$
  • and the erasure probability  $λ$.


Some basics of channel coding


In order to interpret the channel coding theorem correctly, some basics of  »channel coding«.  This extremely important area of Communications Engineering is covered in our learning tutorial  $\rm LNTwww$  in a separate book called  "Channel Coding".

Model for binary–coded communication

The following description refers to the highly simplified model for  $\text{binary block codes}$:

  • The infinitely long source symbol sequence  $\underline{u}$  (not shown here)  is divided into blocks of  $k$  bits.  We denote the information block with the serial number  $j$  by  $\underline{u}_j^{(k)}$.
  • Each information block  $\underline{u}_j^{(k)}$  is converted into a code word  $\underline{x}_j^{(n)}$  by the channel encoder with a yellow background, where  $n > k$  is to apply.  The ratio  $R = k/n$  is called the  »code rate«.
  • The  "Discrete Memoryless Channel"  $\rm (DMC)$  is taken into account by transition probabilities  $P_{Y\hspace{0.03cm}|\hspace{0.03cm}X}(⋅)$ .  This block with a green background causes errors at the bit level.  The following can therefore apply:   $y_{j, \hspace{0.03cm}i} ≠ x_{j,\hspace{0.03cm} i}$.
  • Thus the received blocks  $\underline{y}_j^{(n)}$  consisting of   $n$  bits can also differ from the code words  $\underline{x}_j^{(n)}$ .  Likewise, the following generally applies to the blocks after the decoder: 
$$\underline{v}_j^{(k)} ≠ \underline{u}_j^{(k)}.$$


$\text{Example 6}$:  The diagram is intended to illustrate the nomenclature used here using the example of  $k = 3$  and  $n = 4$.  The first eight blocks of the information sequence  $\underline{u}$  and the  encoded sequence $\underline{x}$ are shown.

Bit designation of information block and code word

One can see the following assignment between the blocked and the unblocked description:

  • Bit 3 of the 1st information block   ⇒   $u_{1,\hspace{0.08cm} 3}$  corresponds to the symbol  $u_3$  in unblocked representation.
  • Bit 1 of the 2nd information block   ⇒   $u_{2, \hspace{0.08cm}1}$  corresponds to the symbol  $u_4$  in unblocked representation.
  • Bit 2 of the 6th information block   ⇒   $u_{6, \hspace{0.08cm}2}$  corresponds to the symbol  $u_{17}$  in unblocked representation.
  • Bit 4 of the 1st code word   ⇒   $x_{1, \hspace{0.08cm}4}$  corresponds to the symbol  $x_4$  in unblocked representation.
  • Bit 1 of the 2nd code word   ⇒   $x_{2, \hspace{0.08cm}1}$  corresponds to the symbol  $x_5$  in unblocked representation.
  • Bit 2 of the 6th code word   ⇒   $x_{6, \hspace{0.08cm}2}$  corresponds to the symbol  $x_{22}$  in unblocked representation.

Relationship between block errors and bit errors


To interpret the channel coding theorem, we still need various definitions for error probabilities.  Various descriptive variables can be derived from the above system model:

$\text{Definitions:}$

  • In the present channel model, the  $\text{channel error probability}$  is given by
$$\text{Pr(channel error)} = {\rm Pr} \left ({y}_{j,\hspace{0.05cm} i} \ne {x}_{j,\hspace{0.05cm} i} \right )\hspace{0.05cm}.$$
For example, in the BSC model  $\text{Pr(channel error)} = ε$  für alle  $j = 1, 2$, ...  and  $1 ≤ i ≤ n$.
  • The  $\text{block error probability}$  refers to the allocated information blocks at the encoder input   ⇒   $\underline{u}_j^{(k)}$  and the decoder output   ⇒   $\underline{v}_j^{(k)}$,  each in blocks of  $k$  bits:
$$\text{Pr(block error)} = {\rm Pr} \left (\underline{\upsilon}_j^{(k)} \ne \underline{u}_j^{(k)} \right )\hspace{0.05cm}.$$
  • The  $\text{bit error probability}$  also refers to the input and the output of the entire coding system under consideration, but at the bit level:
$$\text{Pr(bit error)} = {\rm Pr} \left ({\upsilon}_{j,\hspace{0.05cm} i} \ne {u}_{j,\hspace{0.05cm} i} \right )\hspace{0.05cm}.$$
For simplicity, it is assumed here that all  $k$  bits  $u_{j,\hspace{0.08cm}i}$  $(1 ≤ i ≤ k)$  of the information block  $j$  are falsified with equal probability.
Otherwise, the  $k$  bits would have to be averaged.


There is a general relationship between the block error probability and the bit error probability:

$${1}/{k} \cdot \text{Pr(block error)} \le \text{Pr(bit error)} \le \text{Pr(block error)} \hspace{0.05cm}.$$
  • The lower bound results when all bits are wrong in all faulty blocks.
  • If there is exactly only one bit error in each faulty block, then the bit error probability is identical to the block error probability:
$$ \text{Pr(bit error)} \equiv \text{Pr(block error)} \hspace{0.05cm}.$$
Definition of different error probabilities

$\text{Example 7:}$  The upper graph shows the first eight received blocks  $\underline{y}_j^{(n)}$  with  $n = 4$  bits each.  Here channel errors are shaded green.

Below, the initial sequence  $\underline{v}$  after decoding is sketched, divided into blocks  $\underline{v}_j^{(k)}$  with  $k = 3$  bits each. Note:

  • Bit errors are shaded red in the lower diagram.
  • Block errors can be recognized by the blue frame.


For this, some  (due to the short sequence)  only very vague information about the error probabilities:

  • Half of the received bits are shaded green.  From this follows:  
$$\text{Pr(channel error)} = 16/32 = 1/2.$$
  • The bit error probability with the exemplary encoding and decoding is:  
$$\text{Pr(bit error)} = 8/24 = 1/3.$$
  • In contrast, with uncoded transmission would be:  
$$\text{Pr(bit error)} = \text{Pr(channel error)} = 1/2.$$
  • Half of the decoded blocks are outlined in blue. From this follows:  
$$\text{Pr(block error)} = 4/8 = 1/2.$$
  • With  $\text{Pr(block error)}= 1/2$  and  $k = 3$  the bit error probability is in the following range:  
$$1/6 \le \text{Pr(bit error)} \le 1/2 \hspace{0.05cm}.$$
  1. The upper bound with respect to bit errors is obtained when all bits in each of the four falsified blocks are wrong:   $\text{Pr(bit error)} = 12/24 = 1/2.$
  2. The lower bound indicates that only one bit is wrong in each of the four falsified blocks:   $\text{Pr(bit error)} = 4/24 = 1/6$.


Rate, channel capacity and bit error probability


$\text{Basically:}$ 

  • Channel coding increases the reliability of data transmission from the source to the sink.
  • If the code rate  $R = k/n$  is reduced and the added redundancy  $(1 - R)$  is increased, the data reliability is generally improved and the bit error probability is reduced, which we will refer to as  $p_{\rm B}$  in the following:
$$p_{\rm B} = \text{Pr(bit error)} = {\rm Pr} \left ({\upsilon}_{j,\hspace{0.05cm} i} \ne {u}_{j,\hspace{0.05cm} i} \right )\hspace{0.05cm}.$$


The following theorem is based on the  "Data Processing Theorem"  and  "Fano's Lemma".  The derivation can be found in standard works on information theory, for example in  [CT06][2]:

$\text{Inversion of Shannon's Channel Coding Theorem:}$ 

If one uses a channel with too small a capacity  $R$  for data transmission at rate  $C < R$, the bit error probability  $p_{\rm B}$  cannot fall below a lower bound even with the best possible channel coding:

$$p_{\rm B} \ge H_{\rm bin}^{-1} \cdot \left ( 1 - {C}/{R}\right ) > 0\hspace{0.05cm}.$$

Here  $H_{\rm bin}(⋅)$  denotes the  $\text{binary entropy function}$.


Since the probability of the block errors can never be smaller than that of the bit errors,  the block error probability "zero" is also not possible for  $R > C$ . 
From the given bounds for the bit errors,

$$ {1}/{k} \cdot {\rm Pr}({\rm block\ error}) \le {\rm Pr}({\rm bit\ error}) \le {\rm Pr}({\rm block\ error})\hspace{0.05cm},$$

a range for the block error probability can also be given:

$$ {\rm Pr}({\rm bit\ error}) \le {\rm Pr}({\rm block\ error}) \le k \cdot {\rm Pr}({\rm bit\ error})\hspace{0.05cm}.$$

$\text{Example 8:}$  For a channel with capacity  $C = 1/3$  (bit), error-free data transmission   $(p_{\rm B} = 0)$  with code rate  $R < 1/3$  is possible in principle.

  • However, from the channel coding theorem the special  $(k$,  $n)$–block code is not known which makes this desired result possible.   Shannon also makes no statements on this.
  • All that is known is that such a best possible code works with blocks of infinite length.  For a given code rate  $R = k/n$  both  $k → ∞$  and  $n → ∞$ thus apply.
  • The statement  "The bit error probability is zero"  is not the same as  "No bit errors occur":   Even with a finite number of bit errors and  $k → ∞$  ⇒   $p_{\rm B} = 0$.


With the code rate  $R = 1 > C$  (uncoded transmission) one obtains:

$$p_{\rm B} \ge H_{\rm bin}^{-1} \cdot \left ( 1 - \frac{1/3}{1}\right ) = H_{\rm bin}^{-1}(2/3) \approx 0.174 > 0\hspace{0.05cm}.$$

With the code rate  $R = 1/2 > C$ , the bit error probability is smaller  but also different from zero:

$$p_{\rm B} \ge H_{\rm bin}^{-1} \cdot \left ( 1 - \frac{1/3}{1/2}\right ) = H_{\rm bin}^{-1}(1/3) \approx 0.062 > 0\hspace{0.05cm}.$$


Exercises for the chapter


Exercise 3.10: Mutual Information at the BSC

Exercise 3.10Z: BSC Channel Capacity

Exercise 3.11: Erasure Channel

Exercise 3.11Z: Extremely Asymmetrical Channel

Exercise 3.12: Strictly Symmetrical Channels

Exercise 3.13: Code Rate and Reliability

Exercise 3.14: Channel Coding Theorem

Exercise 3.15: Data Processing Theorem


References

  1. Shannon, C.E.:  A Mathematical Theory of Communication. In:  Bell Syst. Techn. J. 27 (1948), S. 379-423 und S. 623-656.
  2. 2.0 2.1 Cover, T.M.; Thomas, J.A.:  Elements of Information Theory.  West Sussex: John Wiley & Sons, 2nd Edition, 2006.
  3. Kramer, G.:  Information Theory.  Lecture manuscript, Chair of Communications Engineering, Technische Universität München, 2013.
  4. Mecking, M.:  Information Theory.  Lecture manuscript, Chair of Communications Engineering, Technische Universität München, 2009.