Information Theory/AWGN Channel Capacity for Continuous-Valued Input

From LNTwww
 

Mutual information between continuous random variables


In the chapter  "Information-theoretical model of digital signal transmission"  the  "mutual information"  between the two discrete random variables  $X$  and  $Y$  was given, among other things,  in the following form:

$$I(X;Y) = \hspace{0.5cm} \sum_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\sum_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{X}\hspace{-0.08cm})} \hspace{-0.9cm} P_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{ P_{XY}(x, y)}{P_{X}(x) \cdot P_{Y}(y)} \hspace{0.05cm}.$$

This equation simultaneously corresponds to the  "Kullback–Leibler distance"  between the joint probability function  $P_{XY}$  and the product of the two individual probability functions  $P_X$  and  $P_Y$:

$$I(X;Y) = D(P_{XY} \hspace{0.05cm} || \hspace{0.05cm}P_{X} \cdot P_{Y}) \hspace{0.05cm}.$$

In order to derive the mutual information  $I(X; Y)$  between two continuous random variables  $X$  and  $Y$,  one proceeds as follows,  where an apostrophe (prime) indicates a quantized variable:

  • One quantizes the random variables  $X$  and  $Y$  $($with the quantization intervals  ${\it Δ}x$  and  ${\it Δ}y)$  and thus obtains the probability functions  $P_{X\hspace{0.01cm}′}$  and  $P_{Y\hspace{0.01cm}′}$.
  • The "vectors"  $P_{X\hspace{0.01cm}′}$  and  $P_{Y\hspace{0.01cm}′}$  become infinitely long after the limit transitions  ${\it Δ}x → 0,\hspace{0.1cm} {\it Δ}y → 0$,  and the joint PMF  $P_{X\hspace{0.01cm}′\hspace{0.08cm}Y\hspace{0.01cm}′}$  is then also infinitely extended in area.
  • These limit transitions give rise to the probability density functions of the continuous random variables according to the following equations:
$$f_X(x_{\mu}) = \frac{P_{X\hspace{0.01cm}'}(x_{\mu})}{\it \Delta_x} \hspace{0.05cm}, \hspace{0.3cm}f_Y(y_{\mu}) = \frac{P_{Y\hspace{0.01cm}'}(y_{\mu})}{\it \Delta_y} \hspace{0.05cm}, \hspace{0.3cm}f_{XY}(x_{\mu}\hspace{0.05cm}, y_{\mu}) = \frac{P_{X\hspace{0.01cm}'\hspace{0.03cm}Y\hspace{0.01cm}'}(x_{\mu}\hspace{0.05cm}, y_{\mu})} {{\it \Delta_x} \cdot {\it \Delta_y}} \hspace{0.05cm}.$$
  • The double sum in the above equation, after renaming  $Δx → {\rm d}x$  and  $Δy → {\rm d}y$,  becomes the equation valid for continuous-valued random variables:
$$I(X;Y) = \hspace{0.5cm} \int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{X}\hspace{-0.08cm})} \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{ f_{XY}(x, y) } {f_{X}(x) \cdot f_{Y}(y)} \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y \hspace{0.05cm}.$$

$\text{Conclusion:}$  By splitting this double integral,  the  »mutual information«  can also be written as:

$$I(X;Y) = h(X) + h(Y) - h(XY)\hspace{0.05cm}.$$

This uses the  »joint differential entropy«

$$h(XY) = - \hspace{-0.3cm}\int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{X}\hspace{-0.08cm})} \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \big[f_{XY}(x, y) \big] \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y$$

and the two  »individual differential entropies«

$$h(X) = -\hspace{-0.7cm} \int\limits_{x \hspace{0.05cm}\in \hspace{0.05cm}{\rm supp}\hspace{0.03cm} (\hspace{-0.03cm}f_X)} \hspace{-0.35cm} f_X(x) \cdot {\rm log} \hspace{0.1cm} \big[f_X(x)\big] \hspace{0.1cm}{\rm d}x \hspace{0.05cm},\hspace{0.5cm} h(Y) = -\hspace{-0.7cm} \int\limits_{y \hspace{0.05cm}\in \hspace{0.05cm}{\rm supp}\hspace{0.03cm} (\hspace{-0.03cm}f_Y)} \hspace{-0.35cm} f_Y(y) \cdot {\rm log} \hspace{0.1cm} \big[f_Y(y)\big] \hspace{0.1cm}{\rm d}y \hspace{0.05cm}.$$
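
The limit construction above can also be checked numerically. The following Python sketch (an addition, not part of the original article) quantizes samples of a jointly Gaussian pair $(X, Y)$ with an assumed correlation $ρ = 0.8$ on a grid of width $Δx = Δy = 0.1$, applies the discrete double-sum formula, and compares the result with the closed-form value $-1/2 · \log_2(1 - ρ^2)$ that holds for this special Gaussian case.

    import numpy as np

    # Sketch (addition): estimate I(X;Y) by quantizing continuous samples with
    # intervals dx = dy = 0.1 and applying the discrete double-sum formula.
    rng = np.random.default_rng(0)
    rho = 0.8                                   # assumed correlation of (X, Y)
    n = 200_000
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

    edges = np.linspace(-5.0, 5.0, 101)         # quantization grid, width 0.1
    pxy, _, _ = np.histogram2d(x, y, bins=[edges, edges])
    pxy /= n                                    # joint PMF  P_X'Y'
    px = pxy.sum(axis=1)                        # marginal PMF  P_X'
    py = pxy.sum(axis=0)                        # marginal PMF  P_Y'

    mask = pxy > 0                              # restrict to supp(P_X'Y')
    i_est = np.sum(pxy[mask] * np.log2(pxy[mask] / np.outer(px, py)[mask]))
    i_exact = -0.5 * np.log2(1 - rho**2)        # closed form for this Gaussian pair
    print(f"estimated I(X;Y) = {i_est:.3f} bit,  exact: {i_exact:.3f} bit")

With this grid the estimate lands near the exact value of about $0.737$ bit; small deviations are due to the finite grid and sample size.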

On equivocation and irrelevance


We continue to start from the equation for the mutual information of continuous random variables,  $I(X;Y) = h(X) + h(Y) - h(XY)$.   This representation is also found in the following diagram  $($left graph$)$.

Representation of the mutual information for continuous-valued random variables

From this you can see that the mutual information can also be represented as follows:

$$I(X;Y) = h(Y) - h(Y \hspace{-0.1cm}\mid \hspace{-0.1cm} X) =h(X) - h(X \hspace{-0.1cm}\mid \hspace{-0.1cm} Y)\hspace{0.05cm}.$$

These fundamental information-theoretical relationships can also be read from the graph on the right. 

⇒   This directional representation is particularly suitable for communication systems.  The outflowing and inflowing differential entropies characterize

  • the  »equivocation«:
$$h(X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y) = - \hspace{-0.3cm}\int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{X}\hspace{-0.08cm})} \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \big [{f_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (x \hspace{-0.05cm}\mid \hspace{-0.05cm} y)} \big] \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y,$$
  • the  »irrelevance«:
$$h(Y \hspace{-0.05cm}\mid \hspace{-0.05cm} X) = - \hspace{-0.3cm}\int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{X}\hspace{-0.08cm})} \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \big [{f_{\hspace{0.03cm}Y \mid \hspace{0.03cm} X} (y \hspace{-0.05cm}\mid \hspace{-0.05cm} x)} \big] \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y.$$

The significance of these two information-theoretic quantities will be discussed in more detail in  $\text{Exercise 4.5Z}$.

If one compares the graphical representations of the mutual information for

  • discrete random variables in the section  "Information-theoretical model of digital signal transmission",  and
  • continuous random variables according to the above diagram,


the only distinguishing feature is that each  $($uppercase$)$  $H$  $($entropy;  $\ge 0)$  has been replaced by a  $($lowercase$)$  $h$  $($differential entropy;  can be positive,  negative or zero$)$.

  • Otherwise,  the mutual information is the same in both representations and  $I(X; Y) ≥ 0$  always applies.
  • In the following,  we mostly use the  "binary logarithm"   ⇒   $\log_2$  and thus obtain the mutual information with the pseudo-unit  "bit".
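
As a small cross-check of the decompositions above, the following sketch (an addition; it assumes $X$ and $Y$ to be jointly Gaussian with unit variances and correlation $ρ$) evaluates the closed-form Gaussian differential entropies – the one-dimensional form $1/2 · \log_2(2πe σ^2)$ also appears later in this chapter – and confirms that all three expressions for $I(X;Y)$ coincide.

    import numpy as np

    # Sketch (addition): closed-form check of the three decompositions of I(X;Y)
    # for jointly Gaussian X, Y with unit variances and correlation rho (assumed).
    rho = 0.8
    sx2 = sy2 = 1.0
    det = sx2 * sy2 * (1 - rho**2)              # determinant of the covariance matrix

    h_x  = 0.5 * np.log2(2 * np.pi * np.e * sx2)            # h(X)
    h_y  = 0.5 * np.log2(2 * np.pi * np.e * sy2)            # h(Y)
    h_xy = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * det)     # h(XY)
    h_y_given_x = h_xy - h_x                                # irrelevance  h(Y|X)
    h_x_given_y = h_xy - h_y                                # equivocation h(X|Y)

    print(h_x + h_y - h_xy)                     # I(X;Y) = h(X) + h(Y) - h(XY)
    print(h_y - h_y_given_x)                    # I(X;Y) = h(Y) - h(Y|X)
    print(h_x - h_x_given_y)                    # I(X;Y) = h(X) - h(X|Y)
    print(-0.5 * np.log2(1 - rho**2))           # all equal ~0.737 bit for rho = 0.8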


Calculation of mutual information with additive noise


We now consider a very simple model of message transmission:

  • The random variable  $X$  stands for the  $($zero-mean$)$  transmitted signal and is characterized by the PDF  $f_X(x)$  and the variance  $σ_X^2$.  The transmission power is  $P_X = σ_X^2$.
  • The additive noise  $N$  is given by the  $($zero-mean$)$  PDF  $f_N(n)$  and the noise power  $P_N = σ_N^2$.
  • If  $X$  and  $N$  are assumed to be statistically independent   ⇒   signal-independent noise,  then  $\text{E}\big[X · N \big] = \text{E}\big[X \big] · \text{E}\big[N\big] = 0$.
Transmission system with additive noise
  • The received signal is  $Y = X + N$.  The output PDF  $f_Y(y)$  can be calculated with the "convolution operation"    ⇒   $f_Y(y) = f_X(x) ∗ f_N(n)$.
  • For the received power  $($variance$)$  the following holds:
$$P_Y = \sigma_Y^2 = {\rm E}\big[Y^2\big] = {\rm E}\big[(X+N)^2\big] = {\rm E}\big[X^2\big] + {\rm E}\big[N^2\big] = \sigma_X^2 + \sigma_N^2 $$
$$\Rightarrow \hspace{0.3cm} P_Y = P_X + P_N \hspace{0.05cm}.$$

The sketched probability density functions  $($rectangular or trapezoidal$)$  are only intended to clarify the calculation process and have no practical relevance.
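
The power additivity stated above can be illustrated with a short Monte Carlo sketch (an addition; the two rectangular PDFs and their widths are chosen arbitrarily for illustration):

    import numpy as np

    # Sketch (addition): for independent X and N,  E[X*N] is (almost) zero and
    # the powers add:  P_Y = P_X + P_N.  Rectangular PDFs chosen for illustration.
    rng = np.random.default_rng(1)
    n = 1_000_000
    x = rng.uniform(-1.0, 1.0, n)               # "transmitted signal", P_X = 1/3
    noise = rng.uniform(-0.5, 0.5, n)           # "noise",              P_N = 1/12
    y = x + noise

    print(np.mean(x * noise))                   # ~ 0
    print(np.var(x) + np.var(noise), np.var(y)) # P_X + P_N  vs.  P_Y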

To calculate the mutual information between input  $X$  and output  $Y$  there are three possibilities according to the  "graphic in the previous subchapter":

  • Calculation according to  $I(X;Y) = h(X) + h(Y) - h(XY)$:
The first two terms can easily be calculated from  $f_X(x)$  and  $f_Y(y)$,  respectively.  The  "joint differential entropy"  $h(XY)$  is problematic.  For this,  one needs the two-dimensional joint PDF  $f_{XY}(x, y)$,  which is usually not given directly.
  • Calculation according to  $I(X;Y) = h(Y) - h(Y|X)$:
Here  $h(Y|X)$  denotes the  "differential irrelevance".  Since  $h(Y|X) = h(X + N|X) = h(N)$,  the mutual information  $I(X; Y)$  is very easy to calculate via the equation  $f_Y(y) = f_X(x) ∗ f_N(n)$  if  $f_X(x)$  and  $f_N(n)$  are known.
  • Calculation according to  $I(X;Y) = h(X) - h(X|Y)$:
According to this equation,  however,  one needs the  "differential equivocation"  $h(X|Y)$,  which is more difficult to determine than  $h(Y|X)$.

$\text{Conclusion:}$  In the following we use the middle equation and write for the  »mutual information«  between the input  $X$  and the output  $Y$  of a transmission system with additive,  uncorrelated noise  $N$:

$$I(X;Y) \hspace{-0.05cm} = \hspace{-0.01cm} h(Y) \hspace{-0.01cm}- \hspace{-0.01cm}h(N) \hspace{-0.01cm}=\hspace{-0.05cm} -\hspace{-0.7cm} \int\limits_{y \hspace{0.05cm}\in \hspace{0.05cm}{\rm supp}(f_Y)} \hspace{-0.65cm} f_Y(y) \cdot {\rm log} \hspace{0.1cm} \big[f_Y(y)\big] \hspace{0.1cm}{\rm d}y +\hspace{-0.7cm} \int\limits_{n \hspace{0.05cm}\in \hspace{0.05cm}{\rm supp}(f_N)} \hspace{-0.65cm} f_N(n) \cdot {\rm log} \hspace{0.1cm} \big[f_N(n)\big] \hspace{0.1cm}{\rm d}n\hspace{0.05cm}.$$
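
As an illustration of this conclusion (an addition to the article), the following sketch evaluates $I(X;Y) = h(Y) - h(N)$ numerically for a rectangular transmit PDF and a rectangular noise PDF, i.e. the illustrative shapes mentioned above; $f_Y(y)$ is then the trapezoidal convolution $f_X(x) ∗ f_N(n)$:

    import numpy as np

    # Sketch (addition): I(X;Y) = h(Y) - h(N) for a rectangular transmit PDF and a
    # rectangular noise PDF; f_Y(y) = f_X(x) * f_N(n) is then trapezoidal.
    dx = 1e-3
    t = np.arange(-3.0, 3.0, dx)                # common grid for all densities

    f_x = np.where(np.abs(t) <= 1.0, 0.5, 0.0)  # uniform on [-1, +1]
    f_n = np.where(np.abs(t) <= 0.5, 1.0, 0.0)  # uniform on [-1/2, +1/2]
    f_y = np.convolve(f_x, f_n, mode="same") * dx   # numerical convolution

    def h_bit(pdf):
        """Differential entropy in bit, integrating over supp(f) only."""
        p = pdf[pdf > 0]
        return -np.sum(p * np.log2(p)) * dx

    print(f"h(N)   = {h_bit(f_n):.3f} bit")     # log2(1) = 0 bit for this noise PDF
    print(f"h(Y)   = {h_bit(f_y):.3f} bit")
    print(f"I(X;Y) = {h_bit(f_y) - h_bit(f_n):.3f} bit")

For this choice $h(N) = \log_2(1) = 0$ bit, so the mutual information equals the differential entropy $h(Y)$ of the trapezoidal output density.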


Channel capacity of the AWGN channel


If one specifies the probability density function of the noise in the previous  "general system model"  as Gaussian corresponding to

Derivation of the AWGN channel capacity
$$f_N(n) = \frac{1}{\sqrt{2\pi \sigma_N^2}} \cdot {\rm e}^{ - \hspace{0.05cm}{n^2}/(2 \sigma_N^2) } \hspace{0.05cm}, $$

we obtain the model sketched on the right for calculating the channel capacity of the so-called  "AWGN channel"  $($"Additive White Gaussian Noise"$)$.  In the following,  we usually replace the variance  $\sigma_N^2$  by the power  $P_N$.

We know from previous sections:

  • The  "channel capacity"  $C_{\rm AWGN}$  specifies the maximum mutual information  $I(X; Y)$  between the input quantity  $X$  and the output quantity  $Y$  of the AWGN channel. 
  • The maximization refers to the best possible input PDF.  Thus,  under the  "power constraint"  the following applies:
$$C_{\rm AWGN} = \max_{f_X:\hspace{0.1cm} {\rm E}[X^2 ] \le P_X} \hspace{-0.35cm} I(X;Y) = -h(N) + \max_{f_X:\hspace{0.1cm} {\rm E}[X^2] \le P_X} \hspace{-0.35cm} h(Y) \hspace{0.05cm}.$$
  • It is already taken into account that the maximization relates solely to the differential entropy  $h(Y)$   ⇒   probability density function  $f_Y(y)$:  for a given noise power  $P_N$,  the term  $h(N) = 1/2 · \log_2 (2π{\rm e} · P_N)$  is a constant.
  • The maximum for  $h(Y)$  is obtained for a Gaussian PDF  $f_Y(y)$  with  $P_Y = P_X + P_N$,  see the section  "Maximum differential entropy under power constraint":
$${\rm max}\big[h(Y)\big] = 1/2 · \log_2 \big[2πe · (P_X + P_N)\big].$$
  • However,  the output PDF  $f_Y(y) = f_X(x) ∗ f_N(n)$  is Gaussian only if both  $f_X(x)$  and  $f_N(n)$  are Gaussian functions.   A striking saying about the convolution operation is:  Gaussian remains Gaussian, and non-Gaussian never becomes (exactly) Gaussian.


Numerical results for the AWGN channel capacity as a function of  ${P_X}/{P_N}$

$\text{Conclusion:}$  For the AWGN channel   ⇒   Gaussian noise PDF  $f_N(n)$,  the channel capacity is obtained exactly when the input PDF  $f_X(x)$  is also Gaussian:

$$C_{\rm AWGN} = h_{\rm max}(Y) - h(N) = 1/2 \cdot {\rm log}_2 \hspace{0.1cm} {P_Y}/{P_N}$$
$$\Rightarrow \hspace{0.3cm} C_{\rm AWGN}= 1/2 \cdot {\rm log}_2 \hspace{0.1cm} ( 1 + P_X/P_N) \hspace{0.05cm}.$$
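
For orientation, the following minimal sketch (an addition; the SNR values are chosen freely) evaluates this capacity formula for a few signal-to-noise ratios; for example, $10 · \text{lg}(P_X/P_N) = 10\ \text{dB}$ gives $C_{\rm AWGN} = 1/2 · \log_2(11) ≈ 1.73$ bit.

    import numpy as np

    # Sketch (addition): evaluate C_AWGN = 1/2 * log2(1 + P_X/P_N) for a few SNRs.
    for snr_db in (0, 10, 20, 30):
        snr = 10 ** (snr_db / 10)               # P_X / P_N
        c = 0.5 * np.log2(1 + snr)
        print(f"10*lg(P_X/P_N) = {snr_db:2d} dB  ->  C_AWGN = {c:.3f} bit")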


Parallel Gaussian channels


Parallel AWGN channels

According to the graph,  we now consider  $K$  parallel Gaussian channels  $X_1 → Y_1$,  ... ,  $X_k → Y_k$,  ... , $X_K → Y_K$.

  • We denote the transmission powers in the  $K$  channels by
$$P_1 = \text{E}[X_1^2], \hspace{0.15cm}\text{...}\hspace{0.15cm} ,\ P_k = \text{E}[X_k^2], \hspace{0.15cm}\text{...}\hspace{0.15cm} ,\ P_K = \text{E}[X_K^2].$$
  • The  $K$  noise powers can also be different:
$$σ_1^2, \hspace{0.15cm}\text{...}\hspace{0.15cm} ,\ σ_k^2, \hspace{0.15cm}\text{...}\hspace{0.15cm} ,\ σ_K^2.$$

We are now looking for the maximum mutual information  $I(X_1, \hspace{0.15cm}\text{...}\hspace{0.15cm}, X_K\hspace{0.05cm};\hspace{0.05cm}Y_1, \hspace{0.15cm}\text{...}\hspace{0.15cm}, Y_K) $  between

  • the  $K$  input variables  $X_1$,  ... , $X_K$  and
  • the  $K$ output variables  $Y_1$ , ... , $Y_K$,


which we call the  »total channel capacity«  of this AWGN configuration.

$\text{Convention:}$  We assume a power constraint on the total AWGN system.  That is:  the sum of all powers  $P_k$  in the  $K$  individual channels must not exceed the specified value  $P_X$:

$$P_1 + \hspace{0.05cm}\text{...}\hspace{0.05cm}+ P_K = \hspace{0.1cm} \sum_{k= 1}^K \hspace{0.1cm}{\rm E} \left [ X_k^2\right ] \le P_{X} \hspace{0.05cm}.$$


Under the only slightly restrictive assumption of independent noise sources  $N_1$,  ... ,  $N_K$,  the mutual information can be written after some intermediate steps as:

$$I(X_1, \hspace{0.05cm}\text{...}\hspace{0.05cm}, X_K\hspace{0.05cm};\hspace{0.05cm}Y_1,\hspace{0.05cm}\text{...}\hspace{0.05cm}, Y_K) = h(Y_1, ... \hspace{0.05cm}, Y_K ) - \hspace{0.1cm} \sum_{k= 1}^K \hspace{0.1cm} h(N_k)\hspace{0.05cm}.$$
  • The following upper bound can be specified for this:
$$I(X_1,\hspace{0.05cm}\text{...}\hspace{0.05cm}, X_K\hspace{0.05cm};\hspace{0.05cm}Y_1, \hspace{0.05cm}\text{...} \hspace{0.05cm}, Y_K) \hspace{0.2cm} \le \hspace{0.1cm} \hspace{0.1cm} \sum_{k= 1}^K \hspace{0.1cm} \big[h(Y_k) - h(N_k)\big] \hspace{0.2cm} \le \hspace{0.1cm} 1/2 \cdot \sum_{k= 1}^K \hspace{0.1cm} {\rm log}_2 \hspace{0.1cm} ( 1 + {P_k}/{\sigma_k^2}) \hspace{0.05cm}.$$
  1. The equal sign  (identity)  holds for zero-mean Gaussian input variables  $X_k$  and statistically independent noise terms  $N_k$.
  2. From this equation one arrives at the  "maximum mutual information"   ⇒   "channel capacity"  if the total transmission power  $P_X$  is allocated as favorably as possible,  taking into account the different noise powers  $(σ_k^2)$  in the individual channels.
  3. This optimization problem can again be elegantly solved with the method of  "Lagrange multipliers".  The following example only explains the result.


Best possible power allocation for  $K = 4$  $($"Water–Filling"$)$

$\text{Example 1:}$  We consider  $K = 4$  parallel Gaussian channels with four different noise powers  $σ_1^2$,  ... ,  $σ_4^2$  according to the adjacent figure (faint green background).

  • The best possible allocation of the transmission power among the four channels is sought.
  • If one were to slowly fill this profile with water,  the water would initially flow only into  $\text{channel 2}$.
  • If you continue to pour,  some water will also accumulate in  $\text{channel 1}$  and later also in  $\text{channel 4}$.


The drawn  "water level"  $H$  describes exactly the point at which the sum  $P_1 + P_2 + P_4$  corresponds to the total available transmission power  $P_X$:

  • The optimal power allocation for this example results in  $P_2 > P_1 > P_4$  as well as  $P_3 = 0$.
  • Only with a larger transmission power  $P_X$  would a small power  $P_3$  also be allocated to the third channel.


This allocation procedure is called a  »Water–Filling algorithm«.
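
The water-filling idea can be captured in a few lines of code. The following sketch (an addition; the noise powers and the total power $P_X = 1.5$ are hypothetical, since the figure gives no numerical values) determines the water level $H$ by bisection and then allocates $P_k = \max(0, H - σ_k^2)$:

    import numpy as np

    # Sketch (addition): water-filling by bisection.  Find the level H with
    # sum_k max(0, H - sigma_k^2) = P_X, then allocate P_k = max(0, H - sigma_k^2).
    def water_filling(noise_powers, p_total, iters=100):
        sigma2 = np.asarray(noise_powers, dtype=float)
        lo, hi = sigma2.min(), sigma2.max() + p_total   # level lies in this bracket
        for _ in range(iters):
            level = 0.5 * (lo + hi)
            used = np.sum(np.maximum(0.0, level - sigma2))
            lo, hi = (level, hi) if used < p_total else (lo, level)
        return level, np.maximum(0.0, level - sigma2)

    sigma2 = [0.8, 0.2, 2.0, 1.0]               # hypothetical noise powers, K = 4
    level, p_k = water_filling(sigma2, p_total=1.5)
    capacity = 0.5 * np.sum(np.log2(1 + p_k / np.array(sigma2)))
    print("water level H :", np.round(level, 3))
    print("allocation P_k:", np.round(p_k, 3))  # here: P_2 > P_1 > P_4,  P_3 = 0
    print("capacity      :", round(capacity, 3), "bit")

With these assumed values the allocation reproduces the qualitative result of the example:  $P_2 > P_1 > P_4$  and  $P_3 = 0$.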


$\text{Example 2:}$  If all  $K$  Gaussian channels are equally disturbed   ⇒   $σ_1^2 = \hspace{0.15cm}\text{...}\hspace{0.15cm} = σ_K^2 = P_N$,  one should naturally allocate the total available transmission power  $P_X$  equally to all channels:   $P_k = P_X/K$.  For the total capacity we then obtain:

Capacity for  $K$  parallel channels
$$C_{\rm total} = \frac{ K}{2} \cdot {\rm log}_2 \hspace{0.1cm} ( 1 + \frac{P_X}{K \cdot P_N}) \hspace{0.05cm}.$$

The graph shows the total capacity as a function of  $P_X/P_N$  for  $K = 1$,  $K = 2$  and  $K = 3$:

  • With  $P_X/P_N = 10 \ ⇒ \  10 · \text{lg} (P_X/P_N) = 10 \ \text{dB}$  and  $K = 2$,  the total capacity becomes approximately  $50\%$  larger if the total power  $P_X$  is divided equally between two channels:   $P_1 = P_2 = P_X/2$  $($see the numerical sketch after this list$)$.
  • In the limiting case  $P_X/P_N → ∞$,  the total capacity increases by a factor of  $K$   ⇒   doubling for  $K = 2$.
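
The $≈ 50\%$ statement can be verified directly from the formula above; the following sketch (an addition) evaluates $C_{\rm total}$ at $10 · \text{lg}(P_X/P_N) = 10\ \text{dB}$:

    import numpy as np

    # Sketch (addition): C_total = K/2 * log2(1 + (P_X/P_N)/K) at 10 dB.
    snr = 10.0                                  # P_X/P_N = 10  =>  10 dB
    for K in (1, 2, 3):
        c_total = K / 2 * np.log2(1 + snr / K)
        print(f"K = {K}:  C_total = {c_total:.3f} bit")

This yields about $1.73$ bit for $K = 1$ and about $2.59$ bit for $K = 2$, i.e. roughly the $50\%$ gain mentioned above.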


The two identical and independent channels can be realized in different ways,  for example by multiplexing in time,  frequency or space.

However,  the case  $K = 2$  can also be realized by using orthogonal basis functions such as  "cosine"  and  "sine",  for example with

  •  "quadrature amplitude modulation"  $\rm (QAM)$  or
  •  "multi-level phase modulation"  such as  $\rm QPSK$  or  $\rm 8–PSK$.

Exercises for the chapter


Exercise 4.5: Mutual Information from 2D-PDF

Exercise 4.5Z: Again Mutual Information

Exercise 4.6: AWGN Channel Capacity

Exercise 4.7: Several Parallel Gaussian Channels

Exercise 4.7Z: About the Water Filling Algorithm