Difference between revisions of "Information Theory/AWGN Channel Capacity for Continuous-Valued Input"

Latest revision as of 16:22, 28 February 2023

Mutual information between continuous random variables

In the chapter "Information-theoretical model of digital signal transmission" the "mutual information" between the two discrete random variables $X$ and $Y$ was given, among other things, in the following form:

$$I(X;Y) = \hspace{0.5cm} \sum_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\sum_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{X}\hspace{-0.08cm})} \hspace{-0.9cm} P_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{ P_{XY}(x, y)}{P_{X}(x) \cdot P_{Y}(y)} \hspace{0.05cm}.$$

This equation simultaneously corresponds to the "Kullback–Leibler distance" between the joint probability function $P_{XY}$ and the product of the two individual probability functions $P_X$ and $P_Y$:

$$I(X;Y) = D(P_{XY} \hspace{0.05cm} || \hspace{0.05cm}P_{X} \cdot P_{Y}) \hspace{0.05cm}.$$

In order to derive the mutual information $I(X; Y)$ between two continuous random variables $X$ and $Y$, one proceeds as follows, where a prime $(′)$ denotes a quantized variable:

One quantizes the random variables $X$ and $Y$ $($with the quantization intervals ${\it Δ}x$ and ${\it Δ}y)$ and thus obtains the probability functions $P_{X\hspace{0.01cm}′}$ and $P_{Y\hspace{0.01cm}′}$.

The "vectors" $P_{X\hspace{0.01cm}′}$ and $P_{Y\hspace{0.01cm}′}$ become infinitely long in the limit ${\it Δ}x → 0,\hspace{0.1cm} {\it Δ}y → 0$, and the joint PMF $P_{X\hspace{0.01cm}′\hspace{0.08cm}Y\hspace{0.01cm}′}$ becomes infinitely extended in both dimensions.

These limits give rise to the probability density functions of the continuous random variables according to the following equations:

$$f_X(x_{\mu}) = \frac{P_{X\hspace{0.01cm}'}(x_{\mu})}{{\it \Delta}x} \hspace{0.05cm}, \hspace{0.3cm}f_Y(y_{\nu}) = \frac{P_{Y\hspace{0.01cm}'}(y_{\nu})}{{\it \Delta}y} \hspace{0.05cm}, \hspace{0.3cm}f_{XY}(x_{\mu}\hspace{0.05cm}, y_{\nu}) = \frac{P_{X\hspace{0.01cm}'\hspace{0.03cm}Y\hspace{0.01cm}'}(x_{\mu}\hspace{0.05cm}, y_{\nu})} {{\it \Delta}x \cdot {\it \Delta}y} \hspace{0.05cm}.$$

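Inserting these relations into the double sum, the quantization intervals cancel in the quotient, since $P_{X\hspace{0.01cm}′\hspace{0.08cm}Y\hspace{0.01cm}′}/(P_{X\hspace{0.01cm}′} \cdot P_{Y\hspace{0.01cm}′}) = f_{XY}/(f_X \cdot f_Y)$, while the factor $P_{X\hspace{0.01cm}′\hspace{0.08cm}Y\hspace{0.01cm}′} = f_{XY} \cdot {\it Δ}x \cdot {\it Δ}y$ in front of the logarithm supplies the area element. After renaming ${\it Δ}x → {\rm d}x$ and ${\it Δ}y → {\rm d}y$, the double sum becomes the equation valid for continuous random variables: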

$$I(X;Y) = \hspace{0.5cm} \int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{X}\hspace{-0.08cm})} \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{ f_{XY}(x, y) } {f_{X}(x) \cdot f_{Y}(y)} \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y \hspace{0.05cm}.$$
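The transition from the double sum to the double integral can be checked numerically. The following Python sketch assumes, purely as an example not taken from the text, a zero-mean, unit-variance jointly Gaussian pair $(X, Y)$ with correlation coefficient $\rho$, for which the continuous mutual information is $-\tfrac{1}{2}\log_2(1-\rho^2)$. It forms the cell probabilities as $f_{XY}\cdot{\it Δ}x\cdot{\it Δ}y$ (point sampling, as in the relations above, not exact integration over each cell), evaluates the discrete double sum and compares it with the continuous value. The helper name `quantized_mi_bits` and the grid truncation at $\pm 6$ are illustrative assumptions.

```python
import numpy as np


def quantized_mi_bits(rho: float, delta: float, limit: float = 6.0) -> float:
    """Discrete double sum for a quantized jointly Gaussian pair (in bit).

    Assumptions (illustrative): X and Y are zero-mean, unit-variance Gaussian
    with correlation coefficient rho, both axes use the interval delta,
    and the grid is truncated at +/- limit.
    """
    grid = np.arange(-limit, limit + delta / 2, delta)
    x, y = np.meshgrid(grid, grid, indexing="ij")
    # Joint PDF f_XY(x, y) of the bivariate Gaussian
    f_xy = np.exp(-(x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))
    f_xy /= 2 * np.pi * np.sqrt(1 - rho**2)
    # Cell probabilities P_X'Y' = f_XY * delta_x * delta_y, renormalized to sum 1
    p_xy = f_xy * delta * delta
    p_xy /= p_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # PMF of X' (column vector)
    p_y = p_xy.sum(axis=0, keepdims=True)  # PMF of Y' (row vector)
    mask = p_xy > 0                         # sum only over the support
    ratio = p_xy[mask] / (p_x * p_y)[mask]
    return float(np.sum(p_xy[mask] * np.log2(ratio)))


rho = 0.8
exact = -0.5 * np.log2(1 - rho**2)  # continuous I(X;Y) of the Gaussian pair
for delta in (1.0, 0.5, 0.2, 0.1):
    print(f"delta = {delta:.1f}: discrete sum = {quantized_mi_bits(rho, delta):.5f} bit")
print(f"continuous value: {exact:.5f} bit")
```

For a smooth PDF such as this Gaussian one, the discrete sum already agrees closely with the integral at moderate interval widths; for $\rho = 0$ it is zero, as expected for independent variables.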

$\text{Conclusion:}$ By splitting the logarithm in this double integral into three terms, the »mutual information« can also be written as:

$$I(X;Y) = h(X) + h(Y) - h(XY)\hspace{0.05cm}.$$

The »joint differential entropy«

$$h(XY) = - \hspace{-0.3cm}\int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{X}\hspace{-0.08cm})} \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \hspace{0.1cm} \big[f_{XY}(x, y) \big] \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y$$

and the two »differential single entropies«

$$h(X) = -\hspace{-0.7cm} \int\limits_{x \hspace{0.05cm}\in \hspace{0.05cm}{\rm supp}\hspace{0.03cm} (\hspace{-0.03cm}f_X)} \hspace{-0.35cm} f_X(x) \cdot {\rm log} \hspace{0.1cm} \big[f_X(x)\big] \hspace{0.1cm}{\rm d}x \hspace{0.05cm},\hspace{0.5cm} h(Y) = -\hspace{-0.7cm} \int\limits_{y \hspace{0.05cm}\in \hspace{0.05cm}{\rm supp}\hspace{0.03cm} (\hspace{-0.03cm}f_Y)} \hspace{-0.35cm} f_Y(y) \cdot {\rm log} \hspace{0.1cm} \big[f_Y(y)\big] \hspace{0.1cm}{\rm d}y \hspace{0.05cm}.$$
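The splitting step can be written out as a short derivation sketch. The logarithm of the quotient is decomposed into three terms, and the marginalization $\int f_{XY}(x, y)\,{\rm d}y = f_X(x)$ reduces the double integral over $\log f_X(x)$ to a single integral:

$$\log \frac{f_{XY}(x, y)}{f_X(x) \cdot f_Y(y)} = \log \big[f_{XY}(x, y)\big] - \log \big[f_X(x)\big] - \log \big[f_Y(y)\big] \hspace{0.05cm},$$

$$\int\hspace{-0.2cm}\int f_{XY}(x, y) \cdot \log \big[f_X(x)\big] \hspace{0.1cm}{\rm d}x\hspace{0.1cm}{\rm d}y = \int f_X(x) \cdot \log \big[f_X(x)\big] \hspace{0.1cm}{\rm d}x = -h(X) \hspace{0.05cm}.$$

With the analogous relation for $Y$ $($using $\int f_{XY}(x, y)\,{\rm d}x = f_Y(y))$ it follows that

$$I(X;Y) = -h(XY) + h(X) + h(Y) \hspace{0.05cm}.$$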

On equivocation and irrelevance

We again start from the mutual information of continuous random variables, $I(X;Y) = h(X) + h(Y) - h(XY)$. This representation is also shown in the following diagram $($left graph$)$.

Representation of the mutual information for continuous-valued random variables

From this you can see that the mutual information can also be represented as follows:

$$I(X;Y) = h(Y) - h(Y \hspace{-0.1cm}\mid \hspace{-0.1cm} X) =h(X) - h(X \hspace{-0.1cm}\mid \hspace{-0.1cm} Y)\hspace{0.05cm}.$$

These fundamental information-theoretical relationships can also be read from the graph on the right.

⇒ This directional representation is particularly suitable for communication systems. The outflowing and the inflowing differential entropy characterize, respectively,

the »equivocation«:

$$h(X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y) = - \hspace{-0.3cm}\int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{X}\hspace{-0.08cm})} \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \hspace{0.1cm} \big [{f_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (x \hspace{-0.05cm}\mid \hspace{-0.05cm} y)} \big] \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y,$$

the »irrelevance«:

$$h(Y \hspace{-0.05cm}\mid \hspace{-0.05cm} X) = - \hspace{-0.3cm}\int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (f_{X}\hspace{-0.08cm})} \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \hspace{0.1cm} \big [{f_{\hspace{0.03cm}Y \mid \hspace{0.03cm} X} (y \hspace{-0.05cm}\mid \hspace{-0.05cm} x)} \big] \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y.$$
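As a short derivation sketch, the representation $I(X;Y) = h(Y) - h(Y \mid X)$ follows from the chain rule for differential entropies. Inserting $f_{XY}(x, y) = f_X(x) \cdot f_{Y \mid X}(y \mid x)$ into the definition of $h(XY)$ and using $\int f_{XY}(x, y)\,{\rm d}y = f_X(x)$ gives

$$h(XY) = h(X) + h(Y \hspace{-0.05cm}\mid \hspace{-0.05cm} X) \hspace{0.3cm}\Rightarrow \hspace{0.3cm} I(X;Y) = h(X) + h(Y) - h(XY) = h(Y) - h(Y \hspace{-0.05cm}\mid \hspace{-0.05cm} X) \hspace{0.05cm}.$$

With $f_{XY}(x, y) = f_Y(y) \cdot f_{X \mid Y}(x \mid y)$ one obtains in the same way $h(XY) = h(Y) + h(X \mid Y)$ and thus $I(X;Y) = h(X) - h(X \mid Y)$.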

The significance of these two information-theoretic quantities will be discussed in more detail in $\text{Exercise 4.5Z}$.

If one compares the graphical representations of the mutual information for

discrete random variables in the section "Information-theoretical model of digital signal transmission", and

continuous random variables according to the above diagram,

the only distinguishing feature is that each uppercase $H$ $($entropy; $\ge 0)$ has been replaced by a lowercase $h$ $($differential entropy; can be positive, negative or zero$)$.

Otherwise, the relations for the mutual information are the same in both representations, and $I(X; Y) ≥ 0$ always holds.

In the following, we mostly use the "binary logarithm" ⇒ $\log_2$ and thus obtain the mutual information with the pseudo-unit "bit".
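That a differential entropy, unlike an entropy, can be negative is easy to check numerically. The following Python sketch (helper names are hypothetical) evaluates $h = -\int f \log_2 f\,{\rm d}x$ as a Riemann sum for a uniform PDF of width $a$, for which $h = \log_2 a$, and for a Gaussian PDF with standard deviation $\sigma$, for which $h = 1/2 \cdot \log_2(2\pi{\rm e}\cdot\sigma^2)$. Both values become negative once the PDF is narrow enough.

```python
import numpy as np


def differential_entropy_bits(pdf_values: np.ndarray, dx: float) -> float:
    """Riemann-sum approximation of h = -integral of f(x) log2 f(x) dx (in bit)."""
    f = pdf_values[pdf_values > 0]  # integrate only over the support
    return float(-np.sum(f * np.log2(f)) * dx)


def uniform_entropy(width: float, n: int = 10_000) -> float:
    """Uniform PDF of height 1/width on [0, width], sampled at n cell midpoints."""
    dx = width / n
    f = np.full(n, 1.0 / width)
    return differential_entropy_bits(f, dx)


def gaussian_entropy(sigma: float, n: int = 200_001) -> float:
    """Zero-mean Gaussian PDF, sampled on [-10 sigma, +10 sigma]."""
    x = np.linspace(-10 * sigma, 10 * sigma, n)
    dx = x[1] - x[0]
    f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return differential_entropy_bits(f, dx)


for width in (2.0, 1.0, 0.5):
    print(f"uniform, width {width}: h = {uniform_entropy(width):+.4f} bit")
for sigma in (1.0, 0.1):
    closed_form = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
    print(f"Gaussian, sigma {sigma}: h = {gaussian_entropy(sigma):+.4f} bit "
          f"(closed form {closed_form:+.4f} bit)")
```

The uniform PDF of width $0.5$ gives $h = -1$ bit, and the Gaussian PDF becomes negative for $\sigma^2 < 1/(2\pi{\rm e})$.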

Calculation of mutual information with additive noise

We now consider a very simple model of message transmission:

The random variable $X$ stands for the $($zero-mean$)$ transmitted signal and is characterized by the PDF $f_X(x)$ and the variance $σ_X^2$. Transmission power: $P_X = σ_X^2$.

The additive noise $N$ is given by the $($zero-mean$)$ PDF $f_N(n)$ and the noise power $P_N = σ_N^2$.

If $X$ and $N$ are assumed to be statistically independent ⇒ signal-independent noise, then $\text{E}\big[X · N \big] = \text{E}\big[X \big] · \text{E}\big[N\big] = 0$.

Transmission system with additive noise

The received signal is $Y = X + N$. The output PDF $f_Y(y)$ can be calculated with the "convolution operation" ⇒ $f_Y(y) = f_X(x) ∗ f_N(n) = \int f_X(x) \cdot f_N(y - x)\,{\rm d}x$.

For the received power, the following holds:

$$P_Y = \sigma_Y^2 = {\rm E}\big[Y^2\big] = {\rm E}\big[(X+N)^2\big] = {\rm E}\big[X^2\big] + 2 \cdot {\rm E}\big[X \cdot N\big] + {\rm E}\big[N^2\big] = \sigma_X^2 + \sigma_N^2 $$

$$\Rightarrow \hspace{0.3cm} P_Y = P_X + P_N \hspace{0.05cm}.$$

The sketched probability density functions $($rectangular or trapezoidal$)$ are only intended to clarify the calculation process and have no practical relevance.
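A minimal numerical sketch of this system model, assuming as an illustration a rectangular $f_X(x)$ on $[-A, A]$ and a rectangular $f_N(n)$ on $[-B, B]$ with $A > B$: the discrete convolution on a grid yields the trapezoidal $f_Y(y)$, and the powers add as in $P_Y = P_X + P_N$. The grid spacing, the values of $A$ and $B$ and the helper names `rect_pdf` and `power` are assumptions chosen for the example.

```python
import numpy as np

DX = 1e-3                  # grid spacing (assumed)
K = 4000                   # grid covers [-K*DX, +K*DX] = [-4, 4]
k = np.arange(-K, K + 1)   # integer indices; k = 0 corresponds to x = 0
grid = k * DX


def rect_pdf(half_width: float) -> np.ndarray:
    """Hypothetical helper: rectangular PDF on [-half_width, +half_width]."""
    m = int(round(half_width / DX))
    f = (np.abs(k) <= m).astype(float)
    return f / (f.sum() * DX)  # normalize to unit area


def power(pdf: np.ndarray) -> float:
    """E[V^2] = integral of v^2 f(v) dv, i.e. the power of a zero-mean variable."""
    return float(np.sum(grid**2 * pdf) * DX)


A, B = 2.0, 1.0            # assumed half-widths of f_X and f_N (A > B)
f_x = rect_pdf(A)
f_n = rect_pdf(B)
# f_Y = f_X * f_N; for equal odd lengths, mode="same" keeps the symmetric grid aligned
f_y = np.convolve(f_x, f_n, mode="same") * DX

print(f"area under f_Y: {np.sum(f_y) * DX:.6f}")
print(f"P_X = {power(f_x):.4f}, P_N = {power(f_n):.4f}, P_Y = {power(f_y):.4f}")
print(f"plateau f_Y(0) = {f_y[K]:.4f}, support ends near |y| = {np.abs(grid[f_y > 0]).max():.3f}")
```

With these values, $f_Y(y)$ is a trapezoid with a plateau of height about $1/(2A)$ for $|y| \le A - B$ and linear flanks up to $|y| = A + B$.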

To calculate the mutual information between input $X$ and output $Y$ there are three possibilities according to the "graphic in the previous subchapter":

Calculation according to $I(X; Y) = h(X) + h(Y) - h(XY)$:

The first two terms can be calculated in a simple way from $f_X(x)$ and $f_Y(y)$ respectively. The "joint differential entropy" $h(XY)$ is problematic. For this, one needs the two-dimensional joint PDF $f_{XY}(x, y)$, which is usually not given directly.

Calculation according to $I(X; Y) = h(Y) - h(Y|X)$:

Here $h(Y|X)$ denotes the "differential irrelevance". Since $N$ is independent of $X$ and a known shift does not change the differential entropy, $h(Y|X) = h(X + N|X) = h(N)$. Thus $I(X; Y) = h(Y) - h(N)$ is easy to calculate if $f_X(x)$ and $f_N(n)$ are known, because $f_Y(y)$ follows from $f_Y(y) = f_X(x) ∗ f_N(n)$.

Calculation according to $I(X; Y) = h(X) - h(X|Y)$:

According to this equation, however, one needs the "differential equivocation" $h(X|Y)$, which is more difficult to determine than $h(Y|X)$.

$\text{Conclusion:}$ In the following, we use the middle equation and write for the »mutual information« between the input $X$ and the output $Y$ of a transmission system with additive noise $N$ that is statistically independent of $X$:

$$I(X;Y) \hspace{-0.05cm} = \hspace{-0.01cm} h(Y) \hspace{-0.01cm}- \hspace{-0.01cm}h(N) \hspace{-0.01cm}=\hspace{-0.05cm} -\hspace{-0.7cm} \int\limits_{y \hspace{0.05cm}\in \hspace{0.05cm}{\rm supp}(f_Y)} \hspace{-0.65cm} f_Y(y) \cdot {\rm log} \hspace{0.1cm} \big[f_Y(y)\big] \hspace{0.1cm}{\rm d}y +\hspace{-0.7cm} \int\limits_{n \hspace{0.05cm}\in \hspace{0.05cm}{\rm supp}(f_N)} \hspace{-0.65cm} f_N(n) \cdot {\rm log} \hspace{0.1cm} \big[f_N(n)\big] \hspace{0.1cm}{\rm d}n\hspace{0.05cm}.$$
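The middle equation can be evaluated numerically once $f_Y(y)$ has been obtained by convolution. The following Python sketch assumes, as an illustration, a rectangular $f_X(x)$ on $[-A, A]$ and a rectangular $f_N(n)$ on $[-B, B]$ with $A \ge B$, so that $f_Y(y)$ is trapezoidal. For comparison it uses a closed form obtained by a short calculation that is not given in the text, $I(X;Y) = \log_2(A/B) + B/(2A \cdot \ln 2)$; for $A = B$ this equals $1/(2\ln 2) ≈ 0.72$ bit. Grid parameters and helper names are assumptions.

```python
import numpy as np

DX = 1e-3                  # grid spacing (assumed)
K = 4000                   # grid covers [-4, 4]
k = np.arange(-K, K + 1)   # integer indices; k = 0 corresponds to 0


def rect_pdf(half_width: float) -> np.ndarray:
    """Hypothetical helper: rectangular PDF on [-half_width, +half_width]."""
    m = int(round(half_width / DX))
    f = (np.abs(k) <= m).astype(float)
    return f / (f.sum() * DX)


def diff_entropy_bits(pdf: np.ndarray) -> float:
    """h = -integral of f log2 f over the support, as a Riemann sum."""
    f = pdf[pdf > 0]
    return float(-np.sum(f * np.log2(f)) * DX)


def mutual_info_additive_noise(f_x: np.ndarray, f_n: np.ndarray) -> float:
    """I(X;Y) = h(Y) - h(N) with f_Y = f_X * f_N (X and N independent)."""
    f_y = np.convolve(f_x, f_n, mode="same") * DX
    return diff_entropy_bits(f_y) - diff_entropy_bits(f_n)


for A, B in [(1.0, 1.0), (2.0, 1.0)]:
    i_num = mutual_info_additive_noise(rect_pdf(A), rect_pdf(B))
    i_ref = np.log2(A / B) + B / (2 * A * np.log(2))  # closed form for A >= B
    print(f"A = {A}, B = {B}: numerical I = {i_num:.4f} bit, closed form {i_ref:.4f} bit")
```

Only $h(Y)$ and $h(N)$ are needed; the joint PDF $f_{XY}(x, y)$ never has to be formed.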

Channel capacity of the AWGN channel

If one specifies the probability density function of the noise in the previous "general system model" as Gaussian corresponding to

$$f_N(n) = \frac{1}{\sqrt{2\pi \sigma_N^2}} \cdot {\rm e}^{ - \hspace{0.05cm}{n^2}/(2 \sigma_N^2) } \hspace{0.05cm}, $$

we obtain the model sketched on the right for calculating the channel capacity of the so-called "AWGN channel" $($"Additive White Gaussian Noise"$)$. In the following, we usually replace the variance $\sigma_N^2$ by the power $P_N$.

Derivation of the AWGN channel capacity

We know from previous sections:

The "channel capacity" $C_{\rm AWGN}$ specifies the maximum mutual information $I(X; Y)$ between the input variable $X$ and the output variable $Y$ of the AWGN channel.

The maximization refers to the best possible input PDF. Under the "power constraint" ${\rm E}[X^2] \le P_X$, the following therefore applies:

$$C_{\rm AWGN} = \max_{f_X:\hspace{0.1cm} {\rm E}[X^2 ] \le P_X} \hspace{-0.35cm} I(X;Y) = -h(N) + \max_{f_X:\hspace{0.1cm} {\rm E}[X^2] \le P_X} \hspace{-0.35cm} h(Y) \hspace{0.05cm}.$$

The second form already takes into account that the maximization concerns only the differential entropy $h(Y)$ ⇒ probability density function $f_Y(y)$: for a given noise power $P_N$, the differential entropy of the Gaussian noise, $h(N) = 1/2 · \log_2 (2π{\rm e} · P_N)$, is a constant.

The maximum for $h(Y)$ is obtained for a Gaussian PDF $f_Y(y)$ with $P_Y = P_X + P_N$, see section "Maximum differential entropy under power constraint":

$${\rm max}\big[h(Y)\big] = 1/2 \cdot \log_2 \big[2\pi{\rm e} \cdot (P_X + P_N)\big].$$

However, the output PDF $f_Y(y) = f_X(x) ∗ f_N(n)$ is Gaussian only if both $f_X(x)$ and $f_N(n)$ are Gaussian functions. A concise rule of thumb for the convolution operation: Gaussian remains Gaussian, and non-Gaussian never becomes (exactly) Gaussian.

Numerical results for the AWGN channel capacity as a function of ${P_X}/{P_N}$

$\text{Conclusion:}$ For the AWGN channel ⇒ Gaussian noise PDF $f_N(n)$, the channel capacity is achieved exactly when the input PDF $f_X(x)$ is also Gaussian:

$$C_{\rm AWGN} = h_{\rm max}(Y) - h(N) = 1/2 \cdot {\rm log}_2 \hspace{0.1cm} ({P_Y}/{P_N})$$

$$\Rightarrow \hspace{0.3cm} C_{\rm AWGN}= 1/2 \cdot {\rm log}_2 \hspace{0.1cm} ( 1 + P_X/P_N) \hspace{0.05cm}.$$
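Both forms of the result can be cross-checked numerically. This Python sketch assumes $P_N = 1$ and a few example values of $P_X/P_N$ in dB; the function names are hypothetical. It evaluates $1/2 \cdot \log_2(1 + P_X/P_N)$ directly and as the difference of the Gaussian differential entropies $h_{\rm max}(Y) - h(N)$.

```python
import numpy as np


def gaussian_entropy_bits(power: float) -> float:
    """Differential entropy 1/2 * log2(2*pi*e*power) of a zero-mean Gaussian."""
    return 0.5 * np.log2(2 * np.pi * np.e * power)


def awgn_capacity_bits(snr: float) -> float:
    """C_AWGN = 1/2 * log2(1 + P_X/P_N) in bit, with snr = P_X/P_N."""
    return 0.5 * np.log2(1 + snr)


P_N = 1.0                                  # assumed noise power
for snr_db in (0, 10, 20, 30):
    snr = 10 ** (snr_db / 10)
    P_X = snr * P_N
    c_direct = awgn_capacity_bits(snr)
    c_entropies = gaussian_entropy_bits(P_X + P_N) - gaussian_entropy_bits(P_N)
    print(f"{snr_db:2d} dB: C = {c_direct:.4f} bit (h_max(Y) - h(N) = {c_entropies:.4f} bit)")
```

For $P_X/P_N = 10$ $(10$ dB$)$ this gives $C_{\rm AWGN} = 1/2 \cdot \log_2 11 ≈ 1.73$ bit; the factor $2\pi{\rm e}$ cancels in the entropy difference.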

Parallel Gaussian channels

Parallel AWGN channels

As shown in the graph, we now consider $K$ parallel Gaussian channels $X_1 → Y_1$, ... , $X_k → Y_k$, ... , $X_K → Y_K$.

We denote the transmission powers in the $K$ channels by

$$P_1 = \text{E}[X_1^2], \hspace{0.15cm}\text{...}\hspace{0.15cm} ,\ P_k = \text{E}[X_k^2], \hspace{0.15cm}\text{...}\hspace{0.15cm} ,\ P_K = \text{E}[X_K^2].$$

The $K$ noise powers can also be different:

$$σ_1^2, \hspace{0.15cm}\text{...}\hspace{0.15cm} ,\ σ_k^2, \hspace{0.15cm}\text{...}\hspace{0.15cm} ,\ σ_K^2.$$

We are now looking for the maximum mutual information $I(X_1, \hspace{0.15cm}\text{...}\hspace{0.15cm}, X_K\hspace{0.05cm};\hspace{0.05cm}Y_1, \hspace{0.15cm}\text{...}\hspace{0.15cm}, Y_K) $ between

the $K$ input variables $X_1$, ... , $X_K$ and

the $K$ output variables $Y_1$ , ... , $Y_K$,

which we call the »total channel capacity« of this AWGN configuration.

$\text{Convention:}$

We assume a power constraint for the overall AWGN system, that is: the sum of all powers $P_k$ in the $K$ individual channels must not exceed the specified value $P_X$:

$$P_1 + \hspace{0.05cm}\text{...}\hspace{0.05cm}+ P_K = \hspace{0.1cm} \sum_{k= 1}^K \hspace{0.1cm}{\rm E} \left [ X_k^2\right ] \le P_{X} \hspace{0.05cm}.$$

Under the only mildly restrictive assumption of independent noise sources $N_1$, ... , $N_K$, the mutual information can be written, after some intermediate steps, as:

$$I(X_1, \hspace{0.05cm}\text{...}\hspace{0.05cm}, X_K\hspace{0.05cm};\hspace{0.05cm}Y_1,\hspace{0.05cm}\text{...}\hspace{0.05cm}, Y_K) = h(Y_1, ... \hspace{0.05cm}, Y_K ) - \hspace{0.1cm} \sum_{k= 1}^K \hspace{0.1cm} h(N_k)\hspace{0.05cm}.$$

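This can be bounded from above as follows. The first step uses $h(Y_1, ... \hspace{0.05cm}, Y_K) \le \sum_k h(Y_k)$, the second uses the maximum differential entropy $h(Y_k) \le 1/2 \cdot \log_2\big[2\pi{\rm e}\cdot(P_k + \sigma_k^2)\big]$ under a power constraint together with $h(N_k) = 1/2 \cdot \log_2(2\pi{\rm e}\cdot\sigma_k^2)$ for Gaussian noise: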

$$I(X_1,\hspace{0.05cm}\text{...}\hspace{0.05cm}, X_K\hspace{0.05cm};\hspace{0.05cm}Y_1, \hspace{0.05cm}\text{...} \hspace{0.05cm}, Y_K) \hspace{0.2cm} \le \hspace{0.1cm} \hspace{0.1cm} \sum_{k= 1}^K \hspace{0.1cm} \big[h(Y_k) - h(N_k)\big] \hspace{0.2cm} \le \hspace{0.1cm} 1/2 \cdot \sum_{k= 1}^K \hspace{0.1cm} {\rm log}_2 \hspace{0.1cm} ( 1 + {P_k}/{\sigma_k^2}) \hspace{0.05cm}.$$

Equality holds for zero-mean Gaussian input variables $X_k$ that are statistically independent of each other, together with statistically independent noise terms $N_k$.
The "maximum mutual information" ⇒ "channel capacity" follows from this equation if the total transmission power $P_X$ is distributed optimally among the channels, taking into account their different noise powers $σ_k^2$.
This optimization problem can again be elegantly solved with the method of "Lagrange multipliers". The following example only explains the result.
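As a sketch of the standard Lagrange-multiplier argument (not spelled out in the text), one maximizes, with the multiplier $\lambda$ for the power constraint,

$$L = \sum_{k=1}^K \frac{1}{2} \cdot {\rm log}_2 \hspace{0.1cm} ( 1 + {P_k}/{\sigma_k^2}) - \lambda \cdot \Big(\sum_{k=1}^K P_k - P_X\Big) \hspace{0.05cm}.$$

Setting the partial derivatives to zero yields

$$\frac{\partial L}{\partial P_k} = \frac{1}{2 \ln 2 \cdot (P_k + \sigma_k^2)} - \lambda = 0 \hspace{0.3cm}\Rightarrow \hspace{0.3cm} P_k + \sigma_k^2 = \frac{1}{2\lambda \ln 2} = H \hspace{0.05cm}.$$

Taking the condition $P_k \ge 0$ into account, this gives

$$P_k = \max\big(0,\ H - \sigma_k^2\big) \hspace{0.05cm}, \hspace{0.5cm} \text{with } H \text{ chosen such that } \sum_{k=1}^K \max\big(0,\ H - \sigma_k^2\big) = P_X \hspace{0.05cm}.$$

The constant $H$ is the same for all channels that receive power; it plays the role of the water level in the following example.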

Best possible power allocation for $K = 4$ $($"Water–Filling"$)$

$\text{Example 1:}$ We consider $K = 4$ parallel Gaussian channels with four different noise powers $σ_1^2$, ... , $σ_4^2$ according to the adjacent figure (faint green background).

The best possible allocation of the transmission power among the four channels is sought.

If one were to slowly fill this profile with water, the water would initially flow only into $\text{channel 2}$.

If one continues to pour, some water also accumulates in $\text{channel 1}$ and later also in $\text{channel 4}$.

The drawn "water level" $H$ marks exactly the moment at which the sum $P_1 + P_2 + P_4$ equals the total available transmission power $P_X$.

The optimal power allocation for this example results in $P_2 > P_1 > P_4$ as well as $P_3 = 0$.

Only for a larger transmission power $P_X$ would a small power $P_3$ also be allocated to the third channel.

This allocation procedure is called a »Water–Filling algorithm«.
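A compact Python sketch of the water-filling rule $P_k = \max(0,\ H - \sigma_k^2)$ follows. The function names are hypothetical, and the noise powers are illustrative values chosen only so that the ordering resembles Example 1 $($channel 2 least noisy, channel 3 most noisy$)$; they are not read from the figure. The water level is found by testing how many of the least noisy channels can be "under water".

```python
import numpy as np


def water_filling(noise_powers, total_power):
    """Hypothetical helper: distribute total_power over parallel Gaussian channels.

    Returns the powers P_k = max(0, H - sigma_k^2) and the water level H,
    chosen such that the powers sum to total_power.
    """
    sigma2 = np.asarray(noise_powers, dtype=float)
    s = np.sort(sigma2)  # noise powers in ascending order
    # Try the k least noisy channels, from all K down to 1; the largest k
    # whose level lies above the k-th smallest noise power is the active set.
    for k in range(len(s), 0, -1):
        level = (total_power + s[:k].sum()) / k
        if level > s[k - 1]:
            break
    powers = np.maximum(level - sigma2, 0.0)
    return powers, level


def total_capacity_bits(powers, noise_powers):
    """Sum of 1/2 * log2(1 + P_k / sigma_k^2) over all channels."""
    p = np.asarray(powers, dtype=float)
    return float(np.sum(0.5 * np.log2(1 + p / np.asarray(noise_powers, dtype=float))))


noise = [2.0, 1.0, 4.0, 3.0]  # assumed sigma_1^2 ... sigma_4^2
powers, H = water_filling(noise, total_power=4.0)
print("water level H =", round(H, 4))
print("powers P_1 ... P_4 =", np.round(powers, 4))
print("C_total =", round(total_capacity_bits(powers, noise), 4), "bit")
```

With $P_X = 4$ this gives $H = 10/3$ and $P_2 > P_1 > P_4$, $P_3 = 0$; raising the total power to $P_X = 10$ also assigns a nonzero power to channel 3, and equal noise powers lead to equal powers $P_X/K$.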

$\text{Example 2:}$ If all $K$ Gaussian channels are equally disturbed ⇒ $σ_1^2 = \hspace{0.15cm}\text{...}\hspace{0.15cm} = σ_K^2 = P_N$, one should naturally allocate the total available transmission power $P_X$ equally to all channels: $P_k = P_X/K$. For the total capacity we then obtain:

Capacity for $K$ parallel channels

$$C_{\rm total} = \frac{ K}{2} \cdot {\rm log}_2 \hspace{0.1cm} ( 1 + \frac{P_X}{K \cdot P_N}) \hspace{0.05cm}.$$
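The capacity formula for $K$ equally disturbed channels can be evaluated directly. This Python sketch (function name hypothetical) computes $C_{\rm total}$ for $K = 1, 2, 3$ at a few assumed values of $P_X/P_N$ and prints the ratio to the single-channel case.

```python
import numpy as np


def total_capacity_equal_noise(snr: float, K: int) -> float:
    """C_total = K/2 * log2(1 + P_X / (K * P_N)) in bit, with snr = P_X / P_N."""
    return K / 2 * np.log2(1 + snr / K)


for snr_db in (10, 30, 60):
    snr = 10 ** (snr_db / 10)
    c1 = total_capacity_equal_noise(snr, 1)
    for K in (1, 2, 3):
        c = total_capacity_equal_noise(snr, K)
        print(f"{snr_db} dB, K = {K}: C_total = {c:.3f} bit, ratio to K = 1: {c / c1:.3f}")
```

At $10$ dB the ratio for $K = 2$ is about $1.49$; for very large $P_X/P_N$ it grows toward $K$, but only slowly, since the gain over $K = 1$ shrinks relative to the logarithmically growing capacity.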

The graph shows the total capacity as a function of $P_X/P_N$ for $K = 1$, $K = 2$ and $K = 3$:

With $P_X/P_N = 10 \ ⇒ \ 10 · \text{lg} (P_X/P_N) = 10 \ \text{dB}$ and $K = 2$, the total capacity becomes approximately $50\%$ larger if the total power $P_X$ is divided equally between two channels: $P_1 = P_2 = P_X/2$.

In the limiting case $P_X/P_N → ∞$, the total capacity increases by a factor $K$ ⇒ doubling for $K = 2$.

The two identical and independent channels can be realized in different ways, for example by multiplexing in time, frequency or space.

In particular, the case $K = 2$ can also be realized by using orthogonal basis functions such as "cosine" and "sine", as for example with

"quadrature amplitude modulation" $\rm (QAM)$ or

"multi-level phase modulation" such as $\rm QPSK$ or $\rm 8–PSK$.

Exercises for the chapter

Exercise 4.5: Mutual Information from 2D-PDF

Exercise 4.5Z: Again Mutual Information

Exercise 4.6: AWGN Channel Capacity

Exercise 4.7: Several Parallel Gaussian Channels

Exercise 4.7Z: About the Water Filling Algorithm

@@ Line 7: / Line 7: @@
-==Mutual information between continuous-value random variables ==
+==Mutual information between continuous random variables ==
 <br>
-In the chapter &nbsp;[[Information_Theory/Anwendung_auf_die_Digitalsignalübertragung#Informationstheoretisches_Modell_der_Digitalsignal.C3.BCbertragung|Information-theoretical model of digital signal transmission]]&nbsp; the&nbsp; ''mutual information'' between the two discrete-value random variables&nbsp; $X$&nbsp; and&nbsp; $Y$&nbsp; was given, among other things, in the following form:
+In the chapter &nbsp;[[Information_Theory/Application_to_Digital_Signal_Transmission#Information-theoretical_model_of_digital_signal_transmission|"Information-theoretical model of digital signal transmission"]]&nbsp; the&nbsp; "mutual information"&nbsp; between the two discrete random variables&nbsp; $X$&nbsp; and&nbsp; $Y$&nbsp; was given, among other things,&nbsp; in the following form:
-:$$I(X;Y) = \hspace{-0.4cm} \sum_{(x,\hspace{0.05cm} y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{XY}\hspace{-0.08cm})}
+:$$I(X;Y) = \hspace{0.5cm} \sum_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\sum_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{X}\hspace{-0.08cm})}
-  \hspace{-0.8cm} P_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{ P_{XY}(x, y)}{P_{X}(x) \cdot P_{Y}(y)} \hspace{0.05cm}.$$
+  \hspace{-0.9cm} P_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{ P_{XY}(x, y)}{P_{X}(x) \cdot P_{Y}(y)} \hspace{0.05cm}.$$
-This equation simultaneously corresponds to the &nbsp;[[Information_Theory/Einige_Vorbemerkungen_zu_zweidimensionalen_Zufallsgrößen#Informational_Divergence_-_Kullback-Leibler_Distance|Kullback&ndash;Leibler distance]]&nbsp; between the joint probability function&nbsp; $P_{XY}$&nbsp;  and the product of the two individual probability functions&nbsp; $P_X$&nbsp; and&nbsp; $P_Y$ :
+This equation simultaneously corresponds to the &nbsp;[[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables#Informational_divergence_-_Kullback-Leibler_distance|"Kullback&ndash;Leibler distance"]]&nbsp; between the joint probability function&nbsp; $P_{XY}$&nbsp;  and the product of the two individual probability functions&nbsp; $P_X$&nbsp; and&nbsp; $P_Y$:
 :$$I(X;Y) = D(P_{XY} \hspace{0.05cm} ||  \hspace{0.05cm}P_{X} \cdot P_{Y}) \hspace{0.05cm}.$$
-In order to derive the mutual information&nbsp; $I(X; Y)$&nbsp; between two continuous-value random variables&nbsp; $X$&nbsp; and&nbsp; $Y$&nbsp;, one proceeds as follows, whereby inverted commas indicate a quantised variable:
+In order to derive the mutual information&nbsp; $I(X; Y)$&nbsp; between two continuous random variables&nbsp; $X$&nbsp; and&nbsp; $Y$,&nbsp; one proceeds as follows,&nbsp; whereby inverted commas indicate a quantized variable:
-*One quantises the random variables&nbsp; $X$&nbsp; and&nbsp; $Y$&nbsp; $($with the quantisation intervals&nbsp; ${\it Δ}x$&nbsp; and&nbsp; ${\it Δ}y)$&nbsp; and thus obtains the probability functions&nbsp; $P_{X\hspace{0.01cm}′}$&nbsp; and&nbsp; $P_{Y\hspace{0.01cm}′}$.
+*One quantizes the random variables&nbsp; $X$&nbsp; and&nbsp; $Y$&nbsp; $($with the quantization intervals&nbsp; ${\it Δ}x$&nbsp; and&nbsp; ${\it Δ}y)$&nbsp; and thus obtains the probability functions&nbsp; $P_{X\hspace{0.01cm}′}$&nbsp; and&nbsp; $P_{Y\hspace{0.01cm}′}$.
-*The „vectors”&nbsp; $P_{X\hspace{0.01cm}′}$&nbsp; and&nbsp; $P_{Y\hspace{0.01cm}′}$&nbsp; become infinitely long after the boundary transitions&nbsp; ${\it Δ}x → 0,&nbsp; {\it Δ}y → 0$&nbsp;, and the joint PMF&nbsp; $P_{X\hspace{0.01cm}′\hspace{0.08cm}Y\hspace{0.01cm}′}$&nbsp; is then also infinitely extended in area.
+*The "vectors"&nbsp; $P_{X\hspace{0.01cm}′}$&nbsp; and&nbsp; $P_{Y\hspace{0.01cm}′}$&nbsp; become infinitely long after the boundary transitions&nbsp; ${\it Δ}x → 0,\hspace{0.1cm} {\it Δ}y → 0$,&nbsp; and the joint PMF&nbsp; $P_{X\hspace{0.01cm}′\hspace{0.08cm}Y\hspace{0.01cm}′}$&nbsp; is also infinitely extended in area.
 *These boundary transitions give rise to the probability density functions of the continuous random variables according to the following equations:
@@ Line 27: / Line 29: @@
 \hspace{0.3cm}f_{XY}(x_{\mu}\hspace{0.05cm}, y_{\mu}) = \frac{P_{X\hspace{0.01cm}'\hspace{0.03cm}Y\hspace{0.01cm}'}(x_{\mu}\hspace{0.05cm}, y_{\mu})} {{\it \Delta_x} \cdot {\it \Delta_y}} \hspace{0.05cm}.$$
-*The double sum in the above equation, after renaming&nbsp; $Δx → {\rm d}x$&nbsp; or&nbsp; $Δy → {\rm d}y$&nbsp;, becomes the equation valid for continuous value random variables:
+*The double sum in the above equation, after renaming&nbsp; $Δx → {\rm d}x$&nbsp; and&nbsp; $Δy → {\rm d}y$,&nbsp; becomes the equation valid for continuous value random variables:
-:$$I(X;Y) = \hspace{0.2cm} \int \hspace{-0.9cm} \int\limits_{\hspace{-0.4cm}(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm} (\hspace{-0.03cm}f_{XY}\hspace{-0.08cm})}
+:$$I(X;Y)  = \hspace{0.5cm} \int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{X}\hspace{-0.08cm})}
-  \hspace{-0.6cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{ f_{XY}(x, y) }
+  \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{ f_{XY}(x, y) }
 {f_{X}(x) \cdot f_{Y}(y)}
   \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y \hspace{0.05cm}.$$
 {{BlaueBox|TEXT=
-$\text{Conclusion:}$&nbsp; By splitting this double integral, it is also possible to write for the transinformation:
+$\text{Conclusion:}$&nbsp; By splitting this double integral,&nbsp; it is also possible to write for the&nbsp; &raquo;'''mutual information'''&laquo;:
 :$$I(X;Y) = h(X) + h(Y) - h(XY)\hspace{0.05cm}.$$
-The ''joint differential entropy''
+The&nbsp; &raquo;'''joint differential entropy'''&laquo;
-:$$h(XY) = -\hspace{0.2cm} \int \hspace{-0.9cm} \int\limits_{\hspace{-0.4cm}(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm} (\hspace{-0.03cm}f_{XY}\hspace{-0.08cm})}
+:$$h(XY)   = - \hspace{-0.3cm}\int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{X}\hspace{-0.08cm})}
-  \hspace{-0.6cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \big[f_{XY}(x, y) \big]
+  \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \hspace{0.1cm} \big[f_{XY}(x, y) \big]
   \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y$$
-and the two ''differential single entropies''
+and the two&nbsp; &raquo;'''differential single entropies'''&laquo;
 :$$h(X) = -\hspace{-0.7cm}  \int\limits_{x \hspace{0.05cm}\in \hspace{0.05cm}{\rm supp}\hspace{0.03cm} (\hspace{-0.03cm}f_X)} \hspace{-0.35cm}  f_X(x) \cdot {\rm log} \hspace{0.1cm} \big[f_X(x)\big] \hspace{0.1cm}{\rm d}x
@@ Line 54: / Line 56: @@
 ==On equivocation and irrelevance==
 <br>
-We further assume the continuous value mutual information&nbsp;$I(X;Y) = h(X) + h(Y) - h(XY)$&nbsp; .&nbsp; This representation is also found in the following diagram (left graph).
+We further assume the continuous  mutual information&nbsp; $I(X;Y) = h(X) + h(Y) - h(XY)$. &nbsp;  This representation is also found in the following diagram&nbsp; $($left graph$)$.
-[[File:P_ID2882__Inf_T_4_2_S2neu.png|right|frame|Representation of the mutual information for continuous value random variables]]
+[[File:EN_Inf_T_4_2_S2.png|right|frame|Representation of the mutual information for continuous-valued random variables]]
 From this you can see that the mutual information can also be represented as follows:
@@ Line 64: / Line 66: @@
 These fundamental information-theoretical relationships can also be read from the graph on the right.&nbsp;
-This directional representation is particularly suitable for message transmission systems.
+&rArr; &nbsp; This directional representation is particularly suitable for communication systems.&nbsp; The outflowing or inflowing differential entropy characterises
+*the&nbsp; &raquo;'''equivocation'''&laquo;:
-The outflowing or inflowing differential entropy characterises
-*the&nbsp; '''equivocation''':
-:$$h(X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y) =\hspace{0.2cm} -\int \hspace{-0.9cm} \int\limits_{\hspace{-0.4cm}(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.03cm} (\hspace{-0.03cm}f_{XY}\hspace{-0.08cm})}
+:$$h(X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y)   = - \hspace{-0.3cm}\int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{X}\hspace{-0.08cm})}
-  \hspace{-0.6cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \big [{f_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (x \hspace{-0.05cm}\mid \hspace{-0.05cm} y)} \big]
+  \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \hspace{0.1cm} \big [{f_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (x \hspace{-0.05cm}\mid \hspace{-0.05cm} y)} \big]
-  \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y\hspace{0.05cm},$$
+  \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y,$$
-*the&nbsp; '''irrelevance''':
+*the&nbsp; &raquo;'''irrelevance'''&laquo;:
-:$$h(Y \hspace{-0.05cm}\mid \hspace{-0.05cm} X) =\hspace{0.2cm}- \int \hspace{-0.9cm} \int\limits_{\hspace{-0.4cm}(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.03cm} (\hspace{-0.03cm}f_{XY}\hspace{-0.08cm})}
+:$$h(Y \hspace{-0.05cm}\mid \hspace{-0.05cm} X)   = - \hspace{-0.3cm}\int\limits_{\hspace{-0.9cm}y \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{Y}\hspace{-0.08cm})} \hspace{-1.1cm}\int\limits_{\hspace{1.3cm} x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.05cm} (P_{X}\hspace{-0.08cm})}
-  \hspace{-0.6cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \big [{f_{\hspace{0.03cm}Y \mid \hspace{0.03cm} X} (y \hspace{-0.05cm}\mid \hspace{-0.05cm} x)} \big]
+  \hspace{-0.9cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \hspace{0.1cm} \big [{f_{\hspace{0.03cm}Y \mid \hspace{0.03cm} X} (y \hspace{-0.05cm}\mid \hspace{-0.05cm} x)} \big]
-  \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y\hspace{0.05cm}.$$
+  \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y.$$
-The significance of these two information-theoretic quantities will be discussed in more detail in&nbsp; [[Aufgaben:4.5Z_Nochmals_Transinformation|task 4.5Z]]&nbsp;.
+The significance of these two information-theoretic quantities will be discussed in more detail in&nbsp; [[Aufgaben:Exercise_4.5Z:_Again_Mutual_Information|$\text{Exercise 4.5Z}$]]&nbsp;.
 If one compares the graphical representations of the mutual information for
-*discrete value random variables in the section &nbsp;[[Information_Theory/Anwendung_auf_die_Digitalsignalübertragung#Information-theoretical_model_of_digital_signal_transmission|Information-theoretical model of digital signal transmission]&nbsp; continuous value random variables according to the above diagram,
+*discrete  random variables in the section &nbsp;[[Information_Theory/Application_to_Digital_Signal_Transmission#Information-theoretical_model_of_digital_signal_transmission|"Information-theoretical model of digital signal transmission"]],&nbsp; and
+*continuous  random variables according to the above diagram,
-the only distinguishing feature is that each „capital $H$”&nbsp; (entropy; larger-equal zero)&nbsp; has been replaced by a „non-capital $h$”&nbsp; (differential entropy can be positive, negative or zero)&nbsp;.
-*Otherwise, the mutual information is the same in both representations and&nbsp;$I(X; Y) ≥ 0$ always applies.
+the only distinguishing feature is that each&nbsp; $($capital$)$&nbsp; $H$&nbsp; $($entropy;&nbsp; $\ge 0)$&nbsp; has been replaced by a&nbsp; $($non-capital$)$ $h$&nbsp; $($differential entropy;&nbsp; can be positive,&nbsp; negative or zero$)$.
-*In the following, we mostly use the&nbsp; ''binary logarithm''  &nbsp; ⇒  &nbsp;  $\log_2$&nbsp; and thus obtain the mutual information in „bit”.
+*Otherwise,&nbsp; the mutual information is the same in both representations and&nbsp; $I(X; Y) ≥ 0$&nbsp; always applies.
+*In the following,&nbsp; we mostly use the&nbsp; "binary logarithm"  &nbsp; ⇒  &nbsp;  $\log_2$&nbsp; and thus obtain the mutual information with the pseudo-unit&nbsp; "bit".
@@ Line 93: / Line 97: @@
 <br>
 We now consider a very simple model of message transmission:
-*The random variable&nbsp; $X$&nbsp; stands for the (zero mean) transmission signal and is characterised by the PDF&nbsp; $f_X(x)$&nbsp; and the variance&nbsp; $σ_X^2$&nbsp; .&nbsp; The transmission power is $P_X = σ_X^2$.
+*The random variable&nbsp; $X$&nbsp; stands for the&nbsp; $($zero mean$)$&nbsp; transmitted signal and is characterized by PDF&nbsp; $f_X(x)$&nbsp; and variance&nbsp; $σ_X^2$.&nbsp;  Transmission power:&nbsp; $P_X = σ_X^2$.
-*The additive noise&nbsp; $N$&nbsp; is given by the PDF&nbsp; $f_N(n)$&nbsp; and the noise power&nbsp; $P_N = σ_N^2$&nbsp;.
+*The additive noise&nbsp; $N$&nbsp; is given by the&nbsp; $($mean-free$)$&nbsp;  PDF&nbsp; $f_N(n)$&nbsp; and the noise power&nbsp; $P_N = σ_N^2$.
 *If&nbsp; $X$&nbsp; and&nbsp; $N$&nbsp; are assumed to be statistically independent &nbsp; &rArr; &nbsp;  signal-independent noise, then&nbsp; $\text{E}\big[X · N \big] = \text{E}\big[X \big] · \text{E}\big[N\big] = 0$ .
-*The received signal is &nbsp;$Y = X + N$.&nbsp; The output PDF&nbsp; $f_Y(y)$&nbsp; can be calculated with the [[Signal_Representation/The_Convolution_Theorem_and_Operation#Convolution_in_Time_Domain|convolution operation]]&nbsp; &nbsp; ⇒ &nbsp;  $f_Y(y) = f_X(x) ∗ f_N(n)$.
+[[File:Inf_T_4_2_S3neu.png|right|frame|Transmission system with additive noise]]
+*The received signal is &nbsp;$Y = X + N$.&nbsp; The output PDF&nbsp; $f_Y(y)$&nbsp; can be calculated with the [[Signal_Representation/The_Convolution_Theorem_and_Operation#Convolution_in_the_time_domain|"convolution operation"]]&nbsp; &nbsp; ⇒ &nbsp;  $f_Y(y) = f_X(x) ∗ f_N(n)$.
-[[File:Inf_T_4_2_S3neu.png|right|frame|Message transmission system with additive noise]]
+* For the received power holds:
-* For the received power (variance) holds:
 :$$P_Y = \sigma_Y^2 = {\rm E}\big[Y^2\big] = {\rm E}\big[(X+N)^2\big] =  {\rm E}\big[X^2\big] +  {\rm E}\big[N^2\big] = \sigma_X^2 + \sigma_N^2 $$
@@ Line 105: / Line 112: @@
 \hspace{0.05cm}.$$
-The sketched density functions sketched (rectangular or trapezoidal) are only intended to clarify the calculation process and have no practical relevance.
+The sketched probability density functions&nbsp; $($rectangular or trapezoidal$)$&nbsp; are only intended to clarify the calculation process and have no practical relevance.
-<br clear=all>
-To calculate the mutual information between input&nbsp; $X$&nbsp; and output&nbsp; $Y$&nbsp; there are three possibilities according to the&nbsp; [[Information_Theory/AWGN–Kanalkapazität_bei_wertkontinuierlichem_Eingang#On_equivocation_and_irrelevance|graphic on the previous subchapter]]&nbsp; drei Möglichkeiten:
+Following the&nbsp; [[Information_Theory/AWGN–Kanalkapazität_bei_wertkontinuierlichem_Eingang#On_equivocation_and_irrelevance|"graphic in the previous subchapter"]],&nbsp; there are three ways to calculate the mutual information between input&nbsp; $X$&nbsp; and output&nbsp; $Y$:
 * Calculation according to &nbsp;$I(X; Y) = h(X) + h(Y) - h(XY)$:
-:The first two terms can be calculated in a simple way from &nbsp;$f_X(x)$&nbsp; and &nbsp;$f_Y(y)$&nbsp; respectively.&nbsp; The&nbsp; ''joint differentrial entropy'' &nbsp;$h(XY)$ is problematic.&nbsp; For this, one needs the 2D joint PDF &nbsp;$f_{XY}(x, y)$, which is usually not given directly.
+::The first two terms can easily be calculated from &nbsp;$f_X(x)$&nbsp; and &nbsp;$f_Y(y)$,&nbsp; respectively.&nbsp; The&nbsp; "joint differential entropy" &nbsp;$h(XY)$&nbsp; is problematic:&nbsp; it requires the two-dimensional joint PDF &nbsp;$f_{XY}(x, y)$,&nbsp; which is usually not given directly.
 * Calculation according to &nbsp;$I(X; Y) = h(Y) - h(Y|X)$:
-:Here &nbsp;$h(Y|X)$&nbsp; denotes the&nbsp; ''differential scattering entropy''.&nbsp; It holds that &nbsp;$h(Y|X) = h(X + N|X) = h(N)$, so that &nbsp;$I(X; Y)$&nbsp; is very easy to calculate via the equation &nbsp;$f_Y(y) = f_X(x) ∗ f_N(n)$&nbsp; if $f_X(x)$&nbsp; and $f_N(n)$&nbsp; are known.
+::Here &nbsp;$h(Y|X)$&nbsp; denotes the&nbsp; "differential irrelevance".&nbsp; Since&nbsp; $N$&nbsp; is independent of&nbsp; $X$,&nbsp; it holds &nbsp;$h(Y|X) = h(X + N|X) = h(N)$,&nbsp; so that &nbsp;$I(X; Y)$&nbsp; is very easy to calculate via the equation &nbsp;$f_Y(y) = f_X(x) ∗ f_N(n)$,&nbsp; provided that&nbsp; $f_X(x)$&nbsp; and&nbsp; $f_N(n)$&nbsp; are known.
 * Calculation according to &nbsp;$I(X; Y) = h(X) - h(X|Y)$:
-:According to this equation, however, one needs the differential inference entropy&nbsp;$h(X|Y)$, which is more difficult to state than&nbsp;$h(Y|X)$.
+::This approach,&nbsp; however,&nbsp; requires the&nbsp; "differential equivocation" &nbsp; $h(X|Y)$,&nbsp; which is more difficult to determine than&nbsp; $h(Y|X)$.
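The middle method can be sketched numerically in a few lines. This is a minimal sketch, assuming Python with NumPy and the hypothetical case that&nbsp; $X$&nbsp; and&nbsp; $N$&nbsp; are both uniformly distributed on&nbsp; $[-1, 1]$,&nbsp; so that&nbsp; $f_Y(y)$&nbsp; is triangular on&nbsp; $[-2, 2]$.&nbsp; The differential entropies are approximated by Riemann sums; the result should be close to the analytic value&nbsp; $I(X;Y) = 1/(2 \ln 2) \approx 0.721$&nbsp; bit.

```python
import numpy as np

def uniform_pdf(half_width, dx):
    """Sampled zero-mean rectangular PDF on [-half_width, half_width], normalized on the grid."""
    n = int(round(half_width / dx))
    grid = np.arange(-n, n + 1) * dx
    pdf = np.ones(grid.size)
    return grid, pdf / (pdf.sum() * dx)

def diff_entropy_bits(pdf, dx):
    """h(Z) = -integral f_Z(z) log2 f_Z(z) dz, approximated by a Riemann sum."""
    p = pdf[pdf > 0]
    return -np.sum(p * np.log2(p)) * dx

dx = 1e-3                                  # grid spacing (assumed)
A = 1.0                                    # hypothetical half-width: X and N uniform on [-A, A]
_, f_X = uniform_pdf(A, dx)
_, f_N = uniform_pdf(A, dx)
f_Y = np.convolve(f_X, f_N) * dx           # f_Y = f_X * f_N: triangle on [-2A, 2A]

h_Y = diff_entropy_bits(f_Y, dx)
h_N = diff_entropy_bits(f_N, dx)           # approximately log2(2A) for a rectangle
I_XY = h_Y - h_N                           # middle equation: I(X;Y) = h(Y) - h(N)
print(f"h(Y) = {h_Y:.4f} bit, h(N) = {h_N:.4f} bit, I(X;Y) = {I_XY:.4f} bit")
```

Only the one-dimensional PDFs&nbsp; $f_N(n)$&nbsp; and&nbsp; $f_Y(y)$&nbsp; are needed here; the two-dimensional joint PDF never has to be formed.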
 {{BlaueBox|TEXT=
-$\text{Conclusion:}$&nbsp; In the following we use the middle equation and write for the mutual information between the input&nbsp; $X$&nbsp; and the output&nbsp; $Y$&nbsp; of a&nbsp; ''message transmission system in the presence of additive and uncorrelated noise''&nbsp; $N$:
+$\text{Conclusion:}$&nbsp; In the following we use the middle equation.&nbsp; For the&nbsp; &raquo;'''mutual information'''&laquo;&nbsp; between the input&nbsp; $X$&nbsp; and the output&nbsp; $Y$&nbsp; of a transmission system with additive noise&nbsp; $N$&nbsp; that is independent of&nbsp; $X$,&nbsp; we write:
 :$$I(X;Y) \hspace{-0.05cm} = \hspace{-0.01cm} h(Y) \hspace{-0.01cm}- \hspace{-0.01cm}h(N) \hspace{-0.01cm}=\hspace{-0.05cm}
@@ Line 126: / Line 134: @@
 ==Channel capacity of the AWGN channel==
 <br>
-If one specifies the probability density function of the noise in the previous&nbsp;   [[Information_Theory/AWGN–Kanalkapazität_bei_wertkontinuierlichem_Eingang#Calculation_of_mutual_information_with_additive_noise|general system model]]&nbsp; as Gaussian corresponding to
+If the probability density function of the noise in the previous&nbsp;   [[Information_Theory/AWGN–Kanalkapazität_bei_wertkontinuierlichem_Eingang#Calculation_of_mutual_information_with_additive_noise|"general system model"]]&nbsp; is specified as Gaussian according to
 [[File:P_ID2884__Inf_T_4_2_S4_neu.png|right|frame|Derivation of the AWGN channel capacity]]
 :$$f_N(n) = \frac{1}{\sqrt{2\pi  \sigma_N^2}} \cdot {\rm e}^{
 - \hspace{0.05cm}{n^2}/(2 \sigma_N^2) } \hspace{0.05cm}, $$
-we obtain the model sketched on the right for calculating the channel capacity of the so-called&nbsp; [[Modulation_Methods/Qualitätskriterien#Einige_Anmerkungen_zum_AWGN.E2.80.93Kanalmodell|AWGN channel]]&nbsp; (''Additive White Gaussian Noise'').&nbsp; In the following, we usually replace&nbsp; $\sigma_N^2$&nbsp; by&nbsp; $P_N$.
+we obtain the model sketched on the right for calculating the channel capacity of the so-called&nbsp; [[Modulation_Methods/Quality_Criteria#Some_remarks_on_the_AWGN_channel_model|"AWGN channel"]] &nbsp; $($"Additive White Gaussian Noise"$)$.&nbsp; In the following,&nbsp; we usually replace the variance&nbsp; $\sigma_N^2$&nbsp; by the power&nbsp; $P_N$.
-<br clear=all>
 We know from previous sections:
-*The&nbsp; [[Information_Theory/Anwendung_auf_die_Digitalsignalübertragung#Definition_and_meaning_of_channel_capacity|channel capacity]]&nbsp; $C_{\rm AWGN}$&nbsp; specifies the maximum mutual information&nbsp; $I(X; Y)$&nbsp; between the input quantity&nbsp;  $X$&nbsp;  and the output quantity&nbsp;  $Y$&nbsp;  of the AWGN channel.&nbsp;  The maximisation refers to the best possible input PDF.&nbsp;  Thus, under the&nbsp;  [[Information_Theory/Differentielle_Entropie#Differential_entropy_of_some_power-constrained_random_variables|power constraint]] the following applies:
+*The&nbsp; [[Information_Theory/Anwendung_auf_die_Digitalsignalübertragung#Definition_and_meaning_of_channel_capacity|"channel capacity"]]&nbsp; $C_{\rm AWGN}$&nbsp; is the maximum mutual information&nbsp; $I(X; Y)$&nbsp; between the input variable&nbsp;  $X$&nbsp;  and the output variable&nbsp;  $Y$&nbsp;  of the AWGN channel.
+*The maximization is carried out over all possible input PDFs&nbsp; $f_X(x)$.&nbsp;  Thus,&nbsp; under the&nbsp;  [[Information_Theory/Differentielle_Entropie#Differential_entropy_of_some_power-constrained_random_variables|"power constraint"]],&nbsp; the following applies:
 :$$C_{\rm AWGN} = \max_{f_X:\hspace{0.1cm} {\rm E}[X^2 ] \le P_X} \hspace{-0.35cm}  I(X;Y)
 = -h(N) + \max_{f_X:\hspace{0.1cm} {\rm E}[X^2] \le P_X} \hspace{-0.35cm}  h(Y)
 \hspace{0.05cm}.$$
+*Here it is already taken into account that the maximization affects only the differential entropy &nbsp;$h(Y)$ &nbsp; ⇒ &nbsp; probability density function &nbsp;$f_Y(y)$.&nbsp; For a given noise power&nbsp; $P_N$,&nbsp; $h(N) = 1/2 · \log_2 (2π{\rm e} · P_N)$&nbsp; is a constant.
-:It is already taken into account that the maximisation relates solely to the differential entropy &nbsp;$h(Y)$ &nbsp; ⇒ &nbsp; PDF &nbsp;$f_Y(y)$&nbsp; bezieht.&nbsp;  Indeed, for a given noise power&nbsp;  $P_N$&nbsp;, &nbsp;$h(N) = 1/2 · \log_2 (2π{\rm e} · P_N)$&nbsp; is a constant.
+*The maximum of &nbsp;$h(Y)$&nbsp;  is obtained for a Gaussian PDF &nbsp;$f_Y(y)$&nbsp; with &nbsp;$P_Y = P_X + P_N$,&nbsp; see section&nbsp; [[Information_Theory/Differentielle_Entropie#Proof:_Maximum_differential_entropy_with_power_constraint|"Maximum differential entropy under power constraint"]]:
-*The maximum for &nbsp;$h(Y)$&nbsp;  is obtained for a Gaussian PDF &nbsp;$f_Y(y)$&nbsp; with &nbsp;$P_Y = P_X + P_N$&nbsp;t, see page&nbsp; [[Information_Theory/Differentielle_Entropie#Proof:_Maximum_differential_entropy_with_power_constraint|maximum differential entropy under power constraint]]:
 :$${\rm max}\big[h(Y)\big] = 1/2 · \log_2 \big[2πe · (P_X + P_N)\big].$$
-*However, the output PDF &nbsp;$f_Y(y) = f_X(x) ∗ f_N(n)$&nbsp; is Gaussian only if both&nbsp;  $f_X(x)$&nbsp;  and&nbsp;  $f_N(n)$&nbsp;  are Gaussian functions.&nbsp; A striking saying about the convolution operation is:&nbsp; '''Gaussian remains Gaussian, and non-Gaussian never becomes (exactly) Gaussian'''.
+*However,&nbsp; the output PDF &nbsp;$f_Y(y) = f_X(x) ∗ f_N(n)$&nbsp; is Gaussian only if both&nbsp;  $f_X(x)$&nbsp;  and&nbsp;  $f_N(n)$&nbsp;  are Gaussian functions.&nbsp; A memorable rule for the convolution operation is:&nbsp; '''Gaussian remains Gaussian, and non-Gaussian never becomes (exactly) Gaussian'''.
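The role of the Gaussian input PDF can be illustrated by comparing two inputs with the same power&nbsp; $P_X$.&nbsp; The following minimal sketch assumes Python with NumPy, the hypothetical values&nbsp; $P_X = P_N = 1$,&nbsp; and a uniform input PDF as the non-Gaussian alternative;&nbsp; $h(N)$&nbsp; is taken from the closed form above, and&nbsp; $h(Y)$&nbsp; is evaluated numerically on a grid.

```python
import math
import numpy as np

P_X, P_N = 1.0, 1.0                  # hypothetical powers, P_X/P_N = 1 (0 dB)
sigma_N = math.sqrt(P_N)
a = math.sqrt(3.0 * P_X)             # uniform input on [-a, a] has E[X^2] = a^2/3 = P_X

dy = 1e-3
y = np.arange(-a - 12 * sigma_N, a + 12 * sigma_N, dy)

def gauss_pdf(z, var):
    return np.exp(-z**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def diff_entropy_bits(pdf, dz):
    """h(Z) = -integral f_Z(z) log2 f_Z(z) dz, approximated by a Riemann sum."""
    p = pdf[pdf > 0]
    return -np.sum(p * np.log2(p)) * dz

h_N = 0.5 * math.log2(2 * math.pi * math.e * P_N)    # constant for a given P_N

# Gaussian input: Y is Gaussian with power P_Y = P_X + P_N
I_gauss = diff_entropy_bits(gauss_pdf(y, P_X + P_N), dy) - h_N

# Uniform input: f_Y = f_X * f_N in closed form via the Gaussian CDF Phi
Phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
f_Y_unif = (Phi((y + a) / sigma_N) - Phi((y - a) / sigma_N)) / (2 * a)
I_unif = diff_entropy_bits(f_Y_unif, dy) - h_N

C_awgn = 0.5 * math.log2(1 + P_X / P_N)
print(f"Gaussian input: {I_gauss:.4f} bit, uniform input: {I_unif:.4f} bit, bound: {C_awgn:.4f} bit")
```

The Gaussian input reaches&nbsp; $h_{\rm max}(Y) - h(N) = 0.5$&nbsp; bit, while the uniform input of equal power stays slightly below this value, because its output PDF is not exactly Gaussian.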
 {{BlaueBox|TEXT=
-$\text{Conclusion:}$&nbsp; For the AWGN channel &nbsp; ⇒ &nbsp;Gaussian noise PDF &nbsp;$f_N(n)$&nbsp; the&nbsp; ''channel capacity''&nbsp; results exactly when the input PDF &nbsp;$f_X(x)$&nbsp; is ''also Gaussian'':
+[[File:P_ID2885__Inf_T_4_2_S4b_neu.png|right|frame|Numerical results for the AWGN channel capacity as a function of&nbsp; ${P_X}/{P_N}$]]
+$\text{Conclusion:}$&nbsp; For the AWGN channel &nbsp; ⇒ &nbsp;Gaussian noise PDF &nbsp;$f_N(n)$,&nbsp; the&nbsp; channel capacity&nbsp; is reached exactly when the input PDF &nbsp;$f_X(x)$&nbsp; is also Gaussian:
-[[File:P_ID2885__Inf_T_4_2_S4b_neu.png|right|frame|Numerical results for the AWGN channel capacity as a function of&nbsp; ${P_X}/{P_N}$]]
 :$$C_{\rm AWGN} = h_{\rm max}(Y) - h(N) = 1/2 \cdot  {\rm log}_2 \hspace{0.1cm} {P_Y}/{P_N}$$
 :$$\Rightarrow \hspace{0.3cm} C_{\rm AWGN}=  1/2 \cdot  {\rm log}_2 \hspace{0.1cm} ( 1 + P_X/P_N) \hspace{0.05cm}.$$}}
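For quick evaluations, the capacity formula can be written as a small function. This is a minimal sketch, assuming Python; the function names are hypothetical. As a consistency check, it also computes&nbsp; $C_{\rm AWGN}$&nbsp; as the difference&nbsp; $h_{\rm max}(Y) - h(N)$&nbsp; of the two Gaussian differential entropies, which must give the same value.

```python
import math

def awgn_capacity(snr):
    """C_AWGN = 1/2 * log2(1 + P_X/P_N) in bit per channel use; snr = P_X/P_N as a ratio."""
    return 0.5 * math.log2(1.0 + snr)

def gaussian_entropy_bits(power):
    """Differential entropy 1/2 * log2(2*pi*e*P) of a zero-mean Gaussian with power P."""
    return 0.5 * math.log2(2 * math.pi * math.e * power)

P_N = 1.0
for snr_db in (-10, 0, 10, 20, 30):
    snr = 10 ** (snr_db / 10)               # convert dB to the ratio P_X/P_N
    P_X = snr * P_N
    via_entropies = gaussian_entropy_bits(P_X + P_N) - gaussian_entropy_bits(P_N)
    print(f"{snr_db:>4} dB: C_AWGN = {awgn_capacity(snr):.4f} bit, "
          f"h_max(Y) - h(N) = {via_entropies:.4f} bit")
```

For example,&nbsp; $P_X/P_N = 10$&nbsp; gives&nbsp; $C_{\rm AWGN} = 1/2 \cdot \log_2 11 \approx 1.73$&nbsp; bit.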
@@ Line 157: / Line 168: @@
 ==Parallel Gaussian channels ==
 <br>
-[[File:P_ID2891__Inf_T_4_2_S4c_neu.png|frame|Parallel AWGN channels]]
+[[File:EN_Inf_T_4_2_S4c.png|frame|Parallel AWGN channels]]
+We now consider,&nbsp; according to the graph,&nbsp; $K$&nbsp; parallel Gaussian channels&nbsp; $X_1 → Y_1$,&nbsp; ... ,&nbsp;  $X_k → Y_k$,&nbsp; ... ,&nbsp; $X_K → Y_K$.
-We now consider, according to the graph&nbsp; $K$&nbsp; parallel Gaussian channels of&nbsp; $X_1 → Y_1$,&nbsp; ... ,&nbsp;  $X_k → Y_k$,&nbsp; ... , $X_K → Y_K$.
 *We denote the transmission powers in the&nbsp; $K$&nbsp; channels by
 :$$P_1 = \text{E}[X_1^2], \hspace{0.15cm}\text{...}\hspace{0.15cm} ,\ P_k = \text{E}[X_k^2], \hspace{0.15cm}\text{...}\hspace{0.15cm}  ,\ P_K = \text{E}[X_K^2].$$
 *The&nbsp; $K$&nbsp; noise powers can also be different:
 :$$σ_1^2, \hspace{0.15cm}\text{...}\hspace{0.15cm} ,\ σ_k^2, \hspace{0.15cm}\text{...}\hspace{0.15cm} ,\ σ_K^2.$$
 We are now looking for the maximum mutual information  &nbsp;$I(X_1, \hspace{0.15cm}\text{...}\hspace{0.15cm}, X_K\hspace{0.05cm};\hspace{0.05cm}Y_1, \hspace{0.15cm}\text{...}\hspace{0.15cm}, Y_K) $&nbsp; between
 *the&nbsp; $K$&nbsp; input variables&nbsp; $X_1$,&nbsp; ... , $X_K$&nbsp; and
 *the&nbsp; $K$&nbsp; output variables&nbsp; $Y_1$,&nbsp; ... , $Y_K$,
-which we call the&nbsp; ''total channel capacity''&nbsp; of this AWGN configuration.
+which we call the&nbsp; &raquo;'''total channel capacity'''&laquo;&nbsp; of this AWGN configuration.
+<br clear=all>
 {{BlaueBox|TEXT=
 $\text{Agreement:}$&nbsp;
-AAssume power constraint of the total system.&nbsp; That is: &nbsp; <br>&nbsp; &nbsp; The sum of all powers&nbsp; $P_k$&nbsp; in the&nbsp; $K$&nbsp; individual channels must not exceed the specified value&nbsp; $P_X$&nbsp;:
+We assume a power constraint on the total AWGN system:&nbsp; The sum of all powers&nbsp; $P_k$&nbsp; in the&nbsp; $K$&nbsp; individual channels must not exceed the specified value&nbsp; $P_X$:
 :$$P_1 + \hspace{0.05cm}\text{...}\hspace{0.05cm}+ P_K = \hspace{0.1cm} \sum_{k= 1}^K
@@ Line 181: / Line 193: @@
-Under the only slightly restrictive assumption of independent noise sources&nbsp; $N_1$,&nbsp; ... ,&nbsp; $N_K$&nbsp; can be written for the mutual information after some intermediate steps:
+Under the only mildly restrictive assumption of independent noise sources&nbsp; $N_1$,&nbsp; ... ,&nbsp; $N_K$,&nbsp; the mutual information can be written after some intermediate steps as:
 :$$I(X_1, \hspace{0.05cm}\text{...}\hspace{0.05cm}, X_K\hspace{0.05cm};\hspace{0.05cm}Y_1,\hspace{0.05cm}\text{...}\hspace{0.05cm}, Y_K) = h(Y_1, ... \hspace{0.05cm}, Y_K ) - \hspace{0.1cm} \sum_{k= 1}^K
   \hspace{0.1cm} h(N_k)\hspace{0.05cm}.$$
-The following upper bound can be specified for this:
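+*Since the joint differential entropy of&nbsp; $Y_1$,&nbsp; ... ,&nbsp; $Y_K$&nbsp; cannot exceed the sum of the individual differential entropies,&nbsp; and since&nbsp; $h(Y_k)$&nbsp; is maximal for a Gaussian&nbsp; $Y_k$&nbsp; with power&nbsp; $P_k + σ_k^2$,&nbsp; the following upper bound holds: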
 :$$I(X_1,\hspace{0.05cm}\text{...}\hspace{0.05cm}, X_K\hspace{0.05cm};\hspace{0.05cm}Y_1, \hspace{0.05cm}\text{...} \hspace{0.05cm}, Y_K)
-\hspace{0.2cm} \le \hspace{0.1cm} \hspace{0.1cm} \sum_{k= 1}^K  \hspace{0.1cm} \big[h(Y_k - h(N_k)\big]
+\hspace{0.2cm} \le \hspace{0.1cm} \hspace{0.1cm} \sum_{k= 1}^K  \hspace{0.1cm} \big[h(Y_k) - h(N_k)\big]
 \hspace{0.2cm} \le \hspace{0.1cm} 1/2 \cdot \sum_{k= 1}^K  \hspace{0.1cm} {\rm log}_2 \hspace{0.1cm} ( 1 + {P_k}/{\sigma_k^2})
 \hspace{0.05cm}.$$
-*The equal sign (identity) is valid for mean-free Gaussian input variables&nbsp; $X_k$&nbsp; as well as for statistically independent disturbances&nbsp; $N_k$.
+#Equality holds for statistically independent,&nbsp; zero-mean Gaussian input variables&nbsp; $X_k$&nbsp; together with statistically independent noise variables&nbsp; $N_k$.
-*One arrives from this equation at the&nbsp; ''maximum mutual information'' &nbsp;  ⇒ &nbsp;  ''channel capacity'',  if the total transmission power&nbsp; $P_X$&nbsp; is divided as best as possible, taking into account the different interferences in the individual channels &nbsp;$(σ_k^2)$&nbsp;.
+#The&nbsp; "maximum mutual information" &nbsp;  ⇒ &nbsp;  "channel capacity"&nbsp; follows from this equation if the total transmission power&nbsp; $P_X$&nbsp; is divided among the channels as well as possible,&nbsp; taking into account their different noise powers &nbsp;$σ_k^2$.
-*This optimisation problem can again be elegantly solved with the method of&nbsp; [https://en.wikipedia.org/wiki/Lagrange_multiplier Lagrange multipliers]&nbsp; elegant lösen.&nbsp; The following example only explains the result.
+#This optimization problem can again be solved elegantly with the method of&nbsp; [https://en.wikipedia.org/wiki/Lagrange_multiplier "Lagrange multipliers"].&nbsp; The following example only illustrates the result.
-[[File:P_ID2894__Inf_T_4_2_S4d.png|right|frame|Best possible power division for&nbsp; $K = 4$&nbsp; („Water–Filling”)]]
 {{GraueBox|TEXT=
-$\text{Beispiel 1:}$&nbsp; We consider&nbsp; $K = 4$&nbsp; parallel Gaussian channels with four different noise powers&nbsp; $σ_1^2$,&nbsp; ... ,&nbsp; $σ_4^2$&nbsp; according to the adjacent figure (faint green background).
+[[File:EN_Inf_T_4_2_S4d_v2.png|right|frame|Best possible power allocation for&nbsp; $K = 4$&nbsp; $($"Water–Filling"$)$]]
-*The best possible distribution of the transmitting power among the four channels is sought.
+$\text{Example 1:}$&nbsp; We consider&nbsp; $K = 4$&nbsp; parallel Gaussian channels with four different noise powers&nbsp; $σ_1^2$,&nbsp; ... ,&nbsp; $σ_4^2$&nbsp; according to the adjacent figure (faint green background).
-*If one were to slowly fill this profile with water, the water would initially flow only into&nbsp; $\text{channel 2}$&nbsp; .
+*The best possible allocation of the transmission power among the four channels is sought.
-*If you continue to pour, some water will also accumulate in&nbsp; $\text{channel 1}$&nbsp; and later also in&nbsp; $\text{channel 4}$.
+*If one were to slowly fill this profile with water,&nbsp; the water would initially flow only into&nbsp; $\text{channel 2}$.
+*If one continues to pour,&nbsp; some water also accumulates in&nbsp; $\text{channel 1}$&nbsp; and later also in&nbsp; $\text{channel 4}$.
+The drawn&nbsp; "water level"&nbsp; $H$&nbsp; marks exactly the point at which the sum &nbsp;$P_1 + P_2 + P_4$&nbsp; equals the total available transmission power&nbsp; $P_X$:
+*The optimal power allocation for this example results in &nbsp;$P_2 > P_1 > P_4$&nbsp; as well as &nbsp;$P_3 = 0$.
-The drawn „water level”&nbsp; $H$&nbsp; describes exactly the point in time when the sum &nbsp;$P_1 + P_2 + P_4$&nbsp; corresponds to the total available transmitting power&nbsp; $P_X$&nbsp; :
+*Only with a larger transmission power&nbsp; $P_X$&nbsp; would a small power&nbsp; $P_3$&nbsp; also be allocated to the third channel.
-*The optimal power distribution for this example results in &nbsp;$P_2 > P_1 > P_4$&nbsp; as well as &nbsp;$P_3 = 0$.
-*Only with a larger transmitting power&nbsp; $P_X$&nbsp; would a small power&nbsp; $P_3$&nbsp; also be allocated to the third channel.
-This allocation procedure is called a '''Water–Filling algorithm'''.}}
+This allocation procedure is called a&nbsp; &raquo;'''Water–Filling algorithm'''&laquo;.}}
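The water-filling rule follows from the Lagrange multiplier method: each channel receives&nbsp; $P_k = \max(0,\ H - σ_k^2)$,&nbsp; where the water level&nbsp; $H$&nbsp; is chosen such that&nbsp; $P_1 + \text{...} + P_K = P_X$.&nbsp; The following is a minimal sketch, assuming Python with NumPy; the function names and the noise powers&nbsp; $σ_k^2 = 2, 1, 6, 3$&nbsp; with&nbsp; $P_X = 5$&nbsp; are hypothetical and only mimic the qualitative situation of Example 1 (they are not read from the figure). The level&nbsp; $H$&nbsp; is found by bisection, since the allocated total power grows monotonically with&nbsp; $H$.

```python
import numpy as np

def water_filling(noise_powers, P_X, tol=1e-12):
    """Hypothetical helper: P_k = max(0, H - sigma_k^2) with sum(P_k) = P_X.

    The water level H is found by bisection, because the allocated total
    power sum(max(0, H - sigma_k^2)) increases monotonically with H.
    """
    sigma2 = np.asarray(noise_powers, dtype=float)
    lo, hi = sigma2.min(), sigma2.max() + P_X   # total power is 0 at lo and >= P_X at hi
    while hi - lo > tol:
        H = 0.5 * (lo + hi)
        if np.maximum(0.0, H - sigma2).sum() > P_X:
            hi = H                               # too much water: lower the level
        else:
            lo = H                               # too little water: raise the level
    H = 0.5 * (lo + hi)
    return np.maximum(0.0, H - sigma2), H

def total_capacity(powers, noise_powers):
    """Sum of 1/2 * log2(1 + P_k / sigma_k^2) over the K parallel channels."""
    return 0.5 * np.sum(np.log2(1.0 + np.asarray(powers) / np.asarray(noise_powers)))

sigma2 = [2.0, 1.0, 6.0, 3.0]                    # hypothetical noise powers sigma_1^2 ... sigma_4^2
P_X = 5.0                                        # hypothetical total transmission power
P, H = water_filling(sigma2, P_X)
print("water level H  =", round(H, 4))
print("powers P_1..4  =", np.round(P, 4))
print("water-filling  =", round(total_capacity(P, sigma2), 4), "bit")
print("equal split    =", round(total_capacity(np.full(4, P_X / 4), sigma2), 4), "bit")
```

With these values the water level is&nbsp; $H = 11/3$,&nbsp; giving&nbsp; $P_2 > P_1 > P_4 > 0$&nbsp; and&nbsp; $P_3 = 0$&nbsp; as in Example 1; the equal split&nbsp; $P_k = P_X/4$&nbsp; yields a smaller total capacity.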
 {{GraueBox|TEXT=
 $\text{Example 2:}$&nbsp;
-If all&nbsp; $K$&nbsp; Gaussian channels are equally disturbed &nbsp; ⇒ &nbsp; $σ_1^2 = \hspace{0.15cm}\text{...}\hspace{0.15cm} = σ_K^2 = P_N$,one should naturally distribute the total available transmit power&nbsp; $P_X$&nbsp; equally to all channels:: &nbsp; $P_k = P_X/K$.&nbsp; For the total capacity one then obtains:
+If all&nbsp; $K$&nbsp; Gaussian channels have the same noise power &nbsp; ⇒ &nbsp; $σ_1^2 = \hspace{0.15cm}\text{...}\hspace{0.15cm} = σ_K^2 = P_N$,&nbsp; the total available transmission power&nbsp; $P_X$&nbsp; should naturally be allocated equally to all channels: &nbsp; $P_k = P_X/K$.&nbsp; For the total capacity we then obtain:
-[[File:P_ID2939__Inf_T_4_2_S5_neu.png|right|frame|Capacity for&nbsp; $K$&nbsp; parallel channels]]
+[[File:EN_Inf_Z_4_1.png|right|frame|Capacity for&nbsp; $K$&nbsp; parallel channels]]
-:$$C_{\rm Gesamt}
+:$$C_{\rm total}
 = \frac{ K}{2} \cdot  {\rm log}_2 \hspace{0.1cm} ( 1 + \frac{P_X}{K \cdot P_N})
 \hspace{0.05cm}.$$
 The graph shows the total capacity as a function of&nbsp; $P_X/P_N$&nbsp; for&nbsp; $K = 1$,&nbsp; $K = 2$&nbsp; and&nbsp; $K = 3$:
-*For &nbsp;$P_X/P_N = 10  \ ⇒ \  10 · \text{lg} (P_X/P_N) = 10 \ \text{dB}$&nbsp;, the total capacitance becomes approximately&nbsp; $50\%$&nbsp; larger if the total power&nbsp; $P_X$&nbsp; is divided equally between two channels: &nbsp; $P_1 = P_2 = P_X/2$.
+*For &nbsp;$P_X/P_N = 10  \ ⇒ \  10 · \text{lg} (P_X/P_N) = 10 \ \text{dB}$,&nbsp; the total capacity becomes approximately&nbsp; $50\%$&nbsp; larger than for&nbsp; $K = 1$&nbsp; if the total power&nbsp; $P_X$&nbsp; is divided equally between two channels: &nbsp; $P_1 = P_2 = P_X/2$.
-*In the borderline case &nbsp;$P_X/P_N → ∞$&nbsp;, the total capacity increases by a factor&nbsp; $K$&nbsp;  &nbsp; ⇒  &nbsp; doubling at $K = 2$.
+*In the limiting case &nbsp;$P_X/P_N → ∞$,&nbsp; the ratio of the total capacity to the single-channel capacity approaches the factor&nbsp; $K$  &nbsp; ⇒  &nbsp; doubling at&nbsp; $K = 2$.
-The two identical and independent channels can be realised in different ways, for example by multiplexing in time, frequency or space.
-However, the case&nbsp; $K = 2$&nbsp; can also be realised by using orthogonal basis functions such as „cosine” und „sine” as for example with
+The two identical and independent channels can be realized in different ways,&nbsp; for example by multiplexing in time,&nbsp; frequency or space.
-*&nbsp; [[Modulation_Methods/Quadratur–Amplitudenmodulation|quadrature amplitude modulation]]&nbsp; (QAM) oder
+Alternatively,&nbsp; the case&nbsp; $K = 2$&nbsp; can be realized by using orthogonal basis functions such as&nbsp; "cosine"&nbsp; and&nbsp; "sine",&nbsp; as for example with
-*einer&nbsp; [[Modulation_Methods/Quadratur–Amplitudenmodulation#Weitere_Signalraumkonstellationen|multi-level phase modulation]]&nbsp; such as QPSK or 8–PSK.}}
+*&nbsp; [[Modulation_Methods/Quadratur–Amplitudenmodulation|"quadrature amplitude modulation"]]&nbsp; $\rm (QAM)$&nbsp; or
+*&nbsp; [[Modulation_Methods/Quadrature_Amplitude_Modulation#Other_signal_space_constellations|"multi-level phase modulation"]]&nbsp; such as&nbsp; $\rm QPSK$&nbsp; or&nbsp; $\rm  8–PSK$.}}
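The numbers quoted in Example 2 can be reproduced directly from the formula for&nbsp; $C_{\rm total}$.&nbsp; This is a minimal sketch, assuming Python; the function name is hypothetical.

```python
import math

def total_capacity_equal(K, snr):
    """C_total = K/2 * log2(1 + P_X / (K * P_N)) for K equally noisy channels; snr = P_X/P_N."""
    return K / 2 * math.log2(1 + snr / K)

snr = 10.0                                   # P_X/P_N = 10, i.e. 10 dB
for K in (1, 2, 3):
    print(f"K = {K}: C_total = {total_capacity_equal(K, snr):.4f} bit")

gain_10dB = total_capacity_equal(2, snr) / total_capacity_equal(1, snr)
print(f"K = 2 vs. K = 1 at 10 dB: factor {gain_10dB:.3f}")

# For very large P_X/P_N the ratio approaches K, but only slowly (logarithmically)
for snr_db in (20, 40, 80):
    snr_big = 10 ** (snr_db / 10)
    ratio = total_capacity_equal(2, snr_big) / total_capacity_equal(1, snr_big)
    print(f"{snr_db} dB: ratio K=2 / K=1 = {ratio:.3f}")
```

The ratio for&nbsp; $K = 2$&nbsp; is about&nbsp; $1.49$&nbsp; at&nbsp; $10$&nbsp; dB and grows only slowly (about&nbsp; $1.70$&nbsp; at&nbsp; $20$&nbsp; dB and&nbsp; $1.92$&nbsp; at&nbsp; $80$&nbsp; dB) toward the limit&nbsp; $2$.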
-==Relevant tasks ==
+==Exercises for the chapter ==
 <br>
-[[Aufgaben:4.5 Transinformation aus 2D-WDF|Aufgabe 4.5: Transinformation aus 2D-WDF]]
+[[Aufgaben:Exercise_4.5:_Mutual_Information_from_2D-PDF|Exercise 4.5: Mutual Information from 2D-PDF]]
-[[Aufgaben:4.5Z Nochmals Transinformation|Aufgabe 4.5Z: Nochmals Transinformation]]
+[[Aufgaben:Exercise_4.5Z:_Again_Mutual_Information|Exercise 4.5Z: Again Mutual Information]]
-[[Aufgaben:4.6 AWGN–Kanalkapazität|Aufgabe 4.6: AWGN–Kanalkapazität]]
+[[Aufgaben:Exercise_4.6:_AWGN_Channel_Capacity|Exercise 4.6: AWGN Channel Capacity]]
-[[Aufgaben:4.7 Mehrere parallele Gaußkanäle|Aufgabe 4.7: Mehrere parallele Gaußkanäle]]
+[[Aufgaben:Exercise_4.7:_Several_Parallel_Gaussian_Channels|Exercise 4.7: Several Parallel Gaussian Channels]]
-[[Aufgaben:4.7Z Zum Water–Filling–Algorithmus|Aufgabe 4.7Z: Zum Water–Filling–Algorithmus]]
+[[Aufgaben:Exercise_4.7Z:_About_the_Water_Filling_Algorithm|Exercise 4.7Z: About the Water Filling Algorithm]]
 {{Display}}