Difference between revisions of "Digital Signal Transmission/Structure of the Optimal Receiver"

Latest revision as of 16:18, 23 January 2023

1 Block diagram and prerequisites
2 Fundamental approach to optimal receiver design
3 The irrelevance theorem
4 Some properties of the AWGN channel
5 Description of the AWGN channel by orthonormal basis functions
6 Optimal receiver for the AWGN channel
7 Implementation aspects
8 Probability density function of the received values
9 N-dimensional Gaussian noise
10 Exercises for the chapter

Block diagram and prerequisites

In this chapter, the structure of the optimal receiver of a digital transmission system is derived in very general terms, whereby

the modulation process and further system details are not specified further,

the basis functions and the signal space representation according to the chapter "Signals, Basis Functions and Vector Spaces" are assumed.

General block diagram of a communication system

The following should be noted about the block diagram:

The symbol set size of the source is $M$ and the source symbol set is $\{m_i\}$ with $i = 0$, ... , $M-1$.

Let the corresponding source symbol probabilities ${\rm Pr}(m = m_i)$ also be known to the receiver.

For the transmission, $M$ different signal forms $s_i(t)$ are available, again indexed by $i = 0$, ... , $M-1$.

There is a fixed relation between messages $\{m_i\}$ and signals $\{s_i(t)\}$. If $m =m_i$, the transmitted signal is $s(t) =s_i(t)$.

Linear channel distortions are taken into account in the above block diagram by the impulse response $h(t)$. In addition, a noise term $n(t)$ (of some kind) is added.

With these two effects interfering with the transmission, the signal $r(t)$ arriving at the receiver can be given in the following way:

$$r(t) = s(t) \star h(t) + n(t) \hspace{0.05cm}.$$

The task of the $($optimal$)$ receiver is to find out on the basis of its input signal $r(t)$, which of the $M$ possible messages $m_i$ – or which of the signals $s_i(t)$ – was sent. The estimated value for $m$ found by the receiver is characterized by a "circumflex" ⇒ $\hat{m}$.

$\text{Definition:}$ One speaks of an optimal receiver if the symbol error probability assumes the smallest possible value for the given boundary conditions:

$$p_{\rm S} = {\rm Pr} ({\cal E}) = {\rm Pr} ( \hat{m} \ne m) \hspace{0.15cm} \Rightarrow \hspace{0.15cm}{\rm minimum} \hspace{0.05cm}.$$

Notes:

In the following, we mostly assume the AWGN approach ⇒ $r(t) = s(t) + n(t)$, which means that the channel is assumed to be distortion-free: $h(t) = \delta(t)$.
Otherwise, we can redefine the signals $s_i(t)$ as ${s_i}'(t) = s_i(t) \star h(t)$, i.e., impose the deterministic channel distortions on the transmitted signal.
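The channel model $r(t) = s(t) \star h(t) + n(t)$ can be illustrated with a small discrete-time sketch. The sample values, the distortion `h_dist` and the helper name `channel` are assumptions chosen only for illustration; they do not come from the block diagram.

```python
import numpy as np

def channel(s, h, sigma, rng):
    """Discrete-time sketch of r = s * h + n (convolution plus Gaussian noise)."""
    distorted = np.convolve(s, h)[: len(s)]      # s(t) * h(t), truncated to the signal length
    noise = sigma * rng.standard_normal(len(s))  # additive noise term n(t)
    return distorted + noise

rng = np.random.default_rng(0)
s = np.array([1.0, 1.0, -1.0, 1.0, -1.0, -1.0])  # hypothetical transmitted samples

h_ideal = np.array([1.0])       # discrete counterpart of h(t) = delta(t)
h_dist = np.array([0.8, 0.2])   # hypothetical linear distortion

r_ideal = channel(s, h_ideal, sigma=0.0, rng=rng)  # distortion-free and noise-free: r = s
s_prime = channel(s, h_dist, sigma=0.0, rng=rng)   # redefined signal s'(t) = s(t) * h(t)
```

With `h_ideal` the output equals the input, while `s_prime` plays the role of the redefined signal ${s_i}'(t)$ that already contains the deterministic distortion.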

Fundamental approach to optimal receiver design

Compared to the "block diagram" shown in the previous section, we now perform some essential generalizations:

Model for deriving the optimal receiver

The transmission channel is described by the "conditional probability density function" $p_{\hspace{0.02cm}r(t)\hspace{0.02cm} \vert \hspace{0.02cm}s(t)}$ which determines the dependence of the received signal $r(t)$ on the transmitted signal $s(t)$.

If a certain signal $r(t) = \rho(t)$ has been received, the receiver has the task of finding out, on the basis of this "signal realization" $\rho(t)$ and the $M$ conditional probability density functions

$$p_{\hspace{0.05cm}r(t) \hspace{0.05cm} \vert \hspace{0.05cm} s(t) } (\rho(t) \hspace{0.05cm} \vert \hspace{0.05cm} s_i(t))\hspace{0.2cm}{\rm with}\hspace{0.2cm} i = 0, \text{...} \hspace{0.05cm}, M-1,$$

which message $\hat{m}$ was most probably transmitted, taking into account all possible transmitted signals $s_i(t)$ and their occurrence probabilities ${\rm Pr}(m = m_i)$.

Thus, the estimate of the optimal receiver is determined in general by

$$\hat{m} = {\rm arg} \max_i \hspace{0.1cm} p_{\hspace{0.02cm}s(t) \hspace{0.05cm} \vert \hspace{0.05cm} r(t) } ( s_i(t) \hspace{0.05cm} \vert \hspace{0.05cm} \rho(t)) = {\rm arg} \max_i \hspace{0.1cm} p_{m \hspace{0.05cm} \vert \hspace{0.05cm} r(t) } ( \hspace{0.05cm}m_i\hspace{0.05cm} \vert \hspace{0.05cm}\rho(t))\hspace{0.05cm}.$$

$\text{In other words:}$ The optimal receiver considers as the most likely transmitted message that $\hat{m} \in \{m_i\}$ whose conditional probability density function $p_{\hspace{0.02cm}m \hspace{0.05cm} \vert \hspace{0.05cm} r(t) }$ takes the largest possible value for the applied received signal $\rho(t)$ and under the assumption $m =\hat{m}$.

Before we discuss the above decision rule in more detail, the optimal receiver should still be divided into two functional blocks according to the diagram:

The detector takes various measurements on the received signal $r(t)$ and summarizes them in the vector $\boldsymbol{r}$. With $K$ measurements, $\boldsymbol{r}$ corresponds to a point in the $K$–dimensional vector space.

The decision forms the estimated value depending on this vector. For a given vector $\boldsymbol{r} = \boldsymbol{\rho}$ the decision rule is:

$$\hat{m} = {\rm arg}\hspace{0.05cm} \max_i \hspace{0.1cm} P_{m\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r} } ( m_i\hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{\rho}) \hspace{0.05cm}.$$

In contrast to the decision rule above, a conditional probability $P_{m\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r} }$ now occurs instead of the conditional probability density function $\rm (PDF)$ $p_{m\hspace{0.05cm} \vert \hspace{0.05cm}r(t)}$. Please note the upper and lower case for the different meanings.

$\text{Example 1:}$ We now consider the function $y = {\rm arg}\hspace{0.05cm} \max \ p(x)$, where $p(x)$ describes the probability density function $\rm (PDF)$ of a continuous-valued or discrete-valued random variable $x$. In the second case (right graph), the PDF consists of a sum of Dirac delta functions with the probabilities as pulse weights.

Illustration of the "arg max" function

⇒ The graphic shows exemplary functions. In both cases the PDF maximum $(17)$ is at $x = 6$:

$$\max_x \hspace{0.1cm} p(x) = 17\hspace{0.05cm},$$

$$y = {\rm \hspace{0.05cm}arg} \max_x \hspace{0.1cm} p(x) = 6\hspace{0.05cm}.$$
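The difference between "max" and "arg max" can also be checked numerically. The function values below are hypothetical and not normalized; they are chosen only so that the maximum $17$ occurs at $x = 6$, as in the example.

```python
import numpy as np

x = np.arange(0, 11)   # assumed support x = 0, ..., 10
p = np.array([1, 3, 6, 9, 12, 15, 17, 14, 9, 5, 2], dtype=float)  # hypothetical function values

p_max = p.max()        # "max": the largest function value
y = x[np.argmax(p)]    # "arg max": the argument at which this value occurs
```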

⇒ The (conditional) probabilities in the equation

$$\hat{m} = {\rm arg}\hspace{0.05cm} \max_i \hspace{0.1cm} P_{\hspace{0.02cm}m\hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{ r} } ( m_i \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho})$$

are a-posteriori probabilities. "Bayes' theorem" can be used to write for this:

$$P_{\hspace{0.02cm}m\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r} } ( m_i \hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{\rho}) = \frac{ {\rm Pr}( m_i) \cdot p_{\boldsymbol{ r}\hspace{0.05cm} \vert \hspace{0.05cm}m } (\boldsymbol{\rho}\hspace{0.05cm} \vert \hspace{0.05cm}m_i )}{p_{\boldsymbol{ r} } (\boldsymbol{\rho})} \hspace{0.05cm}.$$

The denominator term $p_{\boldsymbol{ r} }(\boldsymbol{\rho})$ is the same for all alternatives $m_i$ and need not be considered for the decision.

This gives the following rules:

$\text{Theorem:}$ The decision rule of the optimal receiver, also known as maximum–a–posteriori receiver $($in short: MAP receiver$)$ is:

$$\hat{m}_{\rm MAP} = {\rm \hspace{0.05cm} arg} \max_i \hspace{0.1cm} P_{\hspace{0.02cm}m\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r} } ( m_i \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}) = {\rm \hspace{0.05cm}arg} \max_i \hspace{0.1cm} \big [ {\rm Pr}( m_i) \cdot p_{\boldsymbol{ r}\hspace{0.05cm} \vert \hspace{0.05cm} m } (\boldsymbol{\rho}\hspace{0.05cm} \vert \hspace{0.05cm} m_i )\big ]\hspace{0.05cm}.$$

The advantage of this equation is that the conditional PDF $p_{\boldsymbol{ r}\hspace{0.05cm} \vert \hspace{0.05cm} m }$ $($"output under the condition input"$)$ describing the forward direction of the channel can be used.

In contrast, the first equation uses the inference probabilities $P_{\hspace{0.05cm}m\hspace{0.05cm} \vert \hspace{0.02cm} \boldsymbol{ r} } $ $($"input under the condition output"$)$.

$\text{Theorem:}$ A maximum likelihood receiver $($in short: ML receiver$)$ uses the following decision rule:

$$\hat{m}_{\rm ML} = \hspace{-0.1cm} {\rm arg} \max_i \hspace{0.1cm} p_{\boldsymbol{ r}\hspace{0.05cm} \vert \hspace{0.05cm}m } (\boldsymbol{\rho}\hspace{0.05cm} \vert \hspace{0.05cm}m_i )\hspace{0.05cm}.$$

In this case, the possibly different occurrence probabilities ${\rm Pr}(m = m_i)$ are not used for the decision process, for example because they are not known to the receiver.

See the earlier chapter "Optimal Receiver Strategies" for other derivations for these receiver types.

$\text{Conclusion:}$ For equally likely messages $\{m_i\}$ ⇒ ${\rm Pr}(m = m_i) = 1/M$, the generally slightly worse "maximum likelihood receiver" is equivalent to the "maximum–a–posteriori receiver":

$$\hat{m}_{\rm MAP} = \hat{m}_{\rm ML} =\hspace{-0.1cm} {\rm\hspace{0.05cm} arg} \max_i \hspace{0.1cm} p_{\boldsymbol{ r}\hspace{0.05cm} \vert \hspace{0.05cm}m } (\boldsymbol{\rho}\hspace{0.05cm} \vert \hspace{0.05cm}m_i )\hspace{0.05cm}.$$
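The two decision rules can be compared with a minimal sketch. It assumes a scalar observation with additive Gaussian noise (introduced only later in this chapter); the signal points, the noise standard deviation, the received value and all function names are hypothetical.

```python
import numpy as np

def gauss_pdf(rho, s, sigma):
    """Conditional PDF p(rho | s) for additive Gaussian noise with standard deviation sigma."""
    return np.exp(-(rho - s) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def decide_ml(rho, signals, sigma):
    """ML rule: arg max_i p(rho | m_i)."""
    return int(np.argmax([gauss_pdf(rho, s, sigma) for s in signals]))

def decide_map(rho, signals, priors, sigma):
    """MAP rule: arg max_i Pr(m_i) * p(rho | m_i)."""
    return int(np.argmax([pr * gauss_pdf(rho, s, sigma) for s, pr in zip(signals, priors)]))

signals = [+1.0, -1.0]   # hypothetical signal points for m_0 and m_1
sigma = 1.0              # hypothetical noise standard deviation
rho = 0.2                # hypothetical received value

m_ml = decide_ml(rho, signals, sigma)
m_map_equal = decide_map(rho, signals, [0.5, 0.5], sigma)    # equally likely messages
m_map_skewed = decide_map(rho, signals, [0.1, 0.9], sigma)   # strongly unequal priors
```

For equally likely messages both rules return the same index. With the skewed priors the MAP rule decides for $m_1$ although $\rho$ lies closer to the signal point of $m_0$.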

The irrelevance theorem

About the irrelevance theorem

Note that the receiver described in the last section is optimal only if the detector is implemented in the best possible way, i.e., if no information is lost by the transition from the continuous signal $r(t)$ to the vector $\boldsymbol{r}$.

To clarify the question which and how many measurements have to be performed on the received signal $r(t)$ to guarantee optimality, the "irrelevance theorem" is helpful.

For this purpose, we consider the sketched receiver whose detector derives the two vectors $\boldsymbol{r}_1$ and $\boldsymbol{r}_2$ from the received signal $r(t)$ and makes them available to the decision.

These quantities are related to the message $ m \in \{m_i\}$ via the composite probability density $p_{\boldsymbol{ r}_1, \hspace{0.05cm}\boldsymbol{ r}_2\hspace{0.05cm} \vert \hspace{0.05cm}m }$.

The decision rule of the MAP receiver with adaptation to this example is:

$$\hat{m}_{\rm MAP} \hspace{-0.1cm} = \hspace{-0.1cm} {\rm arg} \max_i \hspace{0.1cm} \big [ {\rm Pr}( m_i) \cdot p_{\boldsymbol{ r}_1 , \hspace{0.05cm}\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm}m } \hspace{0.05cm} (\boldsymbol{\rho}_1, \hspace{0.05cm}\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm} m_i ) \big]= {\rm arg} \max_i \hspace{0.1cm}\big [ {\rm Pr}( m_i) \cdot p_{\boldsymbol{ r}_1 \hspace{0.05cm} \vert \hspace{0.05cm}m } \hspace{0.05cm} (\boldsymbol{\rho}_1 \hspace{0.05cm} \vert \hspace{0.05cm}m_i ) \cdot p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r}_1 , \hspace{0.05cm} m } \hspace{0.05cm} (\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1 , \hspace{0.05cm}m_i )\big] \hspace{0.05cm}.$$

Here it is to be noted:

The vectors $\boldsymbol{r}_1$ and $\boldsymbol{r}_2$ are random variables. Their realizations are denoted here and in the following by $\boldsymbol{\rho}_1$ and $\boldsymbol{\rho}_2$. For emphasis, all vectors are labeled in red in the graph.

The requirements for the application of the "irrelevance theorem" are the same as those for a first order "Markov chain". The random variables $x$, $y$, $z$ form a first order Markov chain if the distribution of $z$ is independent of $x$ for a given $y$. For a first order Markov chain, the following holds:

$$p(x,\ y,\ z) = p(x) \cdot p(y\hspace{0.05cm} \vert \hspace{0.05cm}x) \cdot p(z\hspace{0.05cm} \vert \hspace{0.05cm}y) \hspace{0.75cm} {\rm instead \hspace{0.15cm}of} \hspace{0.75cm}p(x, y, z) = p(x) \cdot p(y\hspace{0.05cm} \vert \hspace{0.05cm}x) \cdot p(z\hspace{0.05cm} \vert \hspace{0.05cm}x, y) \hspace{0.05cm}.$$

In the general case, the optimal receiver must evaluate both vectors $\boldsymbol{r}_1$ and $\boldsymbol{r}_2$, since both conditional probability densities $p_{\boldsymbol{ r}_1\hspace{0.05cm} \vert \hspace{0.05cm}m }$ and $p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{ r}_1, \hspace{0.05cm}m }$ occur in the above decision rule. In contrast, the receiver can neglect the second measurement without loss of information if $\boldsymbol{r}_2$ is independent of the message $m$ for given $\boldsymbol{r}_1$:

$$p_{\boldsymbol{ r}_2\hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{ r}_1 , \hspace{0.05cm} m } \hspace{0.05cm} (\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1 , \hspace{0.05cm}m_i )= p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r}_1 } \hspace{0.05cm} (\boldsymbol{\rho}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1 ) \hspace{0.05cm}.$$

In this case, the decision rule can be further simplified:

$$\hat{m}_{\rm MAP} = {\rm arg} \max_i \hspace{0.1cm} \big [ {\rm Pr}( m_i) \cdot p_{\boldsymbol{ r}_1 \hspace{0.05cm} \vert \hspace{0.05cm}m } \hspace{0.05cm} (\boldsymbol{\rho}_1 \hspace{0.05cm} \vert \hspace{0.05cm}m_i ) \cdot p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r}_1 , \hspace{0.05cm} m } \hspace{0.05cm} (\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1 , \hspace{0.05cm}m_i ) \big]$$

$$\Rightarrow \hspace{0.3cm}\hat{m}_{\rm MAP} = {\rm arg} \max_i \hspace{0.1cm} \big [ {\rm Pr}( m_i) \cdot p_{\boldsymbol{ r}_1 \hspace{0.05cm} \vert \hspace{0.05cm}m } \hspace{0.05cm} (\boldsymbol{\rho}_1 \hspace{0.05cm} \vert \hspace{0.05cm}m_i ) \cdot p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r}_1 } \hspace{0.05cm} (\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1 )\big]$$

$$\Rightarrow \hspace{0.3cm}\hat{m}_{\rm MAP} = {\rm arg} \max_i \hspace{0.1cm} \big [ {\rm Pr}( m_i) \cdot p_{\boldsymbol{ r}_1 \hspace{0.05cm} \vert \hspace{0.05cm}m } \hspace{0.05cm} (\boldsymbol{\rho}_1 \hspace{0.05cm} \vert \hspace{0.05cm}m_i ) \big]\hspace{0.05cm}.$$

Two examples of the irrelevance theorem

$\text{Example 2:}$ We consider two different system configurations with two noise terms $\boldsymbol{ n}_1$ and $\boldsymbol{ n}_2$ each to illustrate the irrelevance theorem just presented.

In the diagram all vectorial quantities are labeled in red.

Moreover, the quantities $\boldsymbol{s}$, $\boldsymbol{ n}_1$ and $\boldsymbol{ n}_2$ are independent of each other.

The analysis of these two arrangements yields the following results:

In both cases, the decision must consider the component $\boldsymbol{ r}_1= \boldsymbol{ s}_i + \boldsymbol{ n}_1$, since only this component provides the information about the possible transmitted signals $\boldsymbol{ s}_i$ and thus about the message $m_i$.

In the upper configuration, $\boldsymbol{ r}_2$ contains no information about $m_i$ that has not already been provided by $\boldsymbol{ r}_1$. Rather, $\boldsymbol{ r}_2= \boldsymbol{ r}_1 + \boldsymbol{ n}_2$ is just a noisy version of $\boldsymbol{ r}_1$ and depends only on the noise $\boldsymbol{ n}_2$ once $\boldsymbol{ r}_1$ is known ⇒ $\boldsymbol{ r}_2$ is irrelevant:

$$p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r}_1 , \hspace{0.05cm} m } \hspace{0.05cm} (\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1 , \hspace{0.05cm}m_i )= p_{\boldsymbol{ r}_2\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r}_1 } \hspace{0.05cm} (\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{\rho}_1 )= p_{\boldsymbol{ n}_2 } \hspace{0.05cm} (\boldsymbol{\rho}_2 - \boldsymbol{\rho}_1 )\hspace{0.05cm}.$$

In the lower configuration, on the other hand, $\boldsymbol{ r}_2= \boldsymbol{ n}_1 + \boldsymbol{ n}_2$ is helpful to the receiver, since it provides it with an estimate of the noise term $\boldsymbol{ n}_1$ ⇒ $\boldsymbol{ r}_2$ should therefore not be discarded here.

Formally, this result can be expressed as follows:

$$p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r}_1 , \hspace{0.05cm} m } \hspace{0.05cm} (\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1 , \hspace{0.05cm}m_i ) = p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ n}_1 , \hspace{0.05cm} m } \hspace{0.05cm} (\boldsymbol{\rho}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1 - \boldsymbol{s}_i, \hspace{0.05cm}m_i)= p_{\boldsymbol{ n}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ n}_1 , \hspace{0.05cm} m } \hspace{0.05cm} (\boldsymbol{\rho}_2- \boldsymbol{\rho}_1 + \boldsymbol{s}_i \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1 - \boldsymbol{s}_i, \hspace{0.05cm}m_i) = p_{\boldsymbol{ n}_2 } \hspace{0.05cm} (\boldsymbol{\rho}_2- \boldsymbol{\rho}_1 + \boldsymbol{s}_i ) \hspace{0.05cm}.$$

Since the possible transmitted signal $\boldsymbol{ s}_i$ now appears in the argument of this function, $\boldsymbol{ r}_2$ is not irrelevant, but quite relevant.
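The two results of this example can be checked numerically. The sketch below assumes scalar quantities, zero-mean Gaussian noise with unit variance for both $n_1$ and $n_2$, and two hypothetical signal points; it evaluates the log-likelihood ratio between the two hypotheses for a fixed $\rho_1$ and several values of $\rho_2$.

```python
import numpy as np

def log_gauss(x, sigma=1.0):
    """Log of the zero-mean Gaussian PDF with standard deviation sigma."""
    return -x**2 / (2 * sigma**2) - np.log(np.sqrt(2 * np.pi) * sigma)

def log_joint_upper(rho1, rho2, s):
    """Upper configuration: r1 = s + n1, r2 = r1 + n2."""
    return log_gauss(rho1 - s) + log_gauss(rho2 - rho1)

def log_joint_lower(rho1, rho2, s):
    """Lower configuration: r1 = s + n1, r2 = n1 + n2."""
    return log_gauss(rho1 - s) + log_gauss(rho2 - rho1 + s)

s0, s1 = +1.0, -1.0                        # hypothetical signal points
rho1 = 0.3                                 # fixed first observation
rho2_values = np.array([-2.0, 0.0, 2.0])   # several second observations

# Log-likelihood ratio of hypothesis s0 versus s1 as a function of rho2
llr_upper = log_joint_upper(rho1, rho2_values, s0) - log_joint_upper(rho1, rho2_values, s1)
llr_lower = log_joint_lower(rho1, rho2_values, s0) - log_joint_lower(rho1, rho2_values, s1)
```

In the upper configuration the ratio is the same for every $\rho_2$, so the second measurement cannot change the decision. In the lower configuration the ratio depends on $\rho_2$ and even changes its sign.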

Some properties of the AWGN channel

In order to make further statements about the nature of the optimal measurements of the vector $\boldsymbol{ r}$, it is necessary to further specify the (conditional) probability density function $p_{\hspace{0.02cm}r(t)\hspace{0.05cm} \vert \hspace{0.05cm}s(t)}$ characterizing the channel. In the following, we will consider communication over the "AWGN channel", whose most important properties are briefly summarized again here:

The output signal of the AWGN channel is $r(t) = s(t)+n(t)$, where $s(t)$ indicates the transmitted signal and $n(t)$ is represented by a Gaussian noise process.

A random process $\{n(t)\}$ is said to be Gaussian if, for every $k$ and every choice of time instants $t_1$, ... , $t_k$, the elements of the $k$–dimensional random variable $\{n(t_1)\hspace{0.05cm} \text{...} \hspace{0.05cm}n(t_k)\}$ are "jointly Gaussian".

The mean value of the AWGN noise is ${\rm E}\big[n(t)\big] = 0$. Moreover, $n(t)$ is "white", which means that the "power-spectral density" $\rm (PSD)$ is constant for all frequencies $($from $-\infty$ to $+\infty)$:

$${\it \Phi}_n(f) = {N_0}/{2} \hspace{0.05cm}.$$

According to the "Wiener-Khintchine theorem", the auto-correlation function $\rm (ACF)$ is obtained as the inverse Fourier transform of ${\it \Phi_n(f)}$:

$${\varphi_n(\tau)} = {\rm E}\big [n(t) \cdot n(t+\tau)\big ] = {N_0}/{2} \cdot \delta(\tau)\hspace{0.3cm} \Rightarrow \hspace{0.3cm} {\rm E}\big [n(t) \cdot n(t+\tau)\big ] = \left\{ \begin{array}{c} \rightarrow \infty \\ 0 \end{array} \right.\quad \begin{array}{*{1}c} {\rm f{or}} \hspace{0.15cm} \tau = 0 \hspace{0.05cm}, \\ {\rm f{or}} \hspace{0.15cm} \tau \ne 0 \hspace{0.05cm}.\\ \end{array}$$

Here, $N_0$ indicates the physical noise power density $($defined only for $f \ge 0)$. The constant PSD value $(N_0/2)$ and the weight of the Dirac delta function in the ACF $($also $N_0/2)$ result from the two-sided approach alone.

⇒ More information on this topic is provided by the (German language) learning video "The AWGN channel" in part two.

Description of the AWGN channel by orthonormal basis functions

From the penultimate statement in the last section, we see that

pure AWGN noise $n(t)$ always has infinite variance (power): $\sigma_n^2 \to \infty$,

consequently, in reality only filtered noise $n\hspace{0.05cm}'(t) = n(t) \star h_n(t)$ can occur.

With the impulse response $h_n(t)$ and the frequency response $H_n(f) = {\rm F}\big [h_n(t)\big ]$, the following equations hold:

$${\rm E}\big[n\hspace{0.05cm}'(t) \big] \hspace{0.15cm} = \hspace{0.2cm} {\rm E}\big[n(t) \big] = 0 \hspace{0.05cm},$$

$${\it \Phi_{n\hspace{0.05cm}'}(f)} \hspace{0.1cm} = \hspace{0.1cm} {N_0}/{2} \cdot |H_{n}(f)|^2 \hspace{0.05cm},$$

$$ {\it \varphi_{n\hspace{0.05cm}'}(\tau)} \hspace{0.1cm} = \hspace{0.1cm} {N_0}/{2}\hspace{0.1cm} \cdot \big [h_{n}(\tau) \star h_{n}(-\tau)\big ]\hspace{0.05cm},$$

$$\sigma_n^2 \hspace{0.1cm} = \hspace{0.1cm} { \varphi_{n\hspace{0.05cm}'}(\tau = 0)} = {N_0}/{2} \cdot \int_{-\infty}^{+\infty}h_n^2(t)\,{\rm d} t ={N_0}/{2}\hspace{0.1cm} \cdot < \hspace{-0.1cm}h_n(t), \hspace{0.1cm} h_n(t) \hspace{-0.05cm} > \hspace{0.1cm} $$

$$\Rightarrow \hspace{0.3cm} \sigma_n^2 \hspace{0.1cm} = \int_{-\infty}^{+\infty}{\it \Phi}_{n\hspace{0.05cm}'}(f)\,{\rm d} f = {N_0}/{2} \cdot \int_{-\infty}^{+\infty}|H_n(f)|^2\,{\rm d} f \hspace{0.05cm}.$$
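The equality of the time-domain and frequency-domain expressions for $\sigma_n^2$ can be verified numerically. The sketch assumes a rectangular impulse response of height $1/T$ and duration $T$; the values of $N_0$ and $T$ are hypothetical, and the FFT only approximates the continuous Fourier transform.

```python
import numpy as np

N0 = 2.0e-3          # hypothetical noise power density
T = 1.0e-3           # hypothetical duration of the rectangular impulse response
n_samples = 1000
dt = T / n_samples

# Assumed example filter: h_n(t) = 1/T for 0 <= t < T
h = np.full(n_samples, 1.0 / T)

# Time domain: sigma_n^2 = N0/2 * integral of h_n(t)^2 dt
var_time = N0 / 2 * np.sum(h**2) * dt

# Frequency domain: sigma_n^2 = N0/2 * integral of |H_n(f)|^2 df
H = np.fft.fft(h) * dt           # approximation of the continuous Fourier transform
df = 1.0 / (n_samples * dt)      # frequency resolution of the FFT grid
var_freq = N0 / 2 * np.sum(np.abs(H) ** 2) * df
```

For this filter $\int h_n^2(t)\,{\rm d}t = 1/T$, so both expressions give $\sigma_n^2 = N_0/(2T)$, which is finite in contrast to the unfiltered case.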

In the following, $n(t)$ always implicitly includes a "band limitation"; thus, the notation $n'(t)$ will be omitted in the future.

$\text{Please note:}$ Similar to the transmitted signal $s(t)$, the noise process $\{n(t)\}$ can be written as a weighted sum of orthonormal basis functions $\varphi_j(t)$.

In contrast to $s(t)$, however, a restriction to a finite number of basis functions is not possible.

Rather, for purely stochastic quantities, the following always holds for the corresponding signal representation

$$n(t) = \lim_{N \rightarrow \infty} \sum\limits_{j = 1}^{N}n_j \cdot \varphi_j(t) \hspace{0.05cm},$$

where the coefficient $n_j$ is determined by the projection of $n(t)$ onto the basis function $\varphi_j(t)$:

$$n_j = \hspace{0.1cm} < \hspace{-0.1cm}n(t), \hspace{0.1cm} \varphi_j(t) \hspace{-0.05cm} > \hspace{0.05cm}.$$

Note: To avoid confusion with the basis functions $\varphi_j(t)$, we will in the following express the auto-correlation function $\rm (ACF)$ $\varphi_n(\tau)$ of the noise process only as the expected value

$${\rm E}\big [n(t) \cdot n(t + \tau)\big ] \equiv \varphi_n(\tau) .$$

Optimal receiver for the AWGN channel

Optimal receiver at the AWGN channel

The received signal $r(t) = s(t) + n(t)$ can also be decomposed into basis functions in a well-known way: $$r(t) = \sum\limits_{j = 1}^{\infty}r_j \cdot \varphi_j(t) \hspace{0.05cm}.$$

To be considered:

The $M$ possible transmitted signals $\{s_i(t)\}$ span a signal space with a total of $N$ basis functions $\varphi_1(t)$, ... , $\varphi_N(t)$.

These $N$ basis functions $\varphi_j(t)$ are used simultaneously to describe the noise signal $n(t)$ and the received signal $r(t)$.

For a complete characterization of $n(t)$ or $r(t)$, however, an infinite number of further basis functions $\varphi_{N+1}(t)$, $\varphi_{N+2}(t)$, ... are needed.

Thus, the coefficients of the received signal $r(t)$ are obtained according to the following equation, taking into account that the signals $s_i(t)$ and the noise $n(t)$ are independent of each other:

$$r_j \hspace{0.1cm} = \hspace{0.1cm} \hspace{0.1cm} < \hspace{-0.1cm}r(t), \hspace{0.1cm} \varphi_j(t) \hspace{-0.05cm} > \hspace{0.1cm}=\hspace{0.1cm} \left\{ \begin{array}{c} < \hspace{-0.1cm}s_i(t), \hspace{0.1cm} \varphi_j(t) \hspace{-0.05cm} > + < \hspace{-0.1cm}n(t), \hspace{0.1cm} \varphi_j(t) \hspace{-0.05cm} > \hspace{0.1cm}= s_{ij}+ n_j\\ < \hspace{-0.1cm}n(t), \hspace{0.1cm} \varphi_j(t) \hspace{-0.05cm} > \hspace{0.1cm} = n_j \end{array} \right.\quad \begin{array}{*{1}c} {j = 1, 2, \hspace{0.05cm}\text{...}\hspace{0.05cm} \hspace{0.05cm}, N} \hspace{0.05cm}, \\ {j > N} \hspace{0.05cm}.\\ \end{array}$$

Thus, the structure sketched above results for the optimal receiver.

Let us first consider the AWGN channel. Here, the prefilter with the frequency response $W(f)$, which is intended for colored noise, can be dispensed with.

The detector of the optimal receiver forms the coefficients $r_j \hspace{0.1cm} = \hspace{0.1cm} \hspace{0.1cm} < \hspace{-0.1cm}r(t), \hspace{0.1cm} \varphi_j(t)\hspace{-0.05cm} >$ and passes them on to the decision.
If the decision is based on all $($i.e., infinitely many$)$ coefficients $r_j$, the probability of a wrong decision is minimal and the receiver is optimal.
The real-valued coefficients $r_j$ were calculated as follows:

$$r_j = \left\{ \begin{array}{c} s_{ij} + n_j\\ n_j \end{array} \right.\quad \begin{array}{*{1}c} {j = 1, 2, \hspace{0.05cm}\text{...}\hspace{0.05cm}, N} \hspace{0.05cm}, \\ {j > N} \hspace{0.05cm}.\\ \end{array}$$

According to the "irrelevance theorem" it can be shown that for additive white Gaussian noise

the optimality is not lowered if the coefficients $r_{N+1}$, $r_{N+2}$, ... , that do not depend on the message $(s_{ij})$, are not included in the decision process, and therefore

the detector has to form only the projections of the received signal $r(t)$ onto the $N$ basis functions $\varphi_{1}(t)$, ... , $\varphi_{N}(t)$ given by the useful signal $s(t)$.

In the graph this significant simplification is indicated by the gray background.
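The projections onto the basis functions can be sketched numerically for the noise-free case. The three cosine and sine functions, the coefficients $0.7$ and $-0.4$ and the helper `inner` are assumptions for illustration; `phi_3` stands for one of the infinitely many basis functions with index $j > N$.

```python
import numpy as np

T = 1.0
n_samples = 1000
dt = T / n_samples
t = np.arange(n_samples) * dt

# Assumed orthonormal basis functions on [0, T): N = 2 of them span the signal space
phi_1 = np.sqrt(2 / T) * np.cos(2 * np.pi * t / T)
phi_2 = np.sqrt(2 / T) * np.sin(2 * np.pi * t / T)
phi_3 = np.sqrt(2 / T) * np.cos(4 * np.pi * t / T)   # a further basis function (j > N)

def inner(x, y):
    """Numerical inner product <x(t), y(t)> using the rectangle rule."""
    return np.sum(x * y) * dt

# Hypothetical transmitted signal with s_i1 = 0.7 and s_i2 = -0.4
s_i = 0.7 * phi_1 - 0.4 * phi_2

coeffs = [inner(s_i, phi) for phi in (phi_1, phi_2, phi_3)]
```

The first two projections return the signal coefficients, while the projection onto `phi_3` is zero. With noise present, this third coefficient would contain only $n_3$ and thus nothing about the message.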

In the case of colored noise ⇒ power-spectral density ${\it \Phi}_n(f) \ne {\rm const.}$, only an additional prefilter with the amplitude response $|W(f)| = {1}/{\sqrt{ {\it \Phi}_n(f)} }$ is required.

This filter is called "whitening filter", because the noise power-spectral density at the output is constant again ⇒ "white".
More details can be found in the chapter "Matched filter for colored interference" of the book "Stochastic Signal Theory".
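The effect of the whitening filter can be checked on a frequency grid. The colored power-spectral density used here is a hypothetical example; only the relation between the noise PSD and the amplitude response is taken from the text.

```python
import numpy as np

f = np.linspace(-5.0, 5.0, 101)   # frequency grid (arbitrary units)
phi_n = 1.0 / (1.0 + f**2)        # hypothetical colored noise PSD

W_abs = 1.0 / np.sqrt(phi_n)      # amplitude response of the whitening filter
phi_out = phi_n * W_abs**2        # noise PSD at the filter output
```

The output PSD is constant over the whole grid, i.e. the noise behind the prefilter is white again.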

Implementation aspects

Essential components of the optimal receiver are the calculations of the inner products according to the equations $r_j \hspace{0.1cm} = \hspace{0.1cm} \hspace{0.1cm} < \hspace{-0.1cm}r(t), \hspace{0.1cm} \varphi_j(t) \hspace{-0.05cm} >$.

$\text{These can be implemented in several ways:}$

In the correlation receiver $($see the "chapter of the same name" for more details on this implementation$)$, the inner products are realized directly according to the definition with analog multipliers and integrators:

$$r_j = \int_{-\infty}^{+\infty}r(t) \cdot \varphi_j(t) \,{\rm d} t \hspace{0.05cm}.$$

The matched filter receiver, already derived in the chapter "Optimal Binary Receiver" at the beginning of this book, achieves the same result using a linear filter with impulse response $h_j(t) = \varphi_j(T-t)$ followed by sampling at time $t = T$:

$$r_j(t) = \int_{-\infty}^{+\infty}r(\tau) \cdot h_j(t-\tau) \,{\rm d} \tau = \int_{-\infty}^{+\infty}r(\tau) \cdot \varphi_j(T-t+\tau) \,{\rm d} \tau \hspace{0.3cm} \Rightarrow \hspace{0.3cm} r_j (t = T) = \int_{-\infty}^{+\infty}r(\tau) \cdot \varphi_j(\tau) \,{\rm d} \tau = r_j \hspace{0.05cm}.$$

Three different implementations of the inner product

The figure shows the two possible realizations of the optimal detector.
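That the correlation receiver and the matched filter receiver deliver the same value $r_j$ can be checked with a discrete-time sketch. The basis function, the received samples and the noise level are hypothetical; the matched filter uses the time-reversed basis function, i.e. the discrete counterpart of $h_j(t) = \varphi_j(T-t)$.

```python
import numpy as np

T = 1.0
n_samples = 500
dt = T / n_samples
t = np.arange(n_samples) * dt

phi_j = np.sqrt(2 / T) * np.sin(2 * np.pi * t / T)   # assumed basis function on [0, T)

rng = np.random.default_rng(1)
r = 0.9 * phi_j + 0.5 * rng.standard_normal(n_samples)   # hypothetical received samples

# Correlation receiver: multiply by phi_j and integrate
r_j_corr = np.sum(r * phi_j) * dt

# Matched filter: impulse response is the time-reversed basis function,
# and the filter output is sampled at the end of the interval
h_j = phi_j[::-1]
filter_output = np.convolve(r, h_j) * dt
r_j_mf = filter_output[n_samples - 1]
```

Both values agree up to floating-point rounding, because the convolution sum at the sampling index reduces to the same sum of products as the correlation.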

Probability density function of the received values

Before we turn to the optimal design of the decision maker and the calculation and approximation of the error probability in the following chapter, we first perform a statistical analysis of the decision variables $r_j$ valid for the AWGN channel.

Signal space constellation (left) and PDF of the received signal (right)

For this purpose, we consider again the optimal binary receiver for bipolar baseband transmission over the AWGN channel, starting from the description form valid for the fourth main chapter.

With the parameters $N = 1$ and $M = 2$, the signal space constellation shown in the left graph is obtained for the transmitted signal

with only one basis function $\varphi_1(t)$, because of $N = 1$,

with the two signal space points $s_i \in \{s_0, \hspace{0.05cm}s_1\}$, because of $M = 2$.

For the signal $r(t) = s(t) + n(t)$ at the AWGN channel output, the noise-free case ⇒ $r(t) = s(t)$ yields exactly the same constellation. The signal space points are at

$$r_0 = s_0 = \sqrt{E}\hspace{0.05cm},\hspace{0.2cm}r_1 = s_1 = -\sqrt{E}\hspace{0.05cm}.$$

Considering the (band-limited) AWGN noise $n(t)$,

Gaussian curves with variance $\sigma_n^2$ ⇒ standard deviation $\sigma_n$ are superimposed on each of the two points $r_0$ and $r_1$ $($see right sketch$)$.

The probability density function $\rm (PDF)$ of the noise component $n(t)$ is thereby:

$$p_n(n) = \frac{1}{\sqrt{2\pi} \cdot \sigma_n}\cdot {\rm e}^{ - {n^2}/(2 \sigma_n^2)}\hspace{0.05cm}.$$

The following expression is then obtained for the conditional probability density that the received value $\rho$ is present when $s_i$ has been transmitted:

$$p_{\hspace{0.02cm}r\hspace{0.05cm}|\hspace{0.05cm}s}(\rho\hspace{0.05cm}|\hspace{0.05cm}s_i) = \frac{1}{\sqrt{2\pi} \cdot \sigma_n}\cdot {\rm e}^{ - {(\rho - s_i)^2}/(2 \sigma_n^2)} \hspace{0.05cm}.$$

Regarding the units of the quantities listed here, we note:

$r_0 = s_0$ and $r_1 = s_1$ as well as $n$ are each scalars with the unit "root of energy".

Thus, it is obvious that $\sigma_n$ also has the unit "root of energy" and $\sigma_n^2$ represents energy.

For the AWGN channel, the noise variance is $\sigma_n^2 = N_0/2$, so this is also a physical quantity with unit "$\rm W/Hz \equiv Ws$".

The topic addressed here is illustrated by examples in "Exercise 4.6".
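The conditional PDF of the received value can be evaluated with a few lines. The values of $E$, $N_0$ and $\rho$ are hypothetical, and the function name `cond_pdf` is an assumption; the sketch uses $\sigma_n^2 = N_0/2$ and the signal space points $\pm\sqrt{E}$ from above.

```python
import numpy as np

def cond_pdf(rho, s_i, sigma_n):
    """Conditional PDF p(rho | s_i) of the received value for the AWGN channel."""
    return np.exp(-(rho - s_i) ** 2 / (2 * sigma_n**2)) / (np.sqrt(2 * np.pi) * sigma_n)

E = 4.0                         # hypothetical energy
N0 = 2.0                        # hypothetical noise power density
sigma_n = np.sqrt(N0 / 2)       # sigma_n^2 = N0/2
s = [np.sqrt(E), -np.sqrt(E)]   # s_0 = +sqrt(E), s_1 = -sqrt(E)

rho = -0.5                      # hypothetical received value
likelihoods = [cond_pdf(rho, s_i, sigma_n) for s_i in s]
i_hat = int(np.argmax(likelihoods))   # index with the larger conditional PDF

# Numerical check that the conditional PDF integrates to 1
grid = np.linspace(-12.0, 12.0, 24001)
area = np.sum(cond_pdf(grid, s[0], sigma_n)) * (grid[1] - grid[0])
```

For this received value the conditional PDF given $s_1$ is the larger one, since $\rho$ lies closer to $-\sqrt{E}$.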

N-dimensional Gaussian noise

If an $N$–dimensional modulation process is present, i.e., with $0 \le i \le M-1$ and $1 \le j \le N$:

$$s_i(t) = \sum\limits_{j = 1}^{N} s_{ij} \cdot \varphi_j(t) = s_{i1} \cdot \varphi_1(t) + s_{i2} \cdot \varphi_2(t) + \hspace{0.05cm}\text{...}\hspace{0.05cm} + s_{iN} \cdot \varphi_N(t)\hspace{0.05cm}\hspace{0.3cm} \Rightarrow \hspace{0.3cm} \boldsymbol{ s}_i = \left(s_{i1}, s_{i2}, \hspace{0.05cm}\text{...}\hspace{0.05cm}, s_{iN}\right ) \hspace{0.05cm},$$

then the noise vector $\boldsymbol{ n}$ must also be assumed to have dimension $N$. The same is true for the received vector $\boldsymbol{ r}$:

$$\boldsymbol{ n} = \left(n_{1}, n_{2}, \hspace{0.05cm}\text{...}\hspace{0.05cm}, n_{N}\right ) \hspace{0.01cm},$$

$$\boldsymbol{ r} = \left(r_{1}, r_{2}, \hspace{0.05cm}\text{...}\hspace{0.05cm}, r_{N}\right )\hspace{0.05cm}.$$

With the realization $\boldsymbol{ \eta}$ of the noise vector, the probability density function $\rm (PDF)$ for the AWGN channel is

$$p_{\boldsymbol{ n}}(\boldsymbol{ \eta}) = \frac{1}{\left( \sqrt{2\pi} \cdot \sigma_n \right)^N } \cdot {\rm exp} \left [ - \frac{|| \boldsymbol{ \eta} ||^2}{2 \sigma_n^2}\right ]\hspace{0.05cm},$$

and for the conditional PDF in the maximum likelihood decision rule:

$$p_{\hspace{0.02cm}\boldsymbol{ r}\hspace{0.05cm} | \hspace{0.05cm} \boldsymbol{ s}}(\boldsymbol{ \rho} \hspace{0.05cm}|\hspace{0.05cm} \boldsymbol{ s}_i) \hspace{-0.1cm} = \hspace{0.1cm} p_{\hspace{0.02cm} \boldsymbol{ n}\hspace{0.05cm} | \hspace{0.05cm} \boldsymbol{ s}}(\boldsymbol{ \rho} - \boldsymbol{ s}_i \hspace{0.05cm} | \hspace{0.05cm} \boldsymbol{ s}_i) = \frac{1}{\left( \sqrt{2\pi} \cdot \sigma_n \right)^N } \cdot {\rm exp} \left [ - \frac{|| \boldsymbol{ \rho} - \boldsymbol{ s}_i ||^2}{2 \sigma_n^2}\right ]\hspace{0.05cm}.$$

The equation follows

from the general representation of the $N$–dimensional Gaussian PDF in the section "correlation matrix" of the book "Theory of Stochastic Signals"

under the assumption that the components are uncorrelated (and thus statistically independent).

$||\boldsymbol{ \eta}||$ is called the "norm" (length) of the vector $\boldsymbol{ \eta}$.
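The $N$–dimensional PDF can be sketched in code for $N = 2$. The signal points, the received vector and $\sigma_n$ are hypothetical, as are the function names. Since the exponential decreases monotonically with the norm, the largest conditional PDF belongs to the signal point with the smallest Euclidean distance from $\boldsymbol{\rho}$, which the sketch checks numerically.

```python
import numpy as np

def gauss_pdf_nd(eta, sigma_n):
    """N-dimensional Gaussian PDF with independent components of variance sigma_n^2."""
    eta = np.asarray(eta, dtype=float)
    N = eta.size
    return np.exp(-np.dot(eta, eta) / (2 * sigma_n**2)) / (np.sqrt(2 * np.pi) * sigma_n) ** N

def gauss_pdf_1d(x, sigma_n):
    """One-dimensional Gaussian PDF with variance sigma_n^2."""
    return np.exp(-x**2 / (2 * sigma_n**2)) / (np.sqrt(2 * np.pi) * sigma_n)

sigma_n = 0.8
# Hypothetical signal space points s_0, ..., s_3 for N = 2
signals = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])
rho = np.array([0.2, -0.9])   # hypothetical received vector

likelihoods = np.array([gauss_pdf_nd(rho - s_i, sigma_n) for s_i in signals])
distances = np.linalg.norm(rho - signals, axis=1)

i_ml = int(np.argmax(likelihoods))   # largest conditional PDF
i_dist = int(np.argmin(distances))   # smallest Euclidean distance
```

Because the components are statistically independent, `gauss_pdf_nd` equals the product of the one-dimensional PDFs of the individual components.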

$\text{Example 3:}$ Shown on the right is the two-dimensional Gaussian probability density function $p_{\boldsymbol{ n} } (\boldsymbol{ \eta})$ of the two-dimensional random variable $\boldsymbol{ n} = (n_1,\hspace{0.05cm}n_2)$. Arbitrary realizations of the random variable $\boldsymbol{ n}$ are denoted by $\boldsymbol{ \eta} = (\eta_1,\hspace{0.05cm}\eta_2)$. The equation of the represented two-dimensional "Gaussian bell curve" is:

Two-dimensional Gaussian PDF

$$p_{n_1, n_2}(\eta_1, \eta_2) = \frac{1}{\left( \sqrt{2\pi} \cdot \sigma_n \right)^2 } \cdot {\rm exp} \left [ - \frac{ \eta_1^2 + \eta_2^2}{2 \sigma_n^2}\right ]\hspace{0.05cm}. $$

The maximum of this function is at $\eta_1 = \eta_2 = 0$ and has the value $1/(2\pi \cdot \sigma_n^2)$. With $\sigma_n^2 = N_0/2$, the two-dimensional PDF in vector form can also be written as follows:

$$p_{\boldsymbol{ n} }(\boldsymbol{ \eta}) = \frac{1}{\pi \cdot N_0 } \cdot {\rm exp} \left [ - \frac{\vert \vert \boldsymbol{ \eta} \vert \vert ^2}{N_0}\right ]\hspace{0.05cm}.$$

This rotationally symmetric PDF is suitable e.g. for describing/investigating a "two-dimensional modulation process" such as "M–QAM", "M–PSK" or "2–FSK".

However, two-dimensional real random variables are often represented in a one-dimensional complex way, usually in the form $n(t) = n_{\rm I}(t) + {\rm j} \cdot n_{\rm Q}(t)$. The two components are then called the "in-phase component" $n_{\rm I}(t)$ and the "quadrature component" $n_{\rm Q}(t)$ of the noise.

The probability density function depends only on the magnitude $\vert n(t) \vert$ of the noise variable and not on the angle ${\rm arc} \ n(t)$. This means: complex noise is circularly symmetric $($see graph$)$.

Circularly symmetric also means that the in-phase component $n_{\rm I}(t)$ and the quadrature component $n_{\rm Q}(t)$ have the same distribution and thus also the same variance $($and standard deviation$)$:

$$ {\rm E} \big [ n_{\rm I}^2(t)\big ] = {\rm E}\big [ n_{\rm Q}^2(t) \big ] = \sigma_n^2 \hspace{0.05cm},\hspace{1cm}{\rm E}\big [ n(t) \cdot n^*(t) \big ]\hspace{0.1cm} = \hspace{0.1cm} {\rm E}\big [ n_{\rm I}^2(t) \big ] + {\rm E}\big [ n_{\rm Q}^2(t)\big ] = 2\sigma_n^2 \hspace{0.05cm}.$$
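The variance relations can be illustrated with a small Monte Carlo experiment. The sketch below assumes independent, zero-mean Gaussian in-phase and quadrature components with the same standard deviation; the seed, the value of $\sigma_n$ and the sample size are arbitrary choices and not taken from the text.

```python
import random

rng = random.Random(1)   # fixed seed so that the run is reproducible
sigma_n = 0.8            # assumed standard deviation per component
num_samples = 200_000

# Complex noise samples n = n_I + j * n_Q with independent Gaussian components
samples = [complex(rng.gauss(0.0, sigma_n), rng.gauss(0.0, sigma_n))
           for _ in range(num_samples)]

power_i = sum(z.real ** 2 for z in samples) / num_samples      # estimate of E[n_I^2]
power_q = sum(z.imag ** 2 for z in samples) / num_samples      # estimate of E[n_Q^2]
power_total = sum(abs(z) ** 2 for z in samples) / num_samples  # estimate of E[n * conj(n)]

print(power_i, power_q, power_total)
```

With these settings the two component powers should lie close to $\sigma_n^2 = 0.64$ and the total power close to $2\sigma_n^2 = 1.28$, within the statistical fluctuation of the estimate.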

Finally, some notation variants for Gaussian random variables:

$$x ={\cal N}(\mu, \sigma^2) \hspace{-0.1cm}: \hspace{0.3cm}\text{real Gaussian distributed random variable, with mean}\hspace{0.1cm}\mu \text { and variance}\hspace{0.15cm}\sigma^2 \hspace{0.05cm},$$

$$y={\cal CN}(\mu, \sigma^2)\hspace{-0.1cm}: \hspace{0.12cm}\text{complex Gaussian distributed random variable} \hspace{0.05cm}.$$

Exercises for the chapter

Exercise 4.4: Maximum–a–posteriori and Maximum–Likelihood

Exercise 4.5: Irrelevance Theorem

@@ Line 8: / Line 8: @@
 == Block diagram and prerequisites ==
 <br>
-In this chapter, the structure of the optimal receiver of a digital transmission system is derived in very general terms, whereby
+In this chapter,&nbsp; the structure of the optimal receiver of a digital transmission system is derived in very general terms,&nbsp; whereby
 *the modulation process and further system details are not specified further,<br>
 *the basis functions and the signal space representation according to the chapter&nbsp; [[Digital_Signal_Transmission/Signals,_Basis_Functions_and_Vector_Spaces|"Signals, Basis Functions and Vector Spaces"]]&nbsp; are assumed.
+[[File:EN_Dig_T_4_2_S1.png|right|frame|General block diagram of a communication system|class=fit]]
+<br>
+To the block diagram it is to be noted:
+*The symbol set size of the source is&nbsp; $M$&nbsp; and the source symbol set is&nbsp; $\{m_i\}$&nbsp; with&nbsp; $i = 0$, ... , $M-1$.&nbsp;
-[[File:EN_Dig_T_4_2_S1.png|center|frame|General block diagram of a communication system|class=fit]]
+*Let the corresponding source symbol probabilities&nbsp; ${\rm Pr}(m = m_i)$&nbsp; also be known to the receiver.<br>
-To the above block diagram it is to be noted:
+*For the transmission,&nbsp; $M$&nbsp; different signal forms&nbsp; $s_i(t)$&nbsp; are available;&nbsp; for the indexing variable shall be valid: &nbsp; $i = 0$, ... , $M-1$&nbsp;.&nbsp;
-*The symbol set size of the source is&nbsp; $M$&nbsp; and the symbol set is&nbsp; $\{m_i\}$ with $i = 0$, ... , $M-1$. Let the corresponding symbol probabilities&nbsp; ${\rm Pr}(m = m_i)$&nbsp; also be known to the receiver.<br>
-*For message transmission&nbsp; $M$&nbsp; different signal forms&nbsp; $s_i(t)$&nbsp; are available; for the indexing variable the indexing&nbsp; $i = 0$, ... , $M-1$&nbsp; shall be valid. There is a fixed relation between the messages&nbsp; $\{m_i\}$&nbsp; and the signals&nbsp; $\{s_i(t)\}$. If&nbsp; $m =m_i$&nbsp; is transmitted, the transmitted signal is&nbsp; $s(t) =s_i(t)$.<br>
+*There is a fixed relation between messages&nbsp; $\{m_i\}$&nbsp; and signals&nbsp; $\{s_i(t)\}$.&nbsp; If&nbsp; $m =m_i$,&nbsp; the transmitted signal is&nbsp; $s(t) =s_i(t)$.<br>
-*Linear channel distortions are taken into account in the above graph by the impulse response&nbsp; $h(t)$.&nbsp; In addition, a noise&nbsp; $n(t)$&nbsp; (of some kind) is effective. With these two effects interfering with the transmission, the signal&nbsp; $r(t)$&nbsp; arriving at the receiver can be given in the following way:
+*Linear channel distortions are taken into account in the above graph by the impulse response&nbsp; $h(t)$. &nbsp; In addition,&nbsp; the noise term&nbsp; $n(t)$&nbsp; (of some kind)&nbsp; is effective.&nbsp;
+*With these two effects interfering with the transmission, the signal&nbsp; $r(t)$&nbsp; arriving at the receiver can be given in the following way:
 :$$r(t) = s(t) \star h(t) + n(t) \hspace{0.05cm}.$$
-*The task of the (optimal) receiver is to find out, on the basis of its input signal&nbsp; $r(t)$,&nbsp; which of the&nbsp; $M$&nbsp; possible messages&nbsp; $m_i$&nbsp; &ndash; or which of the signals&nbsp; $s_i(t)$&nbsp; &ndash; was sent. The estimated value for&nbsp; $m$&nbsp; found by the receiver is characterized by a circumflex (French: ''Circonflexe'') &nbsp; &rArr; &nbsp;  $\hat{m}$.
+*The task of the&nbsp; $($optimal$)$&nbsp; receiver is to find out on the basis of its input signal&nbsp; $r(t)$,&nbsp; which of the&nbsp; $M$&nbsp; possible messages&nbsp; $m_i$&nbsp; &ndash; or which of the signals&nbsp; $s_i(t)$&nbsp; &ndash; was sent. The estimated value for&nbsp; $m$&nbsp; found by the receiver is characterized by a&nbsp; "circumflex"  &nbsp; &rArr; &nbsp;  $\hat{m}$.
 {{BlaueBox|TEXT=
-$\text{Definition:}$&nbsp; One speaks of an '''optimal receiver''' if the symbol error probability assumes the smallest possible value for the boundary conditions:
+$\text{Definition:}$&nbsp; One speaks of an&nbsp; '''optimal receiver'''&nbsp; if the symbol error probability assumes the smallest possible value for the boundary conditions:
 :$$p_{\rm S} = {\rm Pr}  ({\cal E}) = {\rm Pr} ( \hat{m} \ne m) \hspace{0.15cm} \Rightarrow \hspace{0.15cm}{\rm minimum}   \hspace{0.05cm}.$$}}
-<i>Notes:</i>
+<u>Notes:</u>
-#In the following, we mostly assume the AWGN approach &nbsp; &rArr; &nbsp;  $r(t) =  s(t) + n(t)$,&nbsp; which means that &nbsp;$h(t) =  \delta(t)$&nbsp; is assumed to be distortion-free.
+#In the following,&nbsp; we mostly assume the AWGN approach &nbsp; &rArr; &nbsp;  $r(t) =  s(t) + n(t)$,&nbsp; which means that &nbsp;$h(t) =  \delta(t)$&nbsp; is assumed to be distortion-free.
-#Otherwise, we can redefine the signals&nbsp; $s_i(t)$&nbsp; as &nbsp;${s_i}'(t) = s_i(t) \star h(t)$,&nbsp; i.e., impose the deterministic channel distortions on the transmitted signal.<br>
+#Otherwise,&nbsp; we can redefine the signals&nbsp; $s_i(t)$&nbsp; as &nbsp; ${s_i}'(t) = s_i(t) \star h(t)$,&nbsp; i.e.,&nbsp; impose the deterministic channel distortions on the transmitted signal.<br>
 == Fundamental approach to optimal receiver design==
 <br>
-Compared to the&nbsp; [[Digital_Signal_Transmission/Structure_of_the_Optimal_Receiver#Block_diagram_and_prerequisites|"block diagram"]]&nbsp; shown on the previous page, we now perform some essential generalizations:
+Compared to the&nbsp; [[Digital_Signal_Transmission/Structure_of_the_Optimal_Receiver#Block_diagram_and_prerequisites|"block diagram"]]&nbsp; shown in the previous section, we now perform some essential generalizations:
-*The transmission channel is described by the&nbsp; [[Theory_of_Stochastic_Signals/Statistical_Dependence_and_Independence#Conditional_Probability|"conditional probability density function"]]&nbsp; $p_{\hspace{0.02cm}r(t)\hspace{0.02cm} \vert \hspace{0.02cm}s(t)}$&nbsp; which determines the dependence of the received signal&nbsp; $r(t)$&nbsp; on the transmitted signal&nbsp; $s(t)$.&nbsp; <br>
+[[File:EN_Dig_T_4_2_S2b.png|right|frame|Model for deriving the optimal receiver|class=fit]]
-*If a certain signal&nbsp; $r(t) = \rho(t)$&nbsp; has been received, the receiver has the task to determine the probability density functions based on this signal realization&nbsp; $\rho(t)$&nbsp; and the&nbsp; $M$&nbsp; conditional probability density functions
+*The transmission channel is described by the&nbsp; [[Channel_Coding/Channel_Models_and_Decision_Structures#AWGN_channel_at_Binary_Input|"conditional probability density function"]]&nbsp;  $p_{\hspace{0.02cm}r(t)\hspace{0.02cm} \vert \hspace{0.02cm}s(t)}$&nbsp; which determines the dependence of the received signal&nbsp; $r(t)$&nbsp; on the transmitted signal&nbsp; $s(t)$.&nbsp; <br>
-:$$p_{\hspace{0.05cm}r(t) \hspace{0.05cm} \vert \hspace{0.05cm} s(t) } (\rho(t) \hspace{0.05cm} \vert \hspace{0.05cm} s_i(t))\hspace{0.2cm}{\rm with}\hspace{0.2cm} i = 0, \text{...} \hspace{0.05cm}, M-1$$
-:taking into account all possible transmitted signals&nbsp; $s_i(t)$&nbsp; and their probabilities of occurrence&nbsp; ${\rm Pr}(m = m_i)$,&nbsp; find out which of the possible messages&nbsp; $m_i$&nbsp; or which of the possible signals&nbsp; $s_i(t)$&nbsp; was most likely transmitted.<br>
+*If a certain signal&nbsp; $r(t) = \rho(t)$&nbsp; has been received,&nbsp; the receiver has the task to determine the probability density functions based on this&nbsp; "signal realization" &nbsp; $\rho(t)$&nbsp; and the&nbsp; $M$&nbsp; conditional probability density functions
+:$$p_{\hspace{0.05cm}r(t) \hspace{0.05cm} \vert \hspace{0.05cm} s(t) } (\rho(t) \hspace{0.05cm} \vert \hspace{0.05cm} s_i(t))\hspace{0.2cm}{\rm with}\hspace{0.2cm} i = 0, \text{...} \hspace{0.05cm}, M-1.$$
-*Thus, the estimate of the optimal receiver is determined in general by the equation
+*It is to be found out which message&nbsp; $\hat{m}$&nbsp; was transmitted most probably,&nbsp; taking into account all possible transmitted signals&nbsp; $s_i(t)$&nbsp; and their occurrence probabilities&nbsp; ${\rm Pr}(m = m_i)$.
-:$$\hat{m} = {\rm arg} \max_i \hspace{0.1cm} p_{\hspace{0.02cm}s(t) \hspace{0.05cm} \vert \hspace{0.05cm} r(t) } (  s_i(t) \hspace{0.05cm} \vert \hspace{0.05cm} \rho(t)) = {\rm arg} \max_i \hspace{0.1cm} p_{m \hspace{0.05cm} \vert \hspace{0.05cm} r(t) } (  \hspace{0.05cm}m_i\hspace{0.05cm} \vert \hspace{0.05cm}\rho(t))\hspace{0.05cm},$$
-:where it is considered that the transmitted message&nbsp; $m = m_i$&nbsp; and the transmitted signal&nbsp; $s(t) = s_i(t)$&nbsp; can be uniquely transformed into each other.<br>
+*Thus,&nbsp; the estimate of the optimal receiver is determined in general by
+:$$\hat{m} = {\rm arg} \max_i \hspace{0.1cm} p_{\hspace{0.02cm}s(t) \hspace{0.05cm} \vert \hspace{0.05cm} r(t) } (  s_i(t) \hspace{0.05cm} \vert \hspace{0.05cm} \rho(t)) = {\rm arg} \max_i \hspace{0.1cm} p_{m \hspace{0.05cm} \vert \hspace{0.05cm} r(t) } (  \hspace{0.05cm}m_i\hspace{0.05cm} \vert \hspace{0.05cm}\rho(t))\hspace{0.05cm}.$$
 {{BlaueBox|TEXT=
-$\text{In other words:}$&nbsp; The optimal receiver considers as the most likely transmitted message&nbsp; $m_i$&nbsp; whose conditional probability density function&nbsp; $p_{\hspace{0.02cm}m \hspace{0.05cm} \vert \hspace{0.05cm} r(t) }$&nbsp; takes the largest possible value for the applied received signal&nbsp; $\rho(t)$&nbsp; and under the assumption&nbsp; $m =m_i$.&nbsp; }}<br>
+$\text{In other words:}$&nbsp; The optimal receiver considers that message&nbsp; $\hat{m} \in \{m_i\}$&nbsp; to be the most likely transmitted one whose conditional probability density function&nbsp; $p_{\hspace{0.02cm}m \hspace{0.05cm} \vert \hspace{0.05cm} r(t) }$&nbsp; takes the largest possible value for the applied received signal&nbsp; $\rho(t)$&nbsp; and under the assumption&nbsp; $m =\hat{m}$.&nbsp; }}<br>
-Before we discuss the above decision rule in more detail, the optimal receiver should still be divided into two functional blocks according to the diagram:
+Before we discuss the above decision rule in more detail,&nbsp; the optimal receiver should still be divided into two functional blocks according to the diagram:
-[[File:EN_Dig_T_4_2_S2b.png|right|frame|Model for deriving the optimal receiver|class=fit]]
+*The &nbsp;'''detector'''&nbsp; takes various measurements on the received signal&nbsp; $r(t)$&nbsp; and summarizes them in the vector &nbsp;$\boldsymbol{r}$.&nbsp; With &nbsp;$K$&nbsp; measurements,&nbsp; $\boldsymbol{r}$&nbsp; corresponds to a point in the &nbsp;$K$&ndash;dimensional vector space.<br>
-*The &nbsp;'''detector'''&nbsp; takes various measurements on the received signal&nbsp; $r(t)$&nbsp; and summarizes them in the vector &nbsp;$\boldsymbol{r}$.&nbsp; With &nbsp;$K$&nbsp; measurements&nbsp; $\boldsymbol{r}$&nbsp; corresponds to a point in the &nbsp;$K$&ndash;dimensional vector space.<br>
-*The &nbsp;'''decision'''&nbsp; forms the estimated value depending on this vector. For a given vector&nbsp; $\boldsymbol{r} = \boldsymbol{\rho}$&nbsp; the decision rule is:
+*The &nbsp;'''decision'''&nbsp; forms the estimated value depending on this vector.&nbsp; For a given vector&nbsp; $\boldsymbol{r} = \boldsymbol{\rho}$&nbsp; the decision rule is:
 :$$\hat{m} = {\rm arg}\hspace{0.05cm} \max_i \hspace{0.1cm} P_{m\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r} } (  m_i\hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{\rho}) \hspace{0.05cm}.$$
-<br clear=all>
-In contrast to the upper decision rule, a conditional probability&nbsp; $P_{m\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r} }$&nbsp; now occurs instead of the conditional probability density function (PDF)&nbsp; $p_{m\hspace{0.05cm} \vert \hspace{0.05cm}r(t)}$.&nbsp; Please note the upper and lower case for the different meanings.
+In contrast to the upper decision rule,&nbsp; a conditional probability&nbsp; $P_{m\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r} }$&nbsp; now occurs instead of the conditional probability density function&nbsp; $\rm (PDF)$ &nbsp; $p_{m\hspace{0.05cm} \vert \hspace{0.05cm}r(t)}$.&nbsp; Please note the upper and lower case for the different meanings.
-<br clear=all>
 {{GraueBox|TEXT=
-$\text{Example 1:}$&nbsp; We now consider the function&nbsp; $y =  {\rm arg}\hspace{0.05cm} \max \ p(x)$, where&nbsp; $p(x)$&nbsp; describes the probability density function (PDF) of a continuous-valued or discrete-valued random variable&nbsp; $x$.&nbsp; In the second case (right graph), the PDF consists of a sum of Dirac functions with the probabilities as pulse weights.<br>
+$\text{Example 1:}$&nbsp; We now consider the function&nbsp; $y =  {\rm arg}\hspace{0.05cm} \max \ p(x)$,&nbsp; where&nbsp; $p(x)$&nbsp; describes the probability density function&nbsp; $\rm (PDF)$&nbsp; of a continuous-valued or discrete-valued random variable&nbsp; $x$.&nbsp; In the second case&nbsp; (right graph),&nbsp; the PDF consists of a sum of Dirac delta functions with the probabilities as pulse weights.<br>
 [[File:EN_Dig_T_4_2_S2c.png|right|frame|Illustration of the "arg max" function|class=fit]]
-The graphic shows exemplary functions. In both cases the PDF maximum&nbsp; $(17)$&nbsp; is at&nbsp; $x = 6$:
+&rArr; &nbsp; The graphic shows exemplary functions.&nbsp; In both cases the PDF maximum&nbsp; $(17)$&nbsp; is at&nbsp; $x = 6$:
 :$$\max_i \hspace{0.1cm} p(x) = 17\hspace{0.05cm},$$
 :$$y = {\rm \hspace{0.05cm}arg} \max_i \hspace{0.1cm} p(x) = 6\hspace{0.05cm}.$$
-The (conditional) probabilities in the equation
+&rArr; &nbsp; The (conditional) probabilities in the equation
 :$$\hat{m} = {\rm arg}\hspace{0.05cm} \max_i \hspace{0.1cm} P_{\hspace{0.02cm}m\hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{ r} } (  m_i \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho})$$
-are &nbsp;'''a posteriori probabilities'''.
+are &nbsp;'''a-posteriori probabilities'''. &nbsp; [[Theory_of_Stochastic_Signals/Statistical_Dependence_and_Independence#Inference_probability|"Bayes' theorem"]]&nbsp; can be used to write for this:
-&nbsp; [[Theory_of_Stochastic_Signals/Statistical_Dependence_and_Independence#Inference_probability|"Bayes' theorem"]]&nbsp; can be used to write for this:
 :$$P_{\hspace{0.02cm}m\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r} } (  m_i \hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{\rho}) =
 \frac{ {\rm Pr}( m_i) \cdot p_{\boldsymbol{ r}\hspace{0.05cm} \vert \hspace{0.05cm}m } (\boldsymbol{\rho}\hspace{0.05cm} \vert \hspace{0.05cm}m_i )}{p_{\boldsymbol{ r} } (\boldsymbol{\rho})}
@@ Line 85: / Line 87: @@
-The denominator term is the same for all alternatives&nbsp; $m_i$&nbsp; and need not be considered for the decision. This gives the following rules:
+The denominator term &nbsp; $p_{\boldsymbol{ r} }(\boldsymbol{\rho})$ &nbsp; is the same for all alternatives&nbsp; $m_i$&nbsp; and need not be considered for the decision.
+This gives the following rules:
 {{BlaueBox|TEXT=
-$\text{Theorem:}$&nbsp; The decision rule of the optimal receiver, also known as &nbsp;'''MAP receiver'''&nbsp; (stands for ''maximum&ndash;a&ndash;posteriori''), is:
+$\text{Theorem:}$&nbsp; The decision rule of the optimal receiver,&nbsp; also known as &nbsp;'''maximum–a–posteriori  receiver'''&nbsp; $($in short:&nbsp; '''MAP receiver'''$)$&nbsp; is:
 :$$\hat{m}_{\rm MAP} = {\rm \hspace{0.05cm} arg} \max_i \hspace{0.1cm} P_{\hspace{0.02cm}m\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r} } (  m_i \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}) = {\rm \hspace{0.05cm}arg} \max_i \hspace{0.1cm} \big [ {\rm Pr}( m_i) \cdot p_{\boldsymbol{ r}\hspace{0.05cm} \vert \hspace{0.05cm} m } (\boldsymbol{\rho}\hspace{0.05cm} \vert \hspace{0.05cm} m_i )\big ]\hspace{0.05cm}.$$
-The advantage of this equation is that the conditional PDF&nbsp; $p_{\boldsymbol{ r}\hspace{0.05cm} \vert \hspace{0.05cm} m }$&nbsp; ("output under the condition input") describing the forward direction of the channel can be used. In contrast, the first equation uses the inference probabilities&nbsp; $P_{\hspace{0.05cm}m\hspace{0.05cm} \vert \hspace{0.02cm} \boldsymbol{ r} } $&nbsp;  ("input under the condition output").}}
+*The advantage of this equation is that the conditional PDF&nbsp; $p_{\boldsymbol{ r}\hspace{0.05cm} \vert \hspace{0.05cm} m }$&nbsp; $($"output under the condition input"$)$&nbsp; describing the forward direction of the channel can be used.
+*In contrast,&nbsp; the first equation uses the inference probabilities&nbsp; $P_{\hspace{0.05cm}m\hspace{0.05cm} \vert \hspace{0.02cm} \boldsymbol{ r} } $&nbsp;  $($"input under the condition output"$)$.}}
 {{BlaueBox|TEXT=
-$\text{Theorem:}$&nbsp; A &nbsp;'''maximum likelihood receiver'''&nbsp; (ML receiver in short) uses the decision rule
+$\text{Theorem:}$&nbsp; A &nbsp;'''maximum likelihood receiver'''&nbsp; $($in short:&nbsp; '''ML receiver'''$)$&nbsp; uses the following  decision rule:
 :$$\hat{m}_{\rm ML} = \hspace{-0.1cm} {\rm arg} \max_i \hspace{0.1cm}  p_{\boldsymbol{ r}\hspace{0.05cm} \vert \hspace{0.05cm}m } (\boldsymbol{\rho}\hspace{0.05cm} \vert \hspace{0.05cm}m_i )\hspace{0.05cm}.$$
-In this case, the possibly different occurrence probabilities&nbsp; ${\rm Pr}(m = m_i)$&nbsp; are not used for the decision process, for example, because they are not known to the receiver.}}<br>
+*In this case,&nbsp; the possibly different occurrence probabilities&nbsp; ${\rm Pr}(m = m_i)$&nbsp; are not used for the decision process,&nbsp; for example,&nbsp; because they are not known to the receiver.}}<br>
 See the earlier chapter&nbsp; [[Digital_Signal_Transmission/Optimal_Receiver_Strategies|"Optimal Receiver Strategies"]]&nbsp; for other derivations for these receiver types.
 {{BlaueBox|TEXT=
-$\text{Conclusion:}$&nbsp; For equally likely messages&nbsp; $\{m_i\}$   &nbsp; &#8658; &nbsp; ${\rm Pr}(m = m_i) = 1/M$,&nbsp; the generally slightly worse ML receiver is equivalent to the MAP receiver:
+$\text{Conclusion:}$&nbsp; For equally likely messages&nbsp; $\{m_i\}$   &nbsp; &#8658; &nbsp; ${\rm Pr}(m = m_i) = 1/M$,&nbsp; the generally slightly worse&nbsp; "maximum likelihood  receiver"&nbsp; is equivalent to the&nbsp; "maximum–a–posteriori receiver":
 :$$\hat{m}_{\rm MAP} = \hat{m}_{\rm ML} =\hspace{-0.1cm} {\rm\hspace{0.05cm} arg} \max_i \hspace{0.1cm}
    p_{\boldsymbol{ r}\hspace{0.05cm} \vert \hspace{0.05cm}m } (\boldsymbol{\rho}\hspace{0.05cm} \vert \hspace{0.05cm}m_i )\hspace{0.05cm}.$$}}
@@ Line 112: / Line 119: @@
 == The irrelevance theorem==
 <br>
-Note that the receiver described in the last section is optimal only if the detector is implemented in the best possible way, i.e., if no information is lost by the transition from the continuous signal&nbsp; $r(t)$&nbsp; to the vector&nbsp; $\boldsymbol{r}$.&nbsp; <br>
+[[File:EN_Dig_T_4_2_S3a.png|right|frame|About the irrelevance theorem|class=fit]]
+Note that the receiver described in the last section is optimal only if the detector is implemented in the best possible way,&nbsp; i.e.,&nbsp; if no information is lost by the transition from the continuous signal&nbsp; $r(t)$&nbsp; to the vector&nbsp; $\boldsymbol{r}$.&nbsp; <br>
+To clarify the question which and how many measurements have to be performed on the received signal&nbsp; $r(t)$&nbsp; to guarantee optimality,&nbsp; the&nbsp; "irrelevance theorem"&nbsp; is helpful.
-[[File:EN_Dig_T_4_2_S3a.png|center|frame|About the irrelevance theorem|class=fit]]
+*For this purpose,&nbsp; we consider the sketched receiver whose detector derives the two vectors&nbsp; $\boldsymbol{r}_1$&nbsp; and&nbsp; $\boldsymbol{r}_2$&nbsp; from the received signal&nbsp; $r(t)$&nbsp; and makes them available to the decision.
-To clarify the question which and how many measurements have to be performed on the received signal&nbsp; $r(t)$&nbsp; to guarantee optimality, the <i>irrelevance theorem</i>&nbsp; is helpful. For this purpose, we consider the sketched receiver whose detector derives the two vectors&nbsp; $\boldsymbol{r}_1$&nbsp; and&nbsp; $\boldsymbol{r}_2$&nbsp; from the received signal&nbsp; $r(t)$&nbsp; and makes them available to the decision. These quantities are related to the message&nbsp; $ m \in \{m_i\}$&nbsp; via the composite probability density&nbsp; $p_{\boldsymbol{ r}_1, \hspace{0.05cm}\boldsymbol{ r}_2\hspace{0.05cm} \vert \hspace{0.05cm}m }$.&nbsp; <br>
+*These quantities are related to the message&nbsp; $ m \in \{m_i\}$&nbsp; via the composite probability density&nbsp; $p_{\boldsymbol{ r}_1, \hspace{0.05cm}\boldsymbol{ r}_2\hspace{0.05cm} \vert \hspace{0.05cm}m }$.&nbsp; <br>
-The decision rule of the MAP receiver with adaptation to this example is:
+*The decision rule of the MAP receiver with adaptation to this example is:
 :$$\hat{m}_{\rm MAP} \hspace{-0.1cm}  =  \hspace{-0.1cm} {\rm arg} \max_i \hspace{0.1cm} \big [ {\rm Pr}( m_i) \cdot p_{\boldsymbol{ r}_1 , \hspace{0.05cm}\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm}m } \hspace{0.05cm} (\boldsymbol{\rho}_1, \hspace{0.05cm}\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm} m_i ) \big]=
@@ Line 127: / Line 137: @@
 Here it is to be noted:
-*The vectors&nbsp; $\boldsymbol{r}_1$&nbsp; and &nbsp;$\boldsymbol{r}_2$&nbsp; are random variables. Their realizations are denoted here and in the following by&nbsp; $\boldsymbol{\rho}_1$&nbsp; and &nbsp;$\boldsymbol{\rho}_2$.&nbsp; For emphasis, all vectors are shown in red in the graph.
+*The vectors&nbsp; $\boldsymbol{r}_1$&nbsp; and &nbsp;$\boldsymbol{r}_2$&nbsp; are random variables.&nbsp; Their realizations are denoted here and in the following by&nbsp; $\boldsymbol{\rho}_1$&nbsp; and &nbsp;$\boldsymbol{\rho}_2$.&nbsp; For emphasis,&nbsp; all vectors are shown in red in the graph.
-*The requirements for the application of the "irrelevance theorem" are the same as those for a first order&nbsp; [[Theory_of_Stochastic_Signals/Markov_Chains#Considered_scenario|"Markov chain"]].&nbsp; The random variables&nbsp; $x$,&nbsp; $y$,&nbsp; $z$&nbsp; then form a first order Markov chain if the distribution of&nbsp; $z$&nbsp; is independent of &nbsp; $x$&nbsp for a given&nbsp;$y$.&nbsp; The first order Markov chain is the following:
-:$$p(x, y, z) = p(x) \cdot p(y\hspace{0.05cm} \vert \hspace{0.05cm}x) \cdot p(z\hspace{0.05cm} \vert \hspace{0.05cm}y) \hspace{0.25cm} {\rm instead \hspace{0.15cm}of} \hspace{0.25cm}p(x, y, z) = p(x) \cdot p(y\hspace{0.05cm} \vert \hspace{0.05cm}x) \cdot p(z\hspace{0.05cm} \vert \hspace{0.05cm}x, y) \hspace{0.05cm}.$$
-*In the general case, the optimal receiver must evaluate both vectors&nbsp; $\boldsymbol{r}_1$&nbsp; and&nbsp; $\boldsymbol{r}_2$,&nbsp; since both composite probability densities&nbsp; $p_{\boldsymbol{ r}_1\hspace{0.05cm} \vert \hspace{0.05cm}m }$&nbsp; and&nbsp; $p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{ r}_1, \hspace{0.05cm}m }$&nbsp; occur in the above decision rule.
+*The requirements for the application of the&nbsp; "irrelevance theorem"&nbsp; are the same as those for a first order&nbsp; [[Theory_of_Stochastic_Signals/Markov_Chains#Considered_scenario|"Markov chain"]].&nbsp; The random variables&nbsp; $x$,&nbsp; $y$,&nbsp; $z$&nbsp; then form a first order Markov chain if the distribution of&nbsp; $z$&nbsp; is independent of &nbsp; $x$&nbsp; for a given&nbsp;$y$.&nbsp; The first order Markov chain is the following:
+:$$p(x,\ y,\ z) = p(x) \cdot p(y\hspace{0.05cm} \vert \hspace{0.05cm}x) \cdot p(z\hspace{0.05cm} \vert \hspace{0.05cm}y) \hspace{0.75cm} {\rm instead \hspace{0.15cm}of} \hspace{0.75cm}p(x, y, z) = p(x) \cdot p(y\hspace{0.05cm} \vert \hspace{0.05cm}x) \cdot p(z\hspace{0.05cm} \vert \hspace{0.05cm}x, y) \hspace{0.05cm}.$$
-*In contrast, the receiver can neglect the second measurement without loss of information if&nbsp; $\boldsymbol{r}_2$&nbsp; is independent of message&nbsp; $m$&nbsp; for given&nbsp; $\boldsymbol{r}_1$:&nbsp;
+*In the general case,&nbsp; the optimal receiver must evaluate both vectors&nbsp; $\boldsymbol{r}_1$&nbsp; and&nbsp; $\boldsymbol{r}_2$,&nbsp; since both composite probability densities&nbsp; $p_{\boldsymbol{ r}_1\hspace{0.05cm} \vert \hspace{0.05cm}m }$&nbsp; and&nbsp; $p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{ r}_1, \hspace{0.05cm}m }$&nbsp; occur in the above decision rule.&nbsp; In contrast,&nbsp; the receiver can neglect the second measurement without loss of information&nbsp; if&nbsp; $\boldsymbol{r}_2$&nbsp; is independent of the message&nbsp; $m$&nbsp; for given&nbsp; $\boldsymbol{r}_1$:&nbsp;
 :$$p_{\boldsymbol{ r}_2\hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{ r}_1 , \hspace{0.05cm} m } \hspace{0.05cm} (\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1 , \hspace{0.05cm}m_i )=
 p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r}_1  } \hspace{0.05cm} (\boldsymbol{\rho}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1  )
 \hspace{0.05cm}.$$
-*In this case, the decision rule can be further simplified:
+*In this case,&nbsp; the decision rule can be further simplified:
 :$$\hat{m}_{\rm MAP} =
 {\rm arg} \max_i \hspace{0.1cm} \big [ {\rm Pr}( m_i) \cdot p_{\boldsymbol{ r}_1  \hspace{0.05cm} \vert \hspace{0.05cm}m } \hspace{0.05cm} (\boldsymbol{\rho}_1
@@ Line 153: / Line 161: @@
 \big]\hspace{0.05cm}.$$
 {{GraueBox|TEXT=
-$\text{Example 2:}$&nbsp; We consider two different system configurations with two noise terms&nbsp; $\boldsymbol{ n}_1$&nbsp; and&nbsp; $\boldsymbol{ n}_2$ each to illustrate the irrelevance theorem just presented. In the diagram all vectorial quantities are drawn in red. Moreover, the quantities&nbsp; $\boldsymbol{s}$,&nbsp; $\boldsymbol{ n}_1$&nbsp; and &nbsp;$\boldsymbol{ n}_2$&nbsp; are independent of each other.<br>
+[[File:EN_Dig_T_4_2_S3b.png|right|frame|Two examples of the irrelevance theorem|class=fit]]
+$\text{Example 2:}$&nbsp; We consider two different system configurations with two noise terms&nbsp; $\boldsymbol{ n}_1$&nbsp; and&nbsp; $\boldsymbol{ n}_2$ each to illustrate the irrelevance theorem just presented.
+*In the diagram,&nbsp; all vectorial quantities are drawn in red.
+*Moreover,&nbsp; the quantities&nbsp; $\boldsymbol{s}$,&nbsp; $\boldsymbol{ n}_1$&nbsp; and &nbsp;$\boldsymbol{ n}_2$&nbsp; are independent of each other.<br>
-[[File:EN_Dig_T_4_2_S3b.png|center|frame|Two examples of the irrelevance theorem|class=fit]]
 The analysis of these two arrangements yields the following results:
-*In both cases, the decision must consider the component&nbsp; $\boldsymbol{ r}_1= \boldsymbol{ s}_i + \boldsymbol{ n}_1$,&nbsp; since only this component provides the information about the useful signal&nbsp; $\boldsymbol{ s}_i$&nbsp; and thus about the transmitted message&nbsp; $m_i$.&nbsp; <br>
+*In both cases,&nbsp; the decision must consider the component&nbsp; $\boldsymbol{ r}_1= \boldsymbol{ s}_i + \boldsymbol{ n}_1$,&nbsp; since only this component provides the information about the possible transmitted signals&nbsp; $\boldsymbol{ s}_i$&nbsp; and thus about the message&nbsp; $m_i$.&nbsp; <br>
-*In the upper configuration, &nbsp; $\boldsymbol{ r}_2$&nbsp; contains no information about&nbsp; $m_i$ that has not already been provided by &nbsp;$\boldsymbol{ r}_1$.&nbsp; Rather, &nbsp; $\boldsymbol{ r}_2= \boldsymbol{ r}_1 + \boldsymbol{ n}_2$&nbsp; is just a noisy version of&nbsp; $\boldsymbol{ r}_1$&nbsp; and depends only on the noise&nbsp; $\boldsymbol{ n}_2$&nbsp; once&nbsp; $\boldsymbol{ r}_1$&nbsp; is known &nbsp; &#8658; &nbsp; $\boldsymbol{ r}_2$&nbsp; is irrelevant:
+*In the upper configuration, &nbsp; $\boldsymbol{ r}_2$&nbsp; contains no information about&nbsp; $m_i$ that has not already been provided by &nbsp;$\boldsymbol{ r}_1$.&nbsp; Rather, &nbsp; $\boldsymbol{ r}_2= \boldsymbol{ r}_1 + \boldsymbol{ n}_2$&nbsp; is just a noisy version of&nbsp; $\boldsymbol{ r}_1$&nbsp; and depends only on the noise&nbsp; $\boldsymbol{ n}_2$&nbsp; once&nbsp; $\boldsymbol{ r}_1$&nbsp; is known &nbsp; &#8658; &nbsp; $\boldsymbol{ r}_2$&nbsp; '''is irrelevant''':
 :$$p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r}_1 , \hspace{0.05cm} m } \hspace{0.05cm} (\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1 , \hspace{0.05cm}m_i )=
 p_{\boldsymbol{ r}_2\hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r}_1  } \hspace{0.05cm} (\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm}\boldsymbol{\rho}_1  )=
 p_{\boldsymbol{ n}_2  } \hspace{0.05cm} (\boldsymbol{\rho}_2 - \boldsymbol{\rho}_1  )\hspace{0.05cm}.$$
-*In the lower configuration, on the other hand, $\boldsymbol{ r}_2= \boldsymbol{ n}_1 + \boldsymbol{ n}_2$ is helpful to the receiver, since it provides it with an estimate of the noise term $\boldsymbol{ n}_1$ &nbsp; &#8658; &nbsp; $\boldsymbol{ r}_2$ should therefore not be discarded here. Formally, this result can be expressed as follows:
+*In the lower configuration,&nbsp; on the other hand,&nbsp; $\boldsymbol{ r}_2= \boldsymbol{ n}_1 + \boldsymbol{ n}_2$&nbsp; is helpful to the receiver,&nbsp; since it provides it with an estimate of the noise term&nbsp; $\boldsymbol{ n}_1$ &nbsp; &#8658; &nbsp; $\boldsymbol{ r}_2$ should therefore not be discarded here.
+*Formally, this result can be expressed as follows:
 :$$p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ r}_1 , \hspace{0.05cm} m } \hspace{0.05cm} (\boldsymbol{\rho}_2\hspace{0.05cm} \vert \hspace{0.05cm}  \boldsymbol{\rho}_1 , \hspace{0.05cm}m_i ) = p_{\boldsymbol{ r}_2 \hspace{0.05cm} \vert \hspace{0.05cm}  \boldsymbol{ n}_1 , \hspace{0.05cm} m } \hspace{0.05cm} (\boldsymbol{\rho}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1  - \boldsymbol{s}_i, \hspace{0.05cm}m_i)= p_{\boldsymbol{ n}_2 \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{ n}_1 , \hspace{0.05cm} m  } \hspace{0.05cm} (\boldsymbol{\rho}_2- \boldsymbol{\rho}_1  + \boldsymbol{s}_i \hspace{0.05cm} \vert \hspace{0.05cm} \boldsymbol{\rho}_1  - \boldsymbol{s}_i, \hspace{0.05cm}m_i) = p_{\boldsymbol{ n}_2  } \hspace{0.05cm} (\boldsymbol{\rho}_2- \boldsymbol{\rho}_1  + \boldsymbol{s}_i )
 \hspace{0.05cm}.$$
-*Since the message $\boldsymbol{ s}_i$ now appears in the argument of this function, $\boldsymbol{ r}_2$ is "not irrelevant" but quite relevant.}}<br>
+*Since the possible transmitted signal&nbsp; $\boldsymbol{ s}_i$&nbsp; now appears in the argument of this function,&nbsp; $\boldsymbol{ r}_2$&nbsp; '''is not irrelevant,&nbsp; but quite relevant'''.}}<br>
 == Some properties of the AWGN channel==
 <br>
-In order to make further statements about the nature of the optimal measurements of the vector&nbsp; $\boldsymbol{ r}$,&nbsp; it is necessary to further specify the (conditional) probability density function&nbsp; $p_{\hspace{0.02cm}r(t)\hspace{0.05cm} \vert \hspace{0.05cm}s(t)}$&nbsp; characterizing the channel. In the following, we will consider communication over the&nbsp; [[Modulation_Methods/Quality_Criteria#Some_remarks_on_the_AWGN_channel_model| "AWGN channel"]],&nbsp; whose most important properties are briefly summarized again here:
+In order to make further statements about the nature of the optimal measurements of the vector&nbsp; $\boldsymbol{ r}$,&nbsp; it is necessary to further specify the&nbsp; (conditional)&nbsp; probability density function&nbsp; $p_{\hspace{0.02cm}r(t)\hspace{0.05cm} \vert \hspace{0.05cm}s(t)}$&nbsp; characterizing the channel.&nbsp; In the following,&nbsp; we will consider communication over the&nbsp; [[Modulation_Methods/Quality_Criteria#Some_remarks_on_the_AWGN_channel_model| "AWGN channel"]],&nbsp; whose most important properties are briefly summarized again here:
-*The output signal of the AWGN channel is&nbsp; $r(t) = s(t)+n(t)$, where &nbsp; $s(t)$&nbsp; indicates the transmitted signal and&nbsp; $n(t)$&nbsp; is represented by a Gaussian noise process.<br>
+*The output signal of the AWGN channel is&nbsp; $r(t) = s(t)+n(t)$,&nbsp; where &nbsp; $s(t)$&nbsp; indicates the transmitted signal and&nbsp; $n(t)$&nbsp; is represented by a Gaussian noise process.<br>
-*A random process&nbsp; $\{n(t)\}$&nbsp; is said to be Gaussian if the elements of the&nbsp; $k$&ndash;dimensional random variables&nbsp; $\{n_1(t)\hspace{0.05cm} \text{...} \hspace{0.05cm}n_k(t)\}$&nbsp; are jointly Gaussian &nbsp; &rArr; &nbsp; <i>"Jointly Gaussian"</i>.<br>
+*A random process&nbsp; $\{n(t)\}$&nbsp; is said to be Gaussian if the elements of the&nbsp; $k$&ndash;dimensional random variables&nbsp; $\{n_1(t)\hspace{0.05cm} \text{...} \hspace{0.05cm}n_k(t)\}$&nbsp; are&nbsp; "jointly Gaussian".<br>
-*The average value of the AWGN noise is&nbsp; ${\rm E}\big[n(t)\big] = 0$. Moreover,&nbsp; $n(t)$&nbsp; is "white", which means that the&nbsp; [[Theory_of_Stochastic_Signals/Power-Spectral_Density|"power-spectral density"]]&nbsp; (PSD) is constant for all frequencies &nbsp;$($from &nbsp;$-\infty$ to $+\infty)$:&nbsp; &nbsp;
+*The mean value of the AWGN noise is&nbsp; ${\rm E}\big[n(t)\big] = 0$.&nbsp; Moreover,&nbsp; $n(t)$&nbsp; is&nbsp; "white",&nbsp; which means that the&nbsp; [[Theory_of_Stochastic_Signals/Power-Spectral_Density|"power-spectral density"]]&nbsp; $\rm (PSD)$&nbsp; is constant for all frequencies&nbsp; $($from &nbsp;$-\infty$ to $+\infty)$:
 :$${\it \Phi}_n(f) = {N_0}/{2}
 \hspace{0.05cm}.$$
-*According to the&nbsp; [[Theory_of_Stochastic_Signals/Power-Spectral_Density#Wiener-Khintchine_Theorem|"Wiener-Chintchine theorem"]],&nbsp; the auto-correlation function (ACF) is obtained as the&nbsp; [[Signal_Representation/Fourier_Transform_and_its_Inverse#The_second_Fourier_integral| "Fourier retransform"]]&nbsp; of&nbsp; ${\it \Phi_n(f)}$:
+*According to the&nbsp; [[Theory_of_Stochastic_Signals/Power-Spectral_Density#Wiener-Khintchine_Theorem|"Wiener-Khintchine theorem"]],&nbsp; the auto-correlation function&nbsp; $\rm (ACF)$&nbsp; is obtained as the&nbsp; [[Signal_Representation/Fourier_Transform_and_its_Inverse#The_second_Fourier_integral| "inverse Fourier transform"]]&nbsp; of&nbsp; ${\it \Phi_n(f)}$:
 :$${\varphi_n(\tau)} = {\rm E}\big [n(t) \cdot n(t+\tau)\big  ] = {N_0}/{2} \cdot \delta(\tau)\hspace{0.3cm}
 \Rightarrow \hspace{0.3cm} {\rm E}\big [n(t) \cdot n(t+\tau)\big  ]  =
@@ Line 191: / Line 204: @@
 \\  {\rm f{or}}  \hspace{0.15cm} \tau \ne 0 \hspace{0.05cm},\\ \end{array}$$
-*Here, $N_0$&nbsp; indicates the physical noise power density (defined only for &nbsp;$f \ge 0$&nbsp;). The constant PSD value&nbsp; $(N_0/2)$&nbsp; and the weight of the Dirac function in the ACF $($also &nbsp;$N_0/2)$&nbsp; result from the two-sided approach alone.<br><br>
+*Here,&nbsp; $N_0$&nbsp; indicates the physical noise power density&nbsp; $($defined only for &nbsp;$f \ge 0)$.&nbsp; The constant PSD value&nbsp; $(N_0/2)$&nbsp; and the weight of the Dirac delta function in the ACF&nbsp; $($also &nbsp;$N_0/2)$&nbsp; result from the two-sided approach alone.<br><br>
-More information on this topic is provided by the learning video&nbsp; [[Der_AWGN-Kanal_(Lernvideo)|"The AWGN channel"]]&nbsp; in part two.<br>
+&rArr; &nbsp; More information on this topic is provided by the&nbsp; (German language)&nbsp; learning video&nbsp; [[Der_AWGN-Kanal_(Lernvideo)|"The AWGN channel"]]&nbsp; in part two.<br>
 == Description of the AWGN channel by orthonormal basis functions==
 <br>
-From the penultimate statement in the last section, we see that
+From the penultimate statement in the last section,&nbsp; we see that
-*pure AWGN noise&nbsp; $n(t)$&nbsp; always has infinite variance (power): &nbsp; $\sigma_n^2 \to \infty$,<br>
+*pure AWGN noise&nbsp; $n(t)$&nbsp; always has infinite variance&nbsp; (power): &nbsp; $\sigma_n^2 \to \infty$,<br>
-*consequently, in reality only filtered noise&nbsp; $n\hspace{0.05cm}'(t) = n(t) \star h_n(t)$&nbsp; can occur.<br><br>
+*consequently,&nbsp; in reality only filtered noise&nbsp; $n\hspace{0.05cm}'(t) = n(t) \star h_n(t)$&nbsp; can occur.<br><br>
-With the impulse response&nbsp; $h_n(t)$&nbsp; and the&nbsp; frequency response $H_n(f) = {\rm F}\big [h_n(t)\big ]$,&nbsp; the following equations then hold:<br>
+With the impulse response&nbsp; $h_n(t)$&nbsp; and the&nbsp; frequency response $H_n(f) = {\rm F}\big [h_n(t)\big ]$,&nbsp; the following equations hold:<br>
 :$${\rm E}\big[n\hspace{0.05cm}'(t)  \big] \hspace{0.15cm} =  \hspace{0.2cm} {\rm E}\big[n(t)  \big] = 0 \hspace{0.05cm},$$
@@ Line 207: / Line 221: @@
 :$$ {\it \varphi_{n\hspace{0.05cm}'}(\tau)} \hspace{0.1cm}  =  \hspace{0.1cm} {N_0}/{2}\hspace{0.1cm} \cdot \big [h_{n}(\tau) \star h_{n}(-\tau)\big  ]\hspace{0.05cm},$$
 :$$\sigma_n^2  \hspace{0.1cm}  =  \hspace{0.1cm} {  \varphi_{n\hspace{0.05cm}'}(\tau = 0)} =  {N_0}/{2} \cdot
-\int_{-\infty}^{+\infty}h_n^2(t)\,{\rm d} t ={N_0}/{2}\hspace{0.1cm} \cdot < \hspace{-0.1cm}h_n(t), \hspace{0.1cm} h_n(t) \hspace{-0.05cm} > \hspace{0.1cm} =
+\int_{-\infty}^{+\infty}h_n^2(t)\,{\rm d} t ={N_0}/{2}\hspace{0.1cm} \cdot < \hspace{-0.1cm}h_n(t), \hspace{0.1cm} h_n(t) \hspace{-0.05cm} > \hspace{0.1cm} $$
+:$$\Rightarrow \hspace{0.3cm} \sigma_n^2  \hspace{0.1cm} =
   \int_{-\infty}^{+\infty}{\it \Phi}_{n\hspace{0.05cm}'}(f)\,{\rm d} f = {N_0}/{2} \cdot \int_{-\infty}^{+\infty}|H_n(f)|^2\,{\rm d} f \hspace{0.05cm}.$$
-In the following,&nbsp; $n(t)$&nbsp; always implicitly includes a ''band limitation''; thus, the notation&nbsp; $n'(t)$&nbsp; will be omitted in the future.<br>
+In the following,&nbsp; $n(t)$&nbsp; always implicitly includes a&nbsp; "band limitation";&nbsp; thus,&nbsp; the notation&nbsp; $n'(t)$&nbsp; will be omitted in the future.<br>
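The two expressions for the variance of the filtered noise can be checked numerically.&nbsp; The following is only a minimal sketch:&nbsp; the rectangular impulse response, the sampling interval&nbsp; <code>dt</code>&nbsp; and the value of&nbsp; <code>N0</code>&nbsp; are assumptions chosen for illustration and are not taken from the text.

```python
import numpy as np

def filtered_noise_variance(h, dt, N0):
    """Time-domain form: sigma_n^2 = N0/2 * integral of h_n(t)^2 dt (Riemann sum)."""
    return N0 / 2 * np.sum(h ** 2) * dt

def filtered_noise_variance_freq(h, dt, N0):
    """Frequency-domain form: sigma_n^2 = N0/2 * integral of |H_n(f)|^2 df."""
    H = np.fft.fft(h) * dt          # FFT scaled to approximate the continuous transform
    df = 1.0 / (len(h) * dt)        # frequency spacing of the FFT grid
    return N0 / 2 * np.sum(np.abs(H) ** 2) * df

# Assumed example: rectangular impulse response of height 1/T and duration T
T, dt, N0 = 1e-3, 1e-6, 2e-9
h = np.full(int(round(T / dt)), 1.0 / T)

var_t = filtered_noise_variance(h, dt, N0)
var_f = filtered_noise_variance_freq(h, dt, N0)
print(var_t, var_f)                 # both should equal N0 / (2 T) for this filter
```

For this particular filter the integral of&nbsp; $h_n^2(t)$&nbsp; is&nbsp; $1/T$,&nbsp; so both routes should give&nbsp; $\sigma_n^2 = N_0/(2T)$.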
 {{BlaueBox|TEXT=
 $\text{Please note:}$&nbsp; Similar to the transmitted signal&nbsp; $s(t)$,&nbsp; the noise process&nbsp; $\{n(t)\}$&nbsp; can be written as a weighted sum of orthonormal basis functions&nbsp; $\varphi_j(t)$.&nbsp;
-*In contrast to&nbsp; $s(t)$,&nbsp; however, a restriction to a finite number of basis functions is not possible.
+*In contrast to&nbsp; $s(t)$,&nbsp; however,&nbsp; a restriction to a finite number of basis functions is not possible.
-*Rather, for purely stochastic quantities, the following always holds for the corresponding signal representation
+*Rather,&nbsp; for purely stochastic quantities,&nbsp; the following always holds for the corresponding signal representation
 :$$n(t) = \lim_{N \rightarrow \infty} \sum\limits_{j = 1}^{N}n_j \cdot \varphi_j(t) \hspace{0.05cm},$$
@@ Line 224: / Line 240: @@
-<i>Note:</i> &nbsp; To avoid confusion with the basis functions&nbsp; $\varphi_j(t)$,&nbsp; in the following we will always express the ACF&nbsp; $\varphi_n(\tau)$&nbsp; of the noise process only as the expected value&nbsp; ${\rm E}\big [n(t) \cdot n(t + \tau)\big ]$.&nbsp; <br>
+<u>Note:</u> &nbsp; To avoid confusion with the basis functions&nbsp; $\varphi_j(t)$,&nbsp;  we will in the following express the auto-correlation function&nbsp; $\rm (ACF)$&nbsp;&nbsp; $\varphi_n(\tau)$&nbsp; of the noise process only as the expected value&nbsp;
+:$${\rm E}\big [n(t) \cdot n(t + \tau)\big ] \equiv \varphi_n(\tau)  .$$ <br>
 == Optimal receiver for the AWGN channel==
@@ Line 230: / Line 247: @@
 [[File:EN_Dig_T_4_2_S5b.png|right|frame|Optimal receiver at the AWGN channel|class=fit]]
 The received signal&nbsp; $r(t) = s(t) + n(t)$&nbsp; can also be decomposed into basis functions in a well-known way:
-:$$r(t) =  \sum\limits_{j = 1}^{\infty}r_j \cdot \varphi_j(t) \hspace{0.05cm}.$$
+$$r(t) =  \sum\limits_{j = 1}^{\infty}r_j \cdot \varphi_j(t) \hspace{0.05cm}.$$
 To be considered:
-*The&nbsp; $M$&nbsp; possible transmitted signals&nbsp; $\{s_i(t)\}$&nbsp; span a signal space with a total of&nbsp;  $N$&nbsp; basis functions&nbsp; $\varphi_1(t)$, ... , $\varphi_N(t)$&nbsp; auf.<br><br>
+*The&nbsp; $M$&nbsp; possible transmitted signals&nbsp; $\{s_i(t)\}$&nbsp; span a signal space with a total of&nbsp;  $N$&nbsp; basis functions&nbsp; $\varphi_1(t)$, ... , $\varphi_N(t)$.<br>
+*These&nbsp; $N$&nbsp; basis functions&nbsp; $\varphi_j(t)$&nbsp; are used simultaneously to describe the noise signal&nbsp; $n(t)$&nbsp; and the received signal&nbsp; $r(t)$.&nbsp; <br>
-*These&nbsp; $N$&nbsp; basis functions&nbsp; $\varphi_j(t)$&nbsp; are used simultaneously to describe the noise signal&nbsp; $n(t)$&nbsp; and the received signal&nbsp; $r(t)$.&nbsp; <br><br>
+*For a complete characterization of&nbsp; $n(t)$&nbsp; or&nbsp; $r(t)$,&nbsp; however,&nbsp; an infinite number of further basis functions&nbsp; $\varphi_{N+1}(t)$,&nbsp; $\varphi_{N+2}(t)$,&nbsp; ... are needed.<br>
-*For a complete characterization of&nbsp; $n(t)$&nbsp; or&nbsp; $r(t)$,&nbsp; however, an infinite number of further basis functions&nbsp; $\varphi_{N+1}(t)$,&nbsp; $\varphi_{N+2}(t)$,&nbsp; ... are needed.<br><br>
-Thus, the coefficients of the received signal&nbsp; $r(t)$&nbsp; are obtained according to the following equation, taking into account that the signals&nbsp; $s_i(t)$&nbsp; and the noise&nbsp; $n(t)$&nbsp; are independent of each other:
+Thus,&nbsp; the coefficients of the received signal&nbsp; $r(t)$&nbsp; are obtained according to the following equation,&nbsp; taking into account that the signals&nbsp; $s_i(t)$&nbsp; and the noise&nbsp; $n(t)$&nbsp; are independent of each other:
 :$$r_j \hspace{0.1cm}  =  \hspace{0.1cm} \hspace{0.1cm} < \hspace{-0.1cm}r(t), \hspace{0.1cm} \varphi_j(t) \hspace{-0.05cm} > \hspace{0.1cm}=\hspace{0.1cm}
@@ Line 247: / Line 265: @@
 \\  {j > N}   \hspace{0.05cm}.\\ \end{array}$$
-Thus, the structure sketched above results for the optimal receiver.<br>
+Thus,&nbsp; the structure sketched above results for the optimal receiver.<br>
 <br>
-Let us first consider the &nbsp;'''AWGN channel'''. Here, the prefilter with the frequency response&nbsp; $W(f)$,&nbsp; which is intended for colored noise, can be dispensed with.<br>
+Let us first consider the &nbsp;'''AWGN channel'''.&nbsp; Here,&nbsp; the prefilter with the frequency response&nbsp; $W(f)$,&nbsp; which is intended for colored noise,&nbsp; can be dispensed with.<br>
-The detector of the optimal receiver forms the coefficients&nbsp; $r_j \hspace{0.1cm}  =  \hspace{0.1cm} \hspace{0.1cm} < \hspace{-0.1cm}r(t), \hspace{0.1cm} \varphi_j(t)\hspace{-0.05cm} >$&nbsp; and passes them on to the decision. If the decision is based on all &ndash; i.e., infinitely many &ndash; coefficients&nbsp; $r_j$, the probability of a wrong decision is minimal and the receiver is optimal.<br>
-The real-valued coefficients&nbsp; $r_j$&nbsp; were calculated above as follows:
+#The detector of the optimal receiver forms the coefficients&nbsp; $r_j \hspace{0.1cm}  =  \hspace{0.1cm} \hspace{0.1cm} < \hspace{-0.1cm}r(t), \hspace{0.1cm} \varphi_j(t)\hspace{-0.05cm} >$&nbsp; and passes them on to the decision.
-:$$r_j =
+#If the decision is based on all&nbsp; $($i.e., infinitely many$)$&nbsp; coefficients&nbsp; $r_j$,&nbsp; the probability of a wrong decision is minimal and the receiver is optimal.<br>
+#The real-valued coefficients&nbsp; $r_j$&nbsp; were calculated as follows:
+::$$r_j =
   \left\{ \begin{array}{c}  s_{ij} + n_j\\
     n_j \end{array} \right.\quad
@@ Line 261: / Line 279: @@
 According to the&nbsp; [[Digital_Signal_Transmission/Structure_of_the_Optimal_Receiver#The_irrelevance_theorem|"irrelevance theorem"]]&nbsp; it can be shown that for additive white Gaussian noise
-*the optimality is not reduced, if the coefficients&nbsp; $(s_{ij})$&nbsp; independent coefficients &nbsp;$r_{N+1}$,&nbsp; $r_{N+2}$,&nbsp; ... are not included in the decision process, and therefore<br>
+*the optimality is not lowered if the coefficients &nbsp;$r_{N+1}$,&nbsp; $r_{N+2}$,&nbsp; ... ,&nbsp; that do not depend on the message&nbsp; $(s_{ij})$,&nbsp; are not included in the decision process,&nbsp; and therefore<br>
-*the detector can only form the projections of the received signal&nbsp; $r(t)$&nbsp; onto the&nbsp; $N$&nbsp; basis functions&nbsp; $\varphi_{1}(t)$, ... , $\varphi_{N}(t)$&nbsp; given by the useful signal&nbsp; $s(t)$&nbsp; must be formed.<br>
+*the detector has to form only the projections of the received signal&nbsp; $r(t)$&nbsp; onto the&nbsp; $N$&nbsp; basis functions&nbsp; $\varphi_{1}(t)$, ... , $\varphi_{N}(t)$&nbsp; given by the useful signal&nbsp; $s(t)$.&nbsp;
 In the graph this significant simplification is indicated by the gray background.<br>
-In the case of &nbsp;'''colored noise''' &nbsp;&nbsp;&#8658;&nbsp;&nbsp; power-spectral density&nbsp; ${\it \Phi}_n(f) \ne {\rm const.}$&nbsp; only an additional prefilter with the amplitude response&nbsp; $|W(f)| = {1}/{\sqrt{\it \Phi}_n(f)}$&nbsp; is required. This filter is also called "<i>whitening filter"</i>, because the noise power density at the output is constant again &ndash; i.e. "white".
+In the case of &nbsp;'''colored noise''' &nbsp; &#8658; &nbsp; power-spectral density&nbsp; ${\it \Phi}_n(f) \ne {\rm const.}$,&nbsp; only an additional prefilter with the amplitude response&nbsp; $|W(f)| = {1}/{\sqrt{ {\it \Phi}_n(f) } }$&nbsp; is required.&nbsp;
+#This filter is called a&nbsp; "whitening filter",&nbsp; because the noise power-spectral density at its output is constant again &nbsp; &#8658; &nbsp; "white".
-More details can be found in the chapter&nbsp; [[Theory_of_Stochastic_Signals/Matched_Filter#Generalized_matched_filter_for_the_case_of_colored_interference|"matched filter for colored interference"]]&nbsp; of the book "Stochastic Signal Theory".<br>
+#More details can be found in the chapter&nbsp; [[Theory_of_Stochastic_Signals/Matched_Filter#Generalized_matched_filter_for_the_case_of_colored_interference|"Matched filter for colored interference"]]&nbsp; of the book&nbsp; "Stochastic Signal Theory".<br>
 == Implementation aspects ==
 <br>
-Essential components of the optimal receiver are the calculations of the inner products according to the equations&nbsp; $r_j \hspace{0.1cm}  =  \hspace{0.1cm} \hspace{0.1cm} < \hspace{-0.1cm}r(t), \hspace{0.1cm} \varphi_j(t) \hspace{-0.05cm} >$. These can be implemented in several ways:
+Essential components of the optimal receiver are the calculations of the inner products according to the equations &nbsp; $r_j \hspace{0.1cm}  =  \hspace{0.1cm} \hspace{0.1cm} < \hspace{-0.1cm}r(t), \hspace{0.1cm} \varphi_j(t) \hspace{-0.05cm} >$.&nbsp;
-*In the &nbsp;'''correlation receiver'''&nbsp; (see the&nbsp; [[Digital_Signal_Transmission/Optimal_Receiver_Strategies#Correlation_receiver_with_unipolar_signaling|"chapter of the same name"]] for more details on this implementation),&nbsp; the inner products are realized directly according to the definition with analog multipliers and integrators:
+{{BlaueBox|TEXT=
+$\text{These can be implemented in several ways:}$&nbsp;
+*In the &nbsp;'''correlation receiver'''&nbsp; $($see the&nbsp; [[Digital_Signal_Transmission/Optimal_Receiver_Strategies#Correlation_receiver_with_unipolar_signaling|"chapter of the same name"]] for more details on this implementation$)$,&nbsp; the inner products are realized directly according to the definition with analog multipliers and integrators:
 :$$r_j = \int_{-\infty}^{+\infty}r(t) \cdot \varphi_j(t) \,{\rm d} t \hspace{0.05cm}.$$
-*The &nbsp;'''matched filter receiver''', already derived in the chapter&nbsp; [[Digital_Signal_Transmission/Error_Probability_for_Baseband_Transmission#Optimal_binary_receiver_.E2.80.93_.22Matched_Filter.22_realization|"Optimal Binary Receiver"]]&nbsp; at the beginning of this book, achieves the same result using a linear filter with impulse response&nbsp;  $h_j(t) = \varphi_j(t) \cdot (T-t)$&nbsp; followed by sampling at time&nbsp; $t = T$:&nbsp;
+*The &nbsp;'''matched filter receiver''',&nbsp; already derived in the chapter&nbsp; [[Digital_Signal_Transmission/Error_Probability_for_Baseband_Transmission#Optimal_binary_receiver_.E2.80.93_.22Matched_Filter.22_realization|"Optimal Binary Receiver"]]&nbsp; at the beginning of this book,&nbsp; achieves the same result using a linear filter with impulse response&nbsp;  $h_j(t) = \varphi_j(T-t)$&nbsp; followed by sampling at time&nbsp; $t = T$:&nbsp;
 :$$r_j = \int_{-\infty}^{+\infty}r(\tau) \cdot h_j(t-\tau) \,{\rm d} \tau
 = \int_{-\infty}^{+\infty}r(\tau) \cdot \varphi_j(T-t+\tau) \,{\rm d} \tau \hspace{0.3cm}
 \Rightarrow \hspace{0.3cm} r_j (t = T) = \int_{-\infty}^{+\infty}r(\tau) \cdot \varphi_j(\tau) \,{\rm d} \tau = r_j
 \hspace{0.05cm}.$$
+[[File:EN_Dig_T_4_2_S6.png|left|frame|Three different implementations of the inner product|class=fit]]
+<br><br><br>
+The figure shows the two possible realizations <br>of the optimal detector.}}
-The figure shows the two possible realizations of the optimal detector.
-[[File:EN_Dig_T_4_2_S6.png|center|frame|Three different implementations of the inner product|class=fit]]
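The equivalence of the correlation receiver and the matched filter receiver can be illustrated in discrete time.&nbsp; This is a hedged sketch only:&nbsp; the ramp-shaped basis function, the coefficient&nbsp; <code>s_i1</code>,&nbsp; the noise level and the sampling grid are assumptions made for illustration.

```python
import numpy as np

N = 1000                     # samples per symbol (assumed)
T = 1.0                      # symbol duration (assumed)
dt = T / N
t = np.arange(N) * dt

# Assumed basis function: a ramp on [0, T), normalized to unit energy.
# It is deliberately asymmetric so that the time reversal in the filter matters.
phi = t.copy()
phi /= np.sqrt(np.sum(phi ** 2) * dt)

rng = np.random.default_rng(0)
s_i1 = 2.0                                   # assumed signal coefficient
r = s_i1 * phi + rng.normal(0.0, 1.0, N)     # received samples with additive noise

# Correlation receiver: r_j = integral of r(t) * phi_j(t) dt
r_corr = np.sum(r * phi) * dt

# Matched filter: h_j(t) = phi_j(T - t), output sampled at the end of the symbol
h = phi[::-1]                                # time-reversed basis function
y = np.convolve(r, h) * dt                   # filter output for all time shifts
r_mf = y[N - 1]                              # sample corresponding to t = T on this grid

print(r_corr, r_mf)
```

Both values agree up to rounding,&nbsp; because the convolution sum at index&nbsp; <code>N - 1</code>&nbsp; reduces to the same sum of products&nbsp; $r(\tau) \cdot \varphi_j(\tau)$&nbsp; that the correlator forms directly.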
 == Probability density function of the received values ==
 <br>
-Before we turn to the optimal design of the decision maker and the calculation and approximation of the error probability in the following chapter, we first perform a statistical analysis of the decision variables&nbsp; $r_j$ valid for the AWGN channel.
+Before we turn to the optimal design of the decision maker and the calculation and approximation of the error probability in the following chapter, we first perform a statistical analysis of the decision variables&nbsp; $r_j$&nbsp; valid for the AWGN channel.
-[[File:P ID2009 Dig T 4 2 S7 version1.png|right|frame|Signal space constellation and PDF of the received signal|class=fit]]
+[[File:P ID2009 Dig T 4 2 S7 version1.png|right|frame|Signal space constellation&nbsp; (left)&nbsp; and PDF of the received signal&nbsp; (right)|class=fit]]
-For this purpose, we consider again the optimal binary receiver for bipolar baseband transmission over the AWGN channel, starting from the description form valid for the fourth main chapter.
+For this purpose,&nbsp; we consider again the optimal binary receiver for bipolar baseband transmission over the AWGN channel,&nbsp; starting from the description form valid for the fourth main chapter.
 With the parameters&nbsp; $N = 1$&nbsp; and&nbsp; $M = 2$,&nbsp; the signal space constellation shown in the left graph is obtained for the transmitted signal
-*with only one basis function&nbsp; $\varphi_1(t)$, because of&nbsp; $N = 1$,<br>
+*with only one basis function&nbsp; $\varphi_1(t)$,&nbsp; because of&nbsp; $N = 1$,<br>
-*with the two signal space points&nbsp; $s_i \in \{s_0, \hspace{0.05cm}s_1\}$, because of&nbsp; $M = 2$.
+*with the two signal space points&nbsp; $s_i \in \{s_0, \hspace{0.05cm}s_1\}$,&nbsp; because of&nbsp; $M = 2$.
 <br clear=all>
-For the signal&nbsp; $r(t) = s(t) + n(t)$&nbsp; at the output of the AWGN channel, the noise-free case &nbsp; &#8658; &nbsp;  $r(t) = s(t)$&nbsp; yields exactly the same constellation. The signal space points are thus at
+For the signal&nbsp; $r(t) = s(t) + n(t)$&nbsp; at the AWGN channel output,&nbsp; the noise-free case &nbsp; &#8658; &nbsp;  $r(t) = s(t)$&nbsp; yields exactly the same constellation;&nbsp; the signal space points are at
 :$$r_0 = s_0 = \sqrt{E}\hspace{0.05cm},\hspace{0.2cm}r_1 = s_1 = -\sqrt{E}\hspace{0.05cm}.$$
-Considering the (band-limited) AWGN noise&nbsp; $n(t)$,&nbsp; Gaussian curves with variance&nbsp; $\sigma_n^2$ &nbsp;&#8658;&nbsp; standard deviation &nbsp; $\sigma_n$&nbsp; are superimposed on each of the two points &nbsp; $r_0$&nbsp; and&nbsp; $r_1$&nbsp; (see right graph). The PDF of the noise component&nbsp; $n(t)$&nbsp; is thereby:
+Considering the (band-limited) AWGN noise&nbsp; $n(t)$,&nbsp;
+*Gaussian curves with variance&nbsp; $\sigma_n^2$ &nbsp;&#8658;&nbsp; standard deviation &nbsp; $\sigma_n$&nbsp; are superimposed on each of the two points &nbsp; $r_0$&nbsp; and&nbsp; $r_1$&nbsp; $($see right sketch$)$.
+*The probability density function&nbsp; $\rm (PDF)$&nbsp; of the noise component&nbsp; $n(t)$&nbsp; is thereby:
 :$$p_n(n) = \frac{1}{\sqrt{2\pi} \cdot \sigma_n}\cdot {\rm e}^{ - {n^2}/(2 \sigma_n^2)}\hspace{0.05cm}.$$
@@ Line 310: / Line 335: @@
 Regarding the units of the quantities listed here, we note:
-*$r_0 = s_0$&nbsp; and&nbsp; $r_1 = s_1$&nbsp; as well as&nbsp; $n$&nbsp; are each scalars with the unit "root of energy".<br>
+*$r_0 = s_0$&nbsp; and&nbsp; $r_1 = s_1$&nbsp; as well as&nbsp; $n$&nbsp; are each scalars with the unit&nbsp; "root of energy".<br>
-*Thus, it is obvious that&nbsp; $\sigma_n$&nbsp; also has the unit "root of energy" and&nbsp; $\sigma_n^2$&nbsp; represents energy.<br>
+*Thus,&nbsp; it is obvious that&nbsp; $\sigma_n$&nbsp; also has the unit&nbsp; "root of energy"&nbsp; and&nbsp; $\sigma_n^2$&nbsp; represents energy.<br>
-*For the AWGN channel, the noise variance is&nbsp; $\sigma_n^2 = N_0/2$, so this is also a physical quantity with unit&nbsp; $\rm W/Hz = Ws$.<br><br>
+*For the AWGN channel,&nbsp; the noise variance is &nbsp; $\sigma_n^2 = N_0/2$, &nbsp; so this is also a physical quantity with unit&nbsp; "$\rm W/Hz \equiv Ws$".<br><br>
 The topic addressed here is illustrated by examples in&nbsp; [[Aufgaben:Aufgabe_4.06:_Optimale_Entscheidungsgrenzen|"Exercise 4.6"]].&nbsp; <br>
@@ Line 320: / Line 345: @@
 == N-dimensional Gaussian noise==
 <br>
-If an&nbsp; $N$&ndash;dimensional modulation process is present, i.e., with&nbsp; $0 \le i \le M-1$&nbsp; and &nbsp;$1 \le j \le N$:
+If an&nbsp; $N$&ndash;dimensional modulation process is present,&nbsp; i.e.,&nbsp; with&nbsp; $0 \le i \le M-1$&nbsp; and &nbsp;$1 \le j \le N$:
 :$$s_i(t) = \sum\limits_{j = 1}^{N} s_{ij} \cdot \varphi_j(t) = s_{i1} \cdot \varphi_1(t)
 + s_{i2} \cdot \varphi_2(t) + \hspace{0.05cm}\text{...}\hspace{0.05cm} + s_{iN} \cdot \varphi_N(t)\hspace{0.05cm}\hspace{0.3cm}
@@ Line 326: / Line 351: @@
 \hspace{0.05cm},$$
-then the noise vector&nbsp; $\boldsymbol{ n}$&nbsp; must also be assumed to have dimension&nbsp; $N$.&nbsp; The same is true for the receive vector&nbsp;  $\boldsymbol{ r}$:
+then the noise vector&nbsp; $\boldsymbol{ n}$&nbsp; must also be assumed to have dimension&nbsp; $N$.&nbsp; The same is true for the received vector&nbsp;  $\boldsymbol{ r}$:
 :$$\boldsymbol{ n} = \left(n_{1}, n_{2}, \hspace{0.05cm}\text{...}\hspace{0.05cm},  n_{N}\right )
-\hspace{0.01cm},\hspace{0.2cm}\boldsymbol{ r} = \left(r_{1}, r_{2}, \hspace{0.05cm}\text{...}\hspace{0.05cm},  r_{N}\right )\hspace{0.05cm}.$$
+\hspace{0.01cm},$$
+:$$\boldsymbol{ r} = \left(r_{1}, r_{2}, \hspace{0.05cm}\text{...}\hspace{0.05cm},  r_{N}\right )\hspace{0.05cm}.$$
-The probability density function (PDF) is then for the AWGN channel with the realization&nbsp; $\boldsymbol{ \eta}$&nbsp; of the noise signal
+For the AWGN channel,&nbsp; the probability density function&nbsp; $\rm (PDF)$&nbsp; of the noise vector&nbsp; $\boldsymbol{ n}$,&nbsp; evaluated at the realization&nbsp; $\boldsymbol{ \eta}$,&nbsp; is
 :$$p_{\boldsymbol{ n}}(\boldsymbol{ \eta}) = \frac{1}{\left( \sqrt{2\pi}  \cdot \sigma_n \right)^N }  \cdot
 {\rm exp} \left [ - \frac{|| \boldsymbol{ \eta} ||^2}{2 \sigma_n^2}\right ]\hspace{0.05cm},$$
-and for the conditional PDF in the maximum likelihood decision rule, assume:
+and for the conditional PDF in the maximum likelihood decision rule:
 :$$p_{\hspace{0.02cm}\boldsymbol{ r}\hspace{0.05cm} | \hspace{0.05cm} \boldsymbol{ s}}(\boldsymbol{ \rho} \hspace{0.05cm}|\hspace{0.05cm} \boldsymbol{ s}_i) \hspace{-0.1cm}  =  \hspace{0.1cm}
@@ Line 340: / Line 366: @@
 {\rm exp} \left [ - \frac{|| \boldsymbol{ \rho} - \boldsymbol{ s}_i  ||^2}{2 \sigma_n^2}\right ]\hspace{0.05cm}.$$
-The equation follows from the general representation of the $N$&ndash;dimensional Gaussian PDF in the section&nbsp; [[Theory_of_Stochastic_Signals/Generalization_to_N-Dimensional_Random_Variables#Correlation_matrix|"correlation matrix"]]&nbsp; of the book "Theory of Stochastic Signals" under the assumption that the components are uncorrelated (and thus statistically independent). $||\boldsymbol{ \eta}||$&nbsp; is called the ''norm'' (length) of the vector &nbsp;$\boldsymbol{ \eta}$.<br>
+The equation follows
+*from the general representation of the&nbsp;  $N$&ndash;dimensional Gaussian PDF in the section&nbsp; [[Theory_of_Stochastic_Signals/Generalization_to_N-Dimensional_Random_Variables#Correlation_matrix|"correlation matrix"]]&nbsp; of the book&nbsp; "Theory of Stochastic Signals"
+*under the assumption that the components are uncorrelated&nbsp; (and thus statistically independent).
+*$||\boldsymbol{ \eta}||$&nbsp; is called the&nbsp; "norm"&nbsp; (length)&nbsp; of the vector &nbsp;$\boldsymbol{ \eta}$.<br>
-[[File:EN_Dig_T_4_2_S8.png|right|frame|Two-dimensional Gaussian PDF]]
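The conditional PDF above can be evaluated directly.&nbsp; The sketch below assumes a hypothetical constellation with&nbsp; $N = 2$&nbsp; and&nbsp; $M = 4$,&nbsp; an arbitrary received vector&nbsp; <code>rho</code>&nbsp; and an arbitrary&nbsp; <code>sigma_n</code>;&nbsp; none of these values comes from the text.&nbsp; Since the PDF depends on&nbsp; $\boldsymbol{ s}_i$&nbsp; only through&nbsp; $|| \boldsymbol{ \rho} - \boldsymbol{ s}_i  ||$,&nbsp; the most likely signal is also the one closest to&nbsp; $\boldsymbol{ \rho}$.

```python
import numpy as np

def conditional_pdf(rho, s_i, sigma_n):
    """p_{r|s}(rho | s_i) for N-dimensional AWGN with variance sigma_n^2 per component."""
    rho = np.asarray(rho, dtype=float)
    s_i = np.asarray(s_i, dtype=float)
    N = rho.size
    dist2 = np.sum((rho - s_i) ** 2)                      # ||rho - s_i||^2
    return np.exp(-dist2 / (2 * sigma_n ** 2)) / (np.sqrt(2 * np.pi) * sigma_n) ** N

# Assumed constellation with N = 2 dimensions and M = 4 signals
signals = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=float)
rho = np.array([0.3, -0.8])      # assumed received vector
sigma_n = 0.5                    # assumed noise standard deviation

likelihoods = [conditional_pdf(rho, s, sigma_n) for s in signals]
i_ml = int(np.argmax(likelihoods))                              # largest conditional PDF
i_min_dist = int(np.argmin(np.sum((signals - rho) ** 2, axis=1)))  # smallest distance
print(i_ml, i_min_dist)
```

Both indices coincide here,&nbsp; which reflects the monotonic relation between the conditional PDF and the Euclidean distance.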
 {{GraueBox|TEXT=
 $\text{Example 3:}$&nbsp;
-Shown on the right is the two-dimensional Gaussian PDF&nbsp; $p_{\boldsymbol{ n} } (\boldsymbol{ \eta})$&nbsp; of the 2D random variable&nbsp; $\boldsymbol{ n} = (n_1,\hspace{0.05cm}n_2)$.  Arbitrary realizations of the random variable&nbsp; $\boldsymbol{ n}$&nbsp; are denoted by&nbsp; $\boldsymbol{ \eta} = (\eta_1,\hspace{0.05cm}\eta_2)$.&nbsp;
+Shown on the right is the two-dimensional Gaussian probability density function&nbsp; $p_{\boldsymbol{ n} } (\boldsymbol{ \eta})$&nbsp; of the two-dimensional random variable&nbsp; $\boldsymbol{ n} = (n_1,\hspace{0.05cm}n_2)$.&nbsp;  Arbitrary realizations of the random variable&nbsp; $\boldsymbol{ n}$&nbsp; are denoted by&nbsp; $\boldsymbol{ \eta} = (\eta_1,\hspace{0.05cm}\eta_2)$.&nbsp; The equation of the represented two-dimensional&nbsp; "Gaussian bell curve"&nbsp; is:
-*The equation of the represented bell curve is:
+[[File:EN_Dig_T_4_2_S8.png|right|frame|Two-dimensional Gaussian PDF]]
 :$$p_{n_1, n_2}(\eta_1, \eta_2) = \frac{1}{\left( \sqrt{2\pi}  \cdot \sigma_n \right)^2 }  \cdot
 {\rm exp} \left [ - \frac{ \eta_1^2 + \eta_2^2}{2 \sigma_n^2}\right ]\hspace{0.05cm}. $$
-*The maximum of this function is at&nbsp; $\eta_1 = \eta_2 = 0$&nbsp; and has the value &nbsp;$2\pi \cdot \sigma_n^2$.
+*The maximum of this function is at&nbsp; $\eta_1 = \eta_2 = 0$&nbsp; and has the value &nbsp;$1/(2\pi \cdot \sigma_n^2)$.&nbsp; With&nbsp; $\sigma_n^2 = N_0/2$,&nbsp; the two-dimensional PDF in vector form can also be written as follows:
-*With&nbsp; $\sigma_n^2 = N_0/2$,&nbsp; the 2D PDF in vector form can also be written as follows:
 :$$p_{\boldsymbol{ n} }(\boldsymbol{ \eta}) = \frac{1}{\pi \cdot N_0 }  \cdot
 {\rm exp} \left [ - \frac{\vert \vert \boldsymbol{ \eta} \vert \vert ^2}{N_0}\right ]\hspace{0.05cm}.$$
+*This rotationally symmetric PDF is suitable e.g. for describing/investigating a&nbsp; "two-dimensional modulation process"&nbsp; such as&nbsp; [[Digital_Signal_Transmission/Carrier_Frequency_Systems_with_Coherent_Demodulation#Quadrature_amplitude_modulation_.28M-QAM.29|"M&ndash;QAM"]],&nbsp; [[Digital_Signal_Transmission/Carrier_Frequency_Systems_with_Coherent_Demodulation#Multi-level_phase.E2.80.93shift_keying_.28M.E2.80.93PSK.29|"M&ndash;PSK"]]&nbsp; or&nbsp; [[Modulation_Methods/Non-Linear_Digital_Modulation#FSK_.E2.80.93_Frequency_Shift_Keying|"2&ndash;FSK"]].<br>
-*This rotationally symmetric PDF is suitable, for example, for describing/investigating a ''two-dimensional modulation process'' such as&nbsp; [[Digital_Signal_Transmission/Carrier_Frequency_Systems_with_Coherent_Demodulation#Quadrature_amplitude_modulation_.28M-QAM.29|"<i>M</i>&ndash;QAM"]],&nbsp; [[Digital_Signal_Transmission/Carrier_Frequency_Systems_with_Coherent_Demodulation#Multi-level_phase.E2.80.93shift_keying_.28M.E2.80.93PSK.29|"<i>M</i>&ndash;PSK"]]&nbsp; or&nbsp; [[Digital_Signal_Transmission/Carrier_Frequency_Systems_with_Coherent_Demodulation#Binary_phase_shift_keying_.28BPSK.29|"2&ndash;FSK"]].<br>
+*However,&nbsp; two-dimensional real random variables are often represented in a one-dimensional complex way,&nbsp; usually in the form&nbsp; $n(t) = n_{\rm I}(t) + {\rm j} \cdot n_{\rm Q}(t)$.&nbsp; The two components are then called the&nbsp; "in-phase component"&nbsp; $n_{\rm I}(t)$&nbsp; and the&nbsp; "quadrature component"&nbsp; $n_{\rm Q}(t)$&nbsp; of the noise.<br>
-*However, two-dimensional real random variables are often represented in a one-dimensional complex way, usually in the form&nbsp; $n(t) = n_{\rm I}(t) + {\rm j} \cdot n_{\rm Q}(t)$. The two components are then called the <i>in-phase component</i>&nbsp; $n_{\rm I}(t)$&nbsp; and the <i>quadrature component</i>&nbsp; $n_{\rm Q}(t)$&nbsp; of the noise.<br>
-*The probability density function depends only on the magnitude&nbsp; $\vert n(t) \vert$&nbsp; of the noise variable and not on angle&nbsp; ${\rm arc} \ n(t)$. This means: &nbsp; complex noise is circularly symmetric (see graph).<br>
+*The probability density function depends only on the magnitude&nbsp; $\vert n(t) \vert$&nbsp; of the noise variable and not on angle&nbsp; ${\rm arc} \ n(t)$.&nbsp; This means: &nbsp; complex noise is circularly symmetric&nbsp; $($see graph$)$.<br>
-*Circularly symmetric also means that the in-phase component&nbsp; $n_{\rm I}(t)$&nbsp; and the quadrature component&nbsp; $n_{\rm Q}(t)$&nbsp; have the same distribution and thus also the same variance (standard deviation):
+*Circularly symmetric also means that the in-phase component&nbsp; $n_{\rm I}(t)$&nbsp; and the quadrature component&nbsp; $n_{\rm Q}(t)$&nbsp; have the same distribution and thus also the same variance&nbsp; $($and standard deviation$)$:
 :$$ {\rm E} \big [ n_{\rm I}^2(t)\big  ]  = {\rm E}\big [ n_{\rm Q}^2(t) \big ] = \sigma_n^2 \hspace{0.05cm},\hspace{1cm}{\rm E}\big  [ n(t) \cdot n^*(t) \big  ]\hspace{0.1cm}  =  \hspace{0.1cm}  {\rm E}\big [ n_{\rm I}^2(t) \big ] + {\rm E}\big [ n_{\rm Q}^2(t)\big  ] = 2\sigma_n^2 \hspace{0.05cm}.$$}}
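The relation&nbsp; ${\rm E}\big [ n(t) \cdot n^*(t) \big ] = 2\sigma_n^2$&nbsp; from Example 3 can be checked with a short simulation.&nbsp; This is a sketch under assumptions:&nbsp; the value of&nbsp; <code>sigma_n</code>,&nbsp; the number of samples&nbsp; <code>K</code>&nbsp; and the random seed are chosen arbitrarily.

```python
import numpy as np

sigma_n = 0.7                    # assumed standard deviation per component
K = 200_000                      # assumed number of noise samples
rng = np.random.default_rng(1)

# Independent in-phase and quadrature components with equal variance
n_I = rng.normal(0.0, sigma_n, K)
n_Q = rng.normal(0.0, sigma_n, K)
n = n_I + 1j * n_Q               # complex noise samples

power = np.mean(np.abs(n) ** 2)             # estimate of E[n * conj(n)]
phase_mean = np.mean(np.exp(1j * np.angle(n)))  # close to 0 if the angle is uniform

print(power, 2 * sigma_n ** 2, abs(phase_mean))
```

The estimated power lies close to&nbsp; $2\sigma_n^2$,&nbsp; and the mean of the unit phasors is close to zero,&nbsp; as expected when no angle is preferred.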
-Finally, some ''denotation variants'' for Gaussian random variables:
+Finally,&nbsp; some&nbsp; '''notation variants'''&nbsp; for Gaussian random variables:
 :$$x ={\cal N}(\mu, \sigma^2) \hspace{-0.1cm}: \hspace{0.3cm}\text{real Gaussian distributed random variable, with mean}\hspace{0.1cm}\mu \text                                           { and variance}\hspace{0.15cm}\sigma^2 \hspace{0.05cm},$$