Exercise 3.15: Data Processing Theorem

{{quiz-Header|Buchseite=Information_Theory/Application_to_Digital_Signal_Transmission
 
}}
 
  
[[File:P_ID2818__Inf_A_3_14.png|right|frame|"Data Processing Theorem"]]
We consider the following data processing chain:
* Binary input data  $X$  is processed by processor  $1$  (top half in the graph),  which can be described by conditional probabilities &nbsp; &rArr; &nbsp; $P_{Y\hspace{0.03cm}|\hspace{0.03cm}X}(\cdot)$.  Its output variable is  $Y$.
* A second processor with random variable  $Y$  at input and random variable  $Z$  at output is given by  $P_{Z\hspace{0.03cm}|\hspace{0.03cm}Y}(\cdot)$   (lower half in the graph).  $Z$  depends on  $Y$  alone  (deterministic or stochastic)  and is independent of  $X$:
:$$P_{Z\hspace{0.05cm}|\hspace{0.03cm} XY\hspace{-0.03cm}}(z\hspace{0.03cm}|\hspace{0.03cm} x, y) =P_{Z\hspace{0.05cm}|\hspace{0.03cm} Y\hspace{-0.03cm}}(z\hspace{0.03cm}|\hspace{0.03cm} y) \hspace{0.05cm}.$$
The following nomenclature was used for this description:
:$$x \in X = \{0, 1\}\hspace{0.02cm},\hspace{0.3cm} y \in Y = \{0,1\}\hspace{0.02cm},\hspace{0.3cm} z \in Z = \{0, 1\}\hspace{0.02cm}.$$
The joint probability mass function is:
:$$P_{XYZ}(x, y, z) = P_{X}(x) \cdot P_{Y\hspace{0.05cm}|\hspace{0.03cm} X\hspace{-0.03cm}}(y\hspace{0.03cm}|\hspace{0.03cm} x)\cdot P_{Z\hspace{0.05cm}|\hspace{0.03cm} Y\hspace{-0.03cm}}(z\hspace{0.03cm}|\hspace{0.03cm} y) \hspace{0.05cm}.$$
This also means:
$X → Y → Z$  form a  [[Theory_of_Stochastic_Signals/Markovketten|Markov chain]].  For such a chain, the  "Data Processing Theorem"  holds with the following consequence:
:$$I(X;Z)  \le  I(X;Y ) \hspace{0.05cm}, $$
:$$I(X;Z)  \le  I(Y;Z ) \hspace{0.05cm}.$$
The theorem thus states:
:* One cannot gain additional information about input  $X$  by manipulating   ("processing")  data  $Y$ .
:* Data processing  (by processor  $2$)  serves only the purpose of making the information about  $X$  more visible.
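
The two inequalities can also be verified numerically.  The following Python sketch is not part of the original exercise; the helper names  "mutual_information"  and  "cascade_pmfs"  are chosen freely here.  It builds the joint probability mass functions of the Markov chain from  $P_X$,  $P_{Y\hspace{0.03cm}|\hspace{0.03cm}X}$  and  $P_{Z\hspace{0.03cm}|\hspace{0.03cm}Y}$  and evaluates the three mutual informations by brute force:

<pre>
import itertools
from math import log2

def mutual_information(joint):
    # I(A;B) in bit from a dictionary {(a, b): probability}
    pa, pb = {}, {}
    for (a, b), prob in joint.items():
        pa[a] = pa.get(a, 0.0) + prob
        pb[b] = pb.get(b, 0.0) + prob
    return sum(prob * log2(prob / (pa[a] * pb[b]))
               for (a, b), prob in joint.items() if prob > 0)

def cascade_pmfs(p_x, p_y_x, p_z_y):
    # Markov chain X -> Y -> Z:  P_XYZ(x,y,z) = P_X(x) * P_Y|X(y|x) * P_Z|Y(z|y)
    p_xy, p_yz, p_xz = {}, {}, {}
    for x, y, z in itertools.product((0, 1), repeat=3):
        prob = p_x[x] * p_y_x[x][y] * p_z_y[y][z]
        p_xy[(x, y)] = p_xy.get((x, y), 0.0) + prob
        p_yz[(y, z)] = p_yz.get((y, z), 0.0) + prob
        p_xz[(x, z)] = p_xz.get((x, z), 0.0) + prob
    return p_xy, p_yz, p_xz

# Two binary symmetric channels in cascade (illustrative values p = 0.1, q = 0.2)
p, q = 0.1, 0.2
p_x   = {0: 0.5, 1: 0.5}
p_y_x = {0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}}
p_z_y = {0: {0: 1 - q, 1: q}, 1: {0: q, 1: 1 - q}}

p_xy, p_yz, p_xz = cascade_pmfs(p_x, p_y_x, p_z_y)
i_xy, i_yz, i_xz = (mutual_information(j) for j in (p_xy, p_yz, p_xz))
print(round(i_xy, 3), round(i_yz, 3), round(i_xz, 3))   # 0.531 0.278 0.173 (bit)
print(i_xz <= i_xy and i_xz <= i_yz)                    # True: both inequalities hold
</pre>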
  
 
Hints:
*The task belongs to the chapter  [[Information_Theory/Anwendung_auf_die_Digitalsignalübertragung| Application to Digital Signal Transmission]].
*Reference is also made to the page  [[Information_Theory/Different_Entropy_Measures_of_Two-Dimensional_Random_Variables#Chain_rule_of_the_mutual_information|Chain rule of the mutual information]]  in the previous chapter.
===Questions===
  
 
<quiz display=simple>
{How can the result &nbsp;$I(X; Y) = 1 - H_{\rm bin}(p)$&nbsp; be derived?
 
|type="[]"}
 
|type="[]"}
+ The derivation is based on the properties of a strictly symmetric channel.
- It is exploited that &nbsp;$H_{\rm bin}(p)$&nbsp; is a concave function.
- The result is valid for any probability mass function &nbsp;$P_X(X)$.
  
  
{Which mutual information &nbsp;$I(X; Y)$&nbsp; results for the first processor with &nbsp;$p = 0.1$?
 
|type="{}"}
 
|type="{}"}
$ I(X; Y) \ = \ $ { 0.531 3% } $\ \rm bit$
  
{Which mutual information &nbsp;$I(Y; Z)$&nbsp; results for the second processor with &nbsp;$q = 0.2$?
 
|type="{}"}
 
|type="{}"}
$I(Y; Z) \ = \ $ { 0.278 3% } $\ \rm bit$
  
{Which mutual information &nbsp;$I(X; Z)$&nbsp; results for the whole system with&nbsp; $p = 0.1$ &nbsp;and&nbsp; $q = 0.2$?  
 
|type="{}"}
 
|type="{}"}
$I(X; Z) \ = \ $ { 0.173 3% } $\ \rm bit$
  
{Does this example satisfy the&nbsp; "Data Processing Theorem"?
|type="[]"}
+
|type="()"}
+ Yes.
- No.
 
</quiz>
  
===Solution===
 
{{ML-Kopf}}
'''(1)'''&nbsp; Only <u>solution proposal 1</u> is correct:
* Both processors describe strictly symmetric channels &nbsp; &rArr; &nbsp; both uniformly dispersive and uniformly focusing.
* For such a binary channel, with&nbsp; $Y = \{0, 1\} \ \Rightarrow \ |Y| = 2$:
:$$I(X;Y) = 1 + \sum_{y \hspace{0.05cm}\in\hspace{0.05cm} Y} \hspace{0.1cm} P_{Y\hspace{0.03cm}|\hspace{0.03cm} X}(y\hspace{0.03cm}|\hspace{0.03cm}x) \cdot {\rm log}_2 \hspace{0.1cm}P_{\hspace{0.01cm}Y \hspace{0.03cm}|\hspace{0.03cm} X}(y\hspace{0.03cm}|\hspace{0.03cm}x) \hspace{0.05cm}.$$
*Here it does not matter at all whether one starts from&nbsp; $X = 0$&nbsp; or from&nbsp; $X = 1$.
*For&nbsp; $X = 0$&nbsp; one obtains with&nbsp; $P_{Y\hspace{0.03cm}|\hspace{0.03cm}X}(Y = 1\hspace{0.03cm}|\hspace{0.03cm}X = 0) = p$ &nbsp;and&nbsp; $P_{Y\hspace{0.03cm}|\hspace{0.03cm}X}(Y = 0\hspace{0.03cm}|\hspace{0.03cm}X = 0) = 1 - p$:
:$$ I(X;Y) = 1 + p \cdot {\rm log}_2 \hspace{0.1cm} (p) + (1-p) \cdot {\rm log}_2 \hspace{0.1cm} (1-p) = 1 - H_{\rm bin}(p)\hspace{0.05cm}, \hspace{1.0cm}H_{\rm bin}(p)= p \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{p}+ (1-p) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{1-p}\hspace{0.05cm}.$$
*However, the result holds only for&nbsp; $P_X(X) = (0.5, \ 0.5)$ &nbsp; &rArr; &nbsp; maximum mutual information &nbsp; &rArr; &nbsp; channel capacity.  
*Otherwise,&nbsp; $I(X; Y)$&nbsp; is smaller.&nbsp; For example, for&nbsp; $P_X(X) = (1, \ 0)$: &nbsp; $H(X) = 0$ &nbsp; &rArr; &nbsp; $I(X; Y) = 0.$
*The binary entropy function is concave, but this property was not used in the derivation &nbsp; &rArr; &nbsp; answer 2 is incorrect.
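
Purely as a cross-check (not part of the original solution), the closed-form result for the strictly symmetric binary channel with equiprobable input symbols can be evaluated with a few lines of Python; the helper names below are chosen freely:

<pre>
from math import log2

def h_bin(p):
    # binary entropy function H_bin(p) in bit
    if p in (0.0, 1.0):
        return 0.0
    return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

def i_bsc(p):
    # I(X;Y) = 1 - H_bin(p) for a BSC with equiprobable input symbols
    return 1 - h_bin(p)

print(round(i_bsc(0.1), 3))   # 0.531 bit  ->  subtask (2)
print(round(i_bsc(0.2), 3))   # 0.278 bit  ->  subtask (3)
</pre>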
 
'''(2)'''&nbsp; For processor&nbsp; $1$,&nbsp; $p = 0.1$&nbsp; gives:
:$$ I(X;Y) = 1 - H_{\rm bin}(0.1) = 1 - 0.469 \hspace{0.15cm} \underline {=0.531 \,{\rm (bit)}} \hspace{0.05cm}.$$
  
 
  
'''(3)'''&nbsp; Correspondingly, for the second processor with&nbsp; $q = 0.2\hspace{0.05cm}$:
:$$I(Y;Z) = 1 - H_{\rm bin}(0.2) = 1 - 0.722 \hspace{0.15cm} \underline {=0.278 \,{\rm (bit)}} \hspace{0.05cm}.$$
  
 
  
'''(4)'''&nbsp; The probability of&nbsp; $Z = 0$&nbsp; under the condition&nbsp; $X = 0$&nbsp; is obtained via the two possible paths&nbsp; (over&nbsp; $Y = 0$&nbsp; and over&nbsp; $Y = 1$)&nbsp; as
:$$P(\hspace{0.01cm}Z\hspace{-0.05cm} = 0\hspace{0.03cm} | \hspace{0.03cm} X\hspace{-0.05cm} = \hspace{-0.05cm}0) = (1-p) \cdot (1-q) + p \cdot q = 1 - p - q + 2pq \stackrel{!}{=} 1 - \varepsilon \hspace{0.05cm}.$$
*The overall system then has exactly the same BSC structure as processors $1$ and $2$, but now with falsification probability&nbsp; $\varepsilon = p + q - 2pq \hspace{0.05cm}.$  
*For&nbsp; $p = 0.1$&nbsp; and&nbsp; $q = 0.2$&nbsp;, we obtain:
:$$ \varepsilon = 0.1 + 0.2 - 2\cdot 0.1 \cdot 0.2 = 0.26 \hspace{0.3cm}\Rightarrow \hspace{0.3cm} I(X;Z) = 1 - H_{\rm bin}(0.26) = 1 - 0.827 \hspace{0.15cm} \underline {=0.173 \,{\rm (bit)}} \hspace{0.05cm}.$$
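
A short numerical sketch of this calculation (again only for verification, with a freely chosen helper for $H_{\rm bin}$):

<pre>
from math import log2

def h_bin(p):
    # binary entropy function in bit
    return -p * log2(p) - (1 - p) * log2(1 - p)

p, q = 0.1, 0.2
eps = p + q - 2 * p * q           # falsification probability of the overall BSC
print(round(eps, 2))              # 0.26
print(round(1 - h_bin(eps), 3))   # 0.173 bit  =  I(X;Z)
</pre>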
 
  
  
'''(5)'''&nbsp; The answer is&nbsp; <u>YES</u>,&nbsp; because the&nbsp; "Data Processing Theorem" assumes exactly the conditions given here.
  
However, we want to evaluate some additional numerical results:
* If&nbsp; $0 ≤ p < 0.5$ &nbsp;and&nbsp; $0 ≤ q < 0.5$&nbsp; hold, we obtain:
:$$\varepsilon = p + q \cdot (1-2p) > p \hspace{0.3cm}\Rightarrow \hspace{0.3cm} I(X;Z) < I(X;Y) \hspace{0.05cm},$$
:$$\varepsilon = q + p \cdot (1-2q) > q \hspace{0.3cm}\Rightarrow \hspace{0.3cm} I(X;Z) < I(Y;Z) \hspace{0.05cm}.$$
*For &nbsp;$p = 0.5$,&nbsp; the following holds independently of &nbsp;$q$,&nbsp; since&nbsp; $I(X; Z)$&nbsp; cannot be greater than&nbsp; $I(X; Y)$:
:$$\varepsilon = 0.5 + q \cdot (1-1) = 0.5 \hspace{0.3cm}\Rightarrow \hspace{0.3cm} I(X;Y) = 0 \hspace{0.3cm}\Rightarrow \hspace{0.3cm} I(X;Z) = 0 \hspace{0.05cm}.$$
*Similarly, with &nbsp;$q = 0.5$&nbsp; we obtain, independently of &nbsp;$p$:
:$$\varepsilon = 0.5 + p \cdot (1-1) = 0.5 \hspace{0.3cm}\Rightarrow \hspace{0.3cm} I(Y;Z) = 0 \hspace{0.3cm}\Rightarrow \hspace{0.3cm} I(X;Z) = 0 \hspace{0.05cm}.$$
*The Data Processing Theorem is not violated for &nbsp;$p < 0.5$&nbsp; and &nbsp;$q > 0.5$&nbsp; either, which is shown here by just one example:
::With &nbsp;$p = 0.1$&nbsp; and &nbsp;$q = 0.8$&nbsp;, the same result is obtained as in subtask&nbsp; '''(4)''':
::$$\varepsilon = 0.1 + 0.8 - 2\cdot 0.1 \cdot 0.8 = 0.74 \hspace{0.3cm} \Rightarrow \hspace{0.3cm} I(X;Z) = 1 - H_{\rm bin}(0.74) = 1 - H_{\rm bin}(0.26) =0.173 \,{\rm (bit)} \hspace{0.05cm}.$$
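
The above statements can also be confirmed over the entire parameter range by a small scan, sketched below in Python (the helper names are again freely chosen; the tolerance of $10^{-12}$ only absorbs floating-point rounding):

<pre>
from math import log2

def h_bin(p):
    # binary entropy function in bit
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def i_bsc(p):
    # mutual information of a BSC with equiprobable input symbols
    return 1 - h_bin(p)

violations = 0
steps = 101
for i in range(steps):
    for j in range(steps):
        p, q = i / (steps - 1), j / (steps - 1)
        eps = p + q - 2 * p * q          # falsification probability of the cascade
        i_xz = i_bsc(eps)
        # Data Processing Theorem:  I(X;Z) <= I(X;Y)  and  I(X;Z) <= I(Y;Z)
        if i_xz > i_bsc(p) + 1e-12 or i_xz > i_bsc(q) + 1e-12:
            violations += 1
print(violations)   # 0  ->  the theorem is never violated
</pre>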
 
{{ML-Fuß}}
  
  
  
[[Category:Information Theory: Exercises|^3.3 Application to Digital Signal Transmission^]]
