Exercise 3.9: Conditional Mutual Information

{{quiz-Header|Buchseite=Information_Theory/Different_Entropy_Measures_of_Two-Dimensional_Random_Variables
}}
[[File:P_ID2813__Inf_A_3_8.png|right|frame|Result&nbsp; $W$&nbsp; as a function <br>of&nbsp;  $X$,&nbsp; $Y$,&nbsp; $Z$]]
We assume statistically independent random variables&nbsp; $X$,&nbsp; $Y$&nbsp; and&nbsp; $Z$&nbsp; with the following properties:
 
:$$X \in \{1,\ 2 \} \hspace{0.05cm},\hspace{0.35cm}
Y \in \{1,\ 2 \} \hspace{0.05cm},\hspace{0.35cm}
Z \in \{1,\ 2 \} \hspace{0.05cm},\hspace{0.35cm} P_X(X) = P_Y(Y) = \big [ 1/2, \ 1/2 \big ]\hspace{0.05cm},\hspace{0.35cm}P_Z(Z) = \big [ p, \ 1-p \big ].$$
  
From&nbsp; $X$,&nbsp; $Y$&nbsp; and&nbsp; $Z$&nbsp; we form the new random variable&nbsp; $W = (X+Y) \cdot Z$.
* It is obvious that there are statistical dependencies between&nbsp; $X$&nbsp; and&nbsp; $W$&nbsp; &nbsp; &rArr; &nbsp; mutual information&nbsp; $I(X; W) ≠ 0$.
*Furthermore,&nbsp; $I(Y; W) ≠ 0$ &nbsp;and&nbsp; $I(Z; W) ≠ 0$&nbsp; will also hold, but these are not examined further in this exercise.
  
  
Three different definitions of mutual information are used in this exercise:
*the&nbsp; <u>conventional</u>&nbsp; mutual information between&nbsp; $X$&nbsp; and&nbsp; $W$:
 
:$$I(X;W) =  H(X) - H(X|\hspace{0.05cm}W) \hspace{0.05cm},$$
* the&nbsp; <u>conditional</u>&nbsp; mutual information between&nbsp; $X$&nbsp; and&nbsp; $W$&nbsp; with a&nbsp; <u>given fixed value</u>&nbsp; $Z = z$:
 
:$$I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z = z) =  H(X\hspace{0.05cm}|\hspace{0.05cm} Z = z) - H(X|\hspace{0.05cm}W ,\hspace{0.05cm} Z = z) \hspace{0.05cm},$$
* the&nbsp; <u>conditional</u>&nbsp; mutual information between&nbsp; $X$&nbsp; and&nbsp; $W$&nbsp; for a&nbsp; <u>given random variable</u>&nbsp; $Z$:
 
:$$I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z ) =  H(X\hspace{0.05cm}|\hspace{0.05cm} Z ) - H(X|\hspace{0.05cm}W ,\hspace{0.05cm} Z ) \hspace{0.05cm}.$$
  
The relationship between the last two definitions is:
 
:$$I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z ) = \sum_{z \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} (P_{Z})} \hspace{-0.2cm}
  P_Z(z) \cdot  I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z = z)\hspace{0.05cm}.$$
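
This relationship can also be checked numerically.&nbsp; The following minimal Python sketch is only a cross-check, not part of the exercise itself; the helper names&nbsp; <code>joint_xwz</code>&nbsp; and&nbsp; <code>mi_x_w_given_z</code>&nbsp; are our own choices.&nbsp; It enumerates the joint distribution of&nbsp; $(X, W, Z)$&nbsp; for&nbsp; $W = (X+Y) \cdot Z$&nbsp; and evaluates the two conditional terms together with their weighted sum:

<syntaxhighlight lang="python">
from collections import defaultdict
from math import log2

def joint_xwz(p):
    """Joint PMF P(x, w, z) for X, Y uniform on {1, 2}, Pr(Z = 1) = p, W = (X + Y) * Z."""
    pmf = defaultdict(float)
    for x in (1, 2):
        for y in (1, 2):
            for z, pz in ((1, p), (2, 1.0 - p)):
                pmf[(x, (x + y) * z, z)] += 0.25 * pz   # P(x, y) = 1/4 by independence
    return pmf

def mi_x_w_given_z(pmf, z0):
    """Conditional mutual information I(X; W | Z = z0) in bit."""
    pz0 = sum(pr for (x, w, z), pr in pmf.items() if z == z0)
    pxw = defaultdict(float)                  # conditional joint PMF P(x, w | z0)
    for (x, w, z), pr in pmf.items():
        if z == z0:
            pxw[(x, w)] += pr / pz0
    px, pw = defaultdict(float), defaultdict(float)
    for (x, w), pr in pxw.items():            # conditional marginals
        px[x] += pr
        pw[w] += pr
    return sum(pr * log2(pr / (px[x] * pw[w])) for (x, w), pr in pxw.items() if pr > 0)

p = 0.75                                      # any 0 < p < 1 can be tried here
pmf = joint_xwz(p)
i1, i2 = mi_x_w_given_z(pmf, 1), mi_x_w_given_z(pmf, 2)
print(i1, i2, p * i1 + (1 - p) * i2)          # I(X;W|Z=1), I(X;W|Z=2), weighted sum I(X;W|Z)
</syntaxhighlight>

Since&nbsp; $X$&nbsp; and&nbsp; $Y$&nbsp; are independent and uniform, every pair&nbsp; $(x, y)$&nbsp; carries probability&nbsp; $1/4$;&nbsp; $Z$&nbsp; enters only through its PMF&nbsp; $[p, \ 1-p]$.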
  
  
''Hints:''
*The exercise belongs to the chapter&nbsp; [[Information_Theory/Verschiedene_Entropien_zweidimensionaler_Zufallsgrößen|Different Entropy Measures of Two-Dimensional Random Variables]].
*In particular, reference is made to the page&nbsp; [[Information_Theory/Verschiedene_Entropien_zweidimensionaler_Zufallsgrößen#Conditional_mutual_information|Conditional mutual information]].  
 
  
  
 
===Questions===
 
 
<quiz display=simple>
  
{How large is the mutual information between&nbsp; $X$&nbsp; and&nbsp; $W$,&nbsp; if&nbsp; $Z = 1$&nbsp; always holds?
 
|type="{}"}
 
|type="{}"}
 
$ I(X; W | Z = 1) \ = \ $ { 0.5 3% } $\ \rm bit$
  
{How large is the mutual information between&nbsp; $X$&nbsp; and&nbsp; $W$,&nbsp; if&nbsp; $Z = 2$&nbsp; always holds?
 
|type="{}"}
 
|type="{}"}
 
$ I(X; W | Z = 2) \ = \ $ { 0.5 3% } $\ \rm bit$
  
{Now let &nbsp;$p = {\rm Pr}(Z = 1)$.&nbsp; How large is the conditional mutual information between&nbsp; $X$&nbsp; and&nbsp; $W$, if&nbsp; $z  \in Z = \{1,\ 2\}$&nbsp; is known?  
 
|type="{}"}
 
|type="{}"}
 
$p = 1/2\text{:} \ \ \ I(X; W | Z) \ = \ $  { 0.5 3% } $\ \rm bit$

$p = 3/4\text{:} \ \ \ I(X; W | Z) \ = \ $  { 0.5 3% } $\ \rm bit$
  
{How large is the unconditional mutual information for&nbsp; $p = 1/2$?  
 
|type="{}"}
 
|type="{}"}
 
$I(X; W) \ = \ $ { 0.25 3% } $\ \rm bit$
 
</quiz>
  
===Solution===
 
{{ML-Kopf}}
[[File:P_ID2814__Inf_A_3_8a.png|right|frame|Two-dimensional probability mass functions for&nbsp; $Z = 1$]]
'''(1)'''&nbsp; The upper graph is valid for&nbsp; $Z = 1$ &nbsp; &rArr; &nbsp; $W = X + Y$.&nbsp;  
*Under the assumptions&nbsp; $P_X(X) = \big [1/2, \ 1/2 \big]$&nbsp; and&nbsp; $P_Y(Y) = \big [1/2, \ 1/2 \big]$,&nbsp; the joint probabilities&nbsp; $P_{ XW|Z=1 }(X, W)$&nbsp; result as shown in the right graph (grey background).
  
*Thus the following applies to the mutual information under the fixed condition&nbsp; $Z = 1$:
 
:$$I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z = 1) \hspace{-0.05cm} = \hspace{-1.1cm}\sum_{(x,w) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} (P_{XW}\hspace{0.01cm}|\hspace{0.01cm} Z\hspace{-0.03cm} =\hspace{-0.03cm} 1)} \hspace{-1.1cm}
  P_{XW\hspace{0.01cm}|\hspace{0.01cm} Z\hspace{-0.03cm} =\hspace{-0.03cm} 1} (x,w) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_{XW\hspace{0.01cm}|\hspace{0.01cm} Z\hspace{-0.03cm} =\hspace{-0.03cm} 1} (x,w) }{P_X(x) \cdot P_{W\hspace{0.01cm}|\hspace{0.01cm} Z\hspace{-0.03cm} =\hspace{-0.03cm} 1} (w) }$$
:$$I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z = 1) = 2 \cdot \frac{1}{4} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1/4}{1/2 \cdot 1/4} + 2 \cdot \frac{1}{4} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1/4}{1/2 \cdot 1/2} $$
:$$\Rightarrow \hspace{0.3cm} I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z = 1) \hspace{0.15cm} \underline {=0.5\,{\rm (bit)}} \hspace{0.05cm}.$$
  
*The first term summarises the two horizontally shaded fields in the graph, the second term the vertically shaded fields.
*The latter contribute nothing, since&nbsp; $\log_2 (1) = 0$.
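
This term-by-term evaluation can be reproduced with a short Python cross-check.&nbsp; The sketch below hard-codes the conditional PMF&nbsp; $P_{XW|Z=1}$&nbsp; and its marginals as read off the graph (our own encoding of the grey fields, not part of the original solution):

<syntaxhighlight lang="python">
from math import log2

# Joint PMF P_{XW|Z=1}(x, w) from the grey fields, plus the marginals P_X and P_{W|Z=1}
pxw = {(1, 2): 0.25, (1, 3): 0.25, (2, 3): 0.25, (2, 4): 0.25}
px  = {1: 0.5, 2: 0.5}
pw  = {2: 0.25, 3: 0.5, 4: 0.25}

mi = sum(pr * log2(pr / (px[x] * pw[w])) for (x, w), pr in pxw.items())
print(mi)   # 0.5 bit: only the fields (1, 2) and (2, 4) contribute
</syntaxhighlight>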
  
  
  
[[File:P_ID2815__Inf_A_3_8b.png|right|frame|Two-dimensional probability mass functions for&nbsp; $Z = 2$]]
'''(2)'''&nbsp; For&nbsp; $Z = 2$&nbsp; we have&nbsp; $W \in \{4,\ 6,\ 8\}$,&nbsp; but nothing changes with respect to the probability mass functions compared to subtask&nbsp; '''(1)'''.
  
*Consequently, the same conditional mutual information is obtained:
 
:$$I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z = 2) = I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z = 1)
\hspace{0.15cm} \underline {=0.5\,{\rm (bit)}}
\hspace{0.05cm}.$$
<br clear=all>

'''(3)'''&nbsp; For&nbsp; $Z = \{1,\ 2\}$&nbsp; with&nbsp; ${\rm Pr}(Z = 1) =p$ &nbsp;and&nbsp; ${\rm Pr}(Z = 2) =1-p$,&nbsp; the equation reads:
 
:$$I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z) =  p \cdot I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z = 1) + (1-p) \cdot I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z = 2)\hspace{0.15cm} \underline {=0.5\,{\rm (bit)}}
\hspace{0.05cm}.$$
*This takes into account that, according to subtasks&nbsp; '''(1)'''&nbsp; and&nbsp; '''(2)''',&nbsp; the conditional mutual information is the same for given&nbsp; $Z = 1$&nbsp; and given&nbsp; $Z = 2$.
*Thus&nbsp; $I(X; W|Z)$,&nbsp; i.e. the mutual information under the condition of the random variable&nbsp; $Z = \{1,\ 2\}$&nbsp; with&nbsp; $P_Z(Z) = \big [p, \ 1 - p\big ]$,&nbsp; is independent of &nbsp;$p$.
*In particular, the result is also valid for&nbsp; $\underline{p = 1/2}$&nbsp; and&nbsp; $\underline{p = 3/4}$.
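
For instance, with&nbsp; $p = 3/4$:

:$$I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z) = 3/4 \cdot 0.5\,{\rm bit} + 1/4 \cdot 0.5\,{\rm bit} = 0.5\,{\rm bit}.$$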
  
  
[[File:P_ID2816__Inf_A_3_8d.png|right|frame|Calculating the joint probability for $XW$]]
'''(4)'''&nbsp; The joint probability&nbsp; $P_{ XW }$&nbsp; depends on the probabilities &nbsp;$p$&nbsp; and&nbsp; $1 - p$&nbsp; of&nbsp; $Z$.
*For&nbsp; ${\rm Pr}(Z = 1) = {\rm Pr}(Z = 2) = 1/2$,&nbsp; the scheme sketched on the right results.
*Again, only the two horizontally shaded fields contribute to the mutual information:
 
:$$ I(X;W) = 2 \cdot \frac{1}{8} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1/8}{1/2 \cdot 1/8}
\hspace{0.15cm} \underline {=0.25\,{\rm (bit)}} \hspace{0.35cm} < \hspace{0.35cm} I(X;W \hspace{0.05cm}|\hspace{0.05cm} Z)
\hspace{0.05cm}.$$
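
As a cross-check, the same value follows by marginalizing the joint PMF over&nbsp; $Z$.&nbsp; A minimal Python sketch, assuming&nbsp; $p = 1/2$&nbsp; as in this subtask:

<syntaxhighlight lang="python">
from collections import defaultdict
from math import log2

# Unconditional joint PMF P_XW: marginalize W = (X + Y) * Z over Z with Pr(Z=1) = Pr(Z=2) = 1/2
pxw = defaultdict(float)
for x in (1, 2):
    for y in (1, 2):
        for z in (1, 2):
            pxw[(x, (x + y) * z)] += 0.25 * 0.5   # P(x, y) = 1/4, P(z) = 1/2

pw = defaultdict(float)
for (x, w), pr in pxw.items():
    pw[w] += pr                                   # marginal P_W

mi = sum(pr * log2(pr / (0.5 * pw[w])) for (x, w), pr in pxw.items())   # P_X(x) = 1/2
print(mi)   # 0.25 bit < I(X;W|Z) = 0.5 bit
</syntaxhighlight>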
  
The result&nbsp; $I(X; W|Z) > I(X; W)$&nbsp; is true for this example, but also for many other applications:
*If I know&nbsp; $Z$,&nbsp; I know more about the 2D random variable&nbsp; $XW$&nbsp; than without this knowledge.
*However, one must not generalize this result:
:Sometimes&nbsp; $I(X; W) > I(X; W|Z)$&nbsp; actually applies, as in&nbsp; [[Information_Theory/Verschiedene_Entropien_zweidimensionaler_Zufallsgr%C3%B6%C3%9Fen#Conditional_mutual_information|Example 4]]&nbsp; in the theory section.
 
 
 
{{ML-Fuß}}
  
  
  
[[Category:Information Theory: Exercises|^3.2 Entropies of 2D Random Variables^]]
