Exercise 3.5Z: Kullback-Leibler Distance again

{{quiz-Header|Buchseite=Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables
}}

[[File:P_ID2762__Inf_Z_3_4.png|right|frame|Determined probability mass functions]]
The probability mass function is:

:$$P_X(X) = \big[\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.15cm},\hspace{0.15cm} 0.25 \hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.03cm}\big]\hspace{0.05cm}.$$

The random variable  $X$  is thus characterised by
* the symbol set size  $M=4$,
* equal probabilities  $P_X(1) = P_X(2) = P_X(3) = P_X(4) = 1/4$.
  
  
The random variable  $Y$  is always an approximation for  $X$:
*It was obtained by simulation from a uniform distribution, where in each case only  $N$  random numbers were evaluated.
*This means:   $P_Y(1)$, ... , $P_Y(4)$  are not probabilities in the conventional sense.  Rather, they describe  [[Theory_of_Stochastic_Signals/Wahrscheinlichkeit_und_relative_H%C3%A4ufigkeit#Bernoullisches_Gesetz_der_gro.C3.9Fen_Zahlen| relative frequencies]].
  
  
The result of the sixth test series  (with  $N=1000)$  is thus summarised by the following probability mass function:

:$$P_Y(X) = \big [\hspace{0.05cm}0.225\hspace{0.15cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.15cm} 0.250 \hspace{0.05cm}, \hspace{0.15cm} 0.272\hspace{0.05cm}\big]
\hspace{0.05cm}.$$

This notation already takes into account that the random variables  $X$  and  $Y$  are based on the same alphabet  $X = \{1,\ 2,\ 3,\ 4\}$.
  
With these preconditions, the following holds for the  '''informational divergence'''  (or  "relative entropy")  between the two probability mass functions   $P_X(.)$  and  $P_Y(.)$:

:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) =  {\rm E}_X \hspace{-0.1cm}\left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{M}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.$$

One calls   $D( P_X\hspace{0.05cm} || \hspace{0.05cm}P_Y)$   the (first)  '''Kullback-Leibler distance'''.
*It is a measure of the similarity between the two probability mass functions  $P_X(.)$  and  $P_Y(.)$.
*The expectation is formed here with respect to the  (actually uniformly distributed)  random variable  $X$.  This is indicated by the notation   ${\rm E}_X\big[.\big]$.


A second form of the Kullback-Leibler distance results from forming the expectation with respect to the random variable  $Y$   ⇒    ${\rm E}_Y\big [.\big ]$:

:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) =  {\rm E}_Y \hspace{-0.1cm} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(X)}{P_X(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^M  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.$$
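
Both definitions can be evaluated directly.  The following Python sketch (added here for illustration; it is not part of the original exercise) computes both directions for the  $N=1000$  approximation given above:

<pre>
import math

def kl_distance(P, Q):
    # D(P || Q) in bit; terms with P(mu) = 0 contribute nothing by convention.
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

P_X = [0.25, 0.25, 0.25, 0.25]          # uniform distribution
P_Y = [0.225, 0.253, 0.250, 0.272]      # relative frequencies for N = 1000

print(round(kl_distance(P_X, P_Y), 5))  # 0.00328 bit
print(round(kl_distance(P_Y, P_X), 5))  # 0.00325 bit
</pre>

Even for this nearly uniform approximation the two directions give slightly different values; this asymmetry is taken up again in questions&nbsp; '''(4)'''&nbsp; and&nbsp; '''(5)'''.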


Hints:
*The exercise belongs to the chapter  [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables|Some preliminary remarks on two-dimensional random variables]].
*In particular, reference is made to the page  [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables#Informational_divergence_-_Kullback-Leibler_distance|Relative entropy – Kullback-Leibler distance]].
*The entropy  $H(Y)$  and the Kullback-Leibler distance  $D( P_X \hspace{0.05cm}|| \hspace{0.05cm}P_Y)$  in the above graph are to be understood in  "bit".
*The fields marked with  "???"  in the graph are to be completed by you in this task.
 
 
   
 
   
  
  
===Questions===

<quiz display=simple>

{What is the entropy of the random variable&nbsp; $X$?
|type="{}"}
$H(X)\ = \ $ { 2 1% } $\ \rm bit$
  
{What are the entropies of the random variables&nbsp; $Y$&nbsp; $($approximations for&nbsp; $X)$?
|type="{}"}
$N=10^3\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.9968 1% } $\ \rm bit$
$N=10^2\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.9410 1% } $\ \rm bit$
$N=10^1\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.6855 1% } $\ \rm bit$
  
{Calculate the following Kullback-Leibler distances.
|type="{}"}
$N=10^3\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $ { 0.00328 1% } $\ \rm bit$
$N=10^2\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $ { 0.0442 1% } $\ \rm bit$
$N=10^1\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ $ { 0.345 1% } $\ \rm bit$
  
{Does&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; give exactly the same result in each case?
|type="()"}
- Yes.
+ No.
  
{Which statements are true for the Kullback-Leibler distances with&nbsp; $N = 4$?
|type="[]"}
- &nbsp; $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0$&nbsp; holds.
- &nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.5 \ \rm  bit$&nbsp; holds.
+ &nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; is infinitely large.
- &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0$&nbsp; holds.
+ &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.5 \ \rm bit$&nbsp; holds.
- &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; is infinitely large.
  
{Do both&nbsp; $H(Y)$&nbsp; and&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; change monotonically with&nbsp; $N$?
|type="()"}
- Yes.
+ No.


</quiz>
  
===Solution===
{{ML-Kopf}}
'''(1)'''&nbsp; With equal probabilities and&nbsp; $M = 4$, the entropy is:

:$$H(X) = {\rm log}_2 \hspace{0.1cm} M
\hspace{0.15cm} \underline {= 2\,{\rm (bit)}}  \hspace{0.05cm}.$$


'''(2)'''&nbsp; The probabilities of the empirically determined random variables&nbsp; $Y$&nbsp; generally&nbsp; (but not always!)&nbsp; deviate more from the uniform distribution, the smaller the parameter&nbsp; $N$&nbsp; is.&nbsp; One obtains for
* $N = 1000 \ \ \Rightarrow \ \ P_Y(Y) =  \big [0.225, \ 0.253, \ 0.250, \ 0.272 \big ]$:
:$$H(Y) = 0.225 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.225} + 0.253 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.253} + 0.250 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.250} + 0.272 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.272} \hspace{0.15cm} \underline {= 1.9968\ {\rm (bit)}} \hspace{0.05cm},$$
* $N = 100 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.24, \ 0.16, \ 0.30, \ 0.30\big]$:
:$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.9410\ {\rm (bit)}} \hspace{0.05cm},$$
* $N = 10 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.5, \ 0.1, \ 0.3, \ 0.1 \big]$:
:$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.6855\ {\rm (bit)}} \hspace{0.05cm}.$$
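
The entropy values can be reproduced with a minimal Python sketch (added for illustration, not part of the original solution); it is shown here for the first and the last test series:

<pre>
import math

def entropy(P):
    # Entropy of a PMF in bit; terms with probability 0 contribute nothing.
    return sum(p * math.log2(1 / p) for p in P if p > 0)

print(round(entropy([0.225, 0.253, 0.250, 0.272]), 4))  # 1.9968 bit (N = 1000)
print(round(entropy([0.5, 0.1, 0.3, 0.1]), 4))          # 1.6855 bit (N = 10)
</pre>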
  
  
'''(3)'''&nbsp; The equation for the Kullback-Leibler distance we are looking for is:

:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \sum_{\mu = 1}^{4}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} = \frac{1/4}{{\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(1)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(2)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(3)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(4)} \right ]$$
:$$\Rightarrow \hspace{0.3cm} D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{P_Y(1) \cdot P_Y(2)\cdot P_Y(3)\cdot P_Y(4)} \right ] \hspace{0.05cm}.$$

The logarithm to base&nbsp; $2$ &nbsp; &rArr; &nbsp; $\log_2(.)$&nbsp; has been replaced by the logarithm to base&nbsp; $10$ &nbsp; &rArr; &nbsp; $\lg(.)$&nbsp; for easier use of a pocket calculator.

The following numerical results are obtained:
* for $N=1000$:
:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.225 \cdot 0.253\cdot 0.250\cdot 0.272}
\right ] \hspace{0.15cm} \underline {= 0.00328 \,{\rm (bit)}}  \hspace{0.05cm},$$
* for $N=100$:
:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.24 \cdot 0.16\cdot 0.30\cdot 0.30}
\right ] \hspace{0.15cm} \underline {= 0.0442 \,{\rm (bit)}}  \hspace{0.05cm},$$
* for $N=10$:
:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.5 \cdot 0.1\cdot 0.3\cdot 0.1}
\right ] \hspace{0.15cm} \underline {= 0.345 \,{\rm (bit)}}  \hspace{0.05cm}.$$
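
The closed-form expression above translates directly into a few lines of Python (again an illustrative sketch added to this text, not part of the original solution):

<pre>
import math

def kl_uniform4(P_Y):
    # D(P_X || P_Y) for the uniform PMF P_X = [0.25]*4, using the lg-based closed form above.
    prod = P_Y[0] * P_Y[1] * P_Y[2] * P_Y[3]
    return 1 / (4 * math.log10(2)) * math.log10(0.25**4 / prod)

print(round(kl_uniform4([0.225, 0.253, 0.250, 0.272]), 5))  # 0.00328 bit (N = 1000)
print(round(kl_uniform4([0.24, 0.16, 0.30, 0.30]), 4))      # 0.0442 bit  (N = 100)
print(round(kl_uniform4([0.5, 0.1, 0.3, 0.1]), 3))          # 0.345 bit   (N = 10)
</pre>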
  
  
'''(4)'''&nbsp; The correct answer is&nbsp; <u>'''No'''</u>,&nbsp; as the example&nbsp; $N = 100$&nbsp; shows:

:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) =  \sum_{\mu = 1}^M  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} = 0.24\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.24}{0.25} + 0.16\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.16}{0.25} +2 \cdot 0.30\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.30}{0.25}  = 0.0407\ {\rm (bit)}\hspace{0.05cm}.$$

*In subtask&nbsp; '''(3)'''&nbsp; we obtained&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.0442$&nbsp; instead.
*This also means that the term&nbsp; "distance"&nbsp; is somewhat misleading.
*For a genuine distance one would actually expect&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$.
 
 
  
  
  
[[File:P_ID2763__Inf_Z_3_4e.png|right|frame|Probability function, entropy and Kullback-Leibler distance]]
'''(5)'''&nbsp; With&nbsp; $P_Y(X) = \big [0, \ 0.25, \ 0.5, \ 0.25 \big ]$&nbsp; one obtains:
 
:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.50}\hspace{0.05cm}.$$

*Because of the first term, the value of&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp; is infinitely large.
*For the second Kullback-Leibler distance, the following holds:

:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0\cdot {\rm log}_2 \hspace{0.1cm} \frac{0}{0.25} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+
0.50\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.5}{0.25}
\hspace{0.05cm}.$$

*A limit consideration shows that the first term yields&nbsp; $0$;&nbsp; the second term is also zero, and one obtains as the final result:

:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.50\cdot {\rm log}_2 \hspace{0.1cm} (2) \hspace{0.15cm} \underline {= 0.5\,{\rm (bit)}} \hspace{0.05cm}.$$

<u>Statements 3 and 5</u>&nbsp; are therefore correct:
*From this extreme example it is clear that&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; always differs from&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$.
*Only in the special case&nbsp; $P_Y \equiv P_X$&nbsp; are both Kullback-Leibler distances equal, namely zero.
*The adjacent table shows the complete result of this task.
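
The conventions used in&nbsp; '''(5)''', namely that a product of the form&nbsp; $0 \cdot {\rm log}_2(0/q)$&nbsp; contributes nothing while a term with a vanishing denominator makes the distance infinite, can be made explicit in a small Python sketch (added for illustration, not part of the original solution):

<pre>
import math

def kl_distance(P, Q):
    # Kullback-Leibler distance D(P || Q) in bit, with the limit conventions from (5).
    D = 0.0
    for p, q in zip(P, Q):
        if p == 0:
            continue            # 0 * log2(0/q) -> 0
        if q == 0:
            return math.inf     # p * log2(p/0) -> infinity
        D += p * math.log2(p / q)
    return D

P_X = [0.25, 0.25, 0.25, 0.25]
P_Y = [0, 0.25, 0.5, 0.25]      # empirical PMF for N = 4

print(kl_distance(P_X, P_Y))    # inf
print(kl_distance(P_Y, P_X))    # 0.5  (bit)
</pre>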
  
  
  
  
'''(6)'''&nbsp; The correct answer is again&nbsp; <u>'''No'''</u>.&nbsp; The tendency is clear: &nbsp; The larger&nbsp; $N$&nbsp; is,
* the closer&nbsp; $H(Y)$&nbsp; in principle approaches the final value&nbsp; $H(X) = 2 \ \rm bit$,
* the smaller the distances&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; and&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; become.


However, the table also shows that there are exceptions:
* The entropy&nbsp; $H(Y)$&nbsp; is smaller for&nbsp; $N = 1000$&nbsp; than for&nbsp; $N = 400$.
* The distance&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp; is greater for&nbsp; $N = 1000$&nbsp; than for&nbsp; $N = 400$.
* The reason is that the empirical experiment documented here happened to come closer to a uniform distribution with&nbsp; $N = 400$&nbsp; than with&nbsp; $N = 1000$.
* If, on the other hand, one were to run a very large (in the limit, infinite) number of experiments with&nbsp; $N = 400$&nbsp; and&nbsp; $N = 1000$&nbsp; and average over all of them, the expected monotonic behaviour would indeed result.
  
 
{{ML-Fuß}}


[[Category:Information Theory: Exercises|^3.1 General Information on 2D Random Variables^]]
