Difference between revisions of "Aufgaben:Exercise 3.5Z: Kullback-Leibler Distance again"

From LNTwww

@@ Line 1: / Line 1: @@
-{{quiz-Header|Buchseite=Informationstheorie/Einige Vorbemerkungen zu zweidimensionalen Zufallsgrößen
+{{quiz-Header|Buchseite=Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables
 }}
-[[File:P_ID2762__Inf_Z_3_4.png|right|]]
+[[File:P_ID2762__Inf_Z_3_4.png|right|frame|Determined probability mass functions]]
-Die Wahrscheinlichkeitsfunktion lautet:
+The probability mass function is:
+:$$P_X(X) = \big[\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.15cm},\hspace{0.15cm} 0.25 \hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.03cm}\big]\hspace{0.05cm}.$$
+The random variable&nbsp; $X$&nbsp; is thus characterised by
+* the symbol set size&nbsp; $M=4$,
+* equal probabilities&nbsp; $P_X(1) = P_X(2) = P_X(3) = P_X(4) = 1/4$ .
-$$P_Y(X) = [\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.03cm} 0.25\hspace{0.03cm},\hspace{0.03cm} 0.25 \hspace{0.03cm}, \hspace{0.03cm} 0.25\hspace{0.03cm}]\hspace{0.05cm}$$
-Die Zufallsgröße $X$ ist also gekennzeichnet
-:* durch den Symbolumfang $M=4$,
+The random variable&nbsp; $Y$&nbsp; is always an approximation for&nbsp; $X$:
-:* mit gleichen Wahrscheinlichkeiten.
+*It was obtained by simulation from a uniform distribution, whereby only&nbsp; $N$&nbsp; random numbers were evaluated in each case.&nbsp; This means: &nbsp;
+*$P_Y(1)$, ... , $P_Y(4)$&nbsp; are not probabilities in the conventional sense.&nbsp; Rather, they describe&nbsp; [[Theory_of_Stochastic_Signals/Wahrscheinlichkeit_und_relative_H%C3%A4ufigkeit#Bernoullisches_Gesetz_der_gro.C3.9Fen_Zahlen| relative frequencies]].
-Die Zufallsgröße $Y$ ist stets eine Näherung für $X$. Sie wurde per Simulation aus einer Gleichverteilung gewonnen, wobei jeweils nur $N$ Zufallswerte ausgewertet wurden. Das heißt:
-$P_Y(1)$,...,$P_Y(4)$ sind im herkömmlichen Sinn keine Wahrscheinlichkeiten. Sie beschreiben vielmehr [http://en.lntwww.de/Stochastische_Signaltheorie/Wahrscheinlichkeit_und_relative_H%C3%A4ufigkeit#Bernoullisches_Gesetz_der_gro.C3.9Fen_Zahlen relative Häufigkeiten].
-Das Ergebnis der sechsten Versuchsreihe (mit  $N=1000$) ird demnach durch die folgende Wahrscheinlichkeitsfunktion zusammengefasst:
+The result of the sixth test series&nbsp; (with&nbsp;  $N=1000)$&nbsp; is thus summarised by the following probability function:
-$$P_Y(X) = [\hspace{0.05cm}0.225\hspace{0.05cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.05cm} 0.250 \hspace{0.05cm}, \hspace{0.05cm} 0.272\hspace{0.05cm}]
+:$$P_Y(X) = \big [\hspace{0.05cm}0.225\hspace{0.15cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.15cm} 0.250 \hspace{0.05cm}, \hspace{0.15cm} 0.272\hspace{0.05cm}\big]
-\hspace{0.05cm}$$
+\hspace{0.05cm}.$$
-Bei dieser Schreibweise ist bereits berücksichtigt, dass die Zufallsgrößen $X$ und $Y$ auf dem gleichen Alphabet $X =$ {1, 2, 3, 4} basieren.
+This notation already takes into account that the random variables&nbsp; $X$&nbsp; and&nbsp; $Y$&nbsp; are based on the same alphabet&nbsp; $X = \{1,\ 2,\ 3,\ 4\}$.
-Mit diesen Voraussetzungen gilt für die relative Entropie (englisch: Informational Divergence) zwischen den Wahrscheinlichkeitsfunktionen  $P_X(.)$ und $P_Y(.)$ :
+With these preconditions, the&nbsp; '''informational divergence'''&nbsp; between the two probability functions&nbsp;  $P_X(.)$&nbsp; and&nbsp; $P_Y(.)$ :
-$D( P_X || P_Y) = E_X [ log_2 \frac{P_X(X)}{P_Y(Y)}] = \sum\limits_{\mu=1}^M P_X(\mu) . log_2 \frac{P_X(\mu)}{P_Y(\mu)}$
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) =  {\rm E}_X \hspace{-0.1cm}\left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{M}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.$$
-Man bezeichnet  $D( P_X || P_Y)$  als Kullback–Leibler–Distanz. Diese ist ein Maß für die Ähnlichkeit zwischen den beiden Wahrscheinlichkeitsfunktionen
+One calls&nbsp;  $D( P_X\hspace{0.05cm} || \hspace{0.05cm}P_Y)$&nbsp;  the (first)&nbsp; '''Kullback-Leibler distance'''.
-$P_X(.)$ und $P_Y(.)$.  Die Erwartungswertbildung geschieht hier hinsichtlich der (tatsächlich gleichverteilten) Zufallsgröße $X$.  Dies wird durch die Nomenklatur  $E_X[.]$ angedeutet.
+*This is a measure of the similarity between the two probability mass functions&nbsp; $P_X(.)$&nbsp; and&nbsp; $P_Y(.)$.
+*The expected value formation occurs here with regard to the&nbsp; (actually equally distributed)&nbsp; random variable&nbsp; $X$.&nbsp; This is indicated by the nomenclature&nbsp;  ${\rm E}_X\big[.\big]$.
-Eine zweite Form der Kullback–Leibler–Distanz ergibt sich durch die Erwartungswertbildung  hinsichtlich der Zufallsgröße $Y \Rightarrow E_Y[.]$:
-$D( P_Y || P_X) = E_Y [ log_2 \frac{P_Y(Y)}{P_Y(Y)}] = \sum\limits_{\mu=1}^M P_Y(\mu) . log_2 \frac{P_Y(\mu)}{P_X(\mu)}$
+A second form of Kullback-Leibler distance results from the formation of expected values with respect to the random variable&nbsp; $Y$ &nbsp; &rArr; &nbsp;  ${\rm E}_Y\big [.\big ]$:
-'''Hinweis:''' Die Aufgabe bezieht sich auf das [http://en.lntwww.de/Informationstheorie/Einige_Vorbemerkungen_zu_zweidimensionalen_Zufallsgr%C3%B6%C3%9Fen Kapitel 3.1 ] dieses Buches. Die Angaben der Entropie  $H(Y)$ und der Kullback–Leibler–Distanz  $D( P_X || P_Y)$  in obiger Grafik sind in „bit” zu verstehen. die mit „???"  versehenen Felder sollen von Ihnen in dieser Aufgabe ergänzt werden.
+:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) =  {\rm E}_Y \hspace{-0.1cm} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(X)}{P_X(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^M  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.$$
-===Fragebogen===
+Hints:
+*The exercise belongs to the chapter&nbsp; [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables|Some preliminary remarks on two-dimensional random variables]].
+*In particular, reference is made to the page&nbsp; [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables#Informational_divergence_-_Kullback-Leibler_distance|Relative entropy &ndash; Kullback-Leibler distance]].
+*The entropy&nbsp;  $H(Y)$&nbsp; and the Kullback-Leibler distance&nbsp;  $D( P_X \hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp;  in the above graph are to be understood in&nbsp; "bit".
+* The fields marked with&nbsp; "???"&nbsp; in the graph are to be completed by you in this task.
+===Questions===
 <quiz display=simple>
-{Welche Entropie besitzt die Zufallsgröße $X$ ?
+{What is the entropy of the random variable&nbsp; $X$ ?
 |type="{}"}
-$H(X)$ = { 2 3% } $bit$
+$H(X)\ = \ $ { 2 1% } $\ \rm bit$
-{Wie groß sind die Entropien der Zufallsgrößen $Y$ (Näherungen für $X$)?
+{What are the entropies of the random variables&nbsp; $Y$&nbsp; $($approximations for&nbsp; $X)$?
 |type="{}"}
-$N=1000$ :  $H(Y)$ = { 1.9968 1% } $bit$
+$N=10^3\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.9968 1% } $\ \rm bit$
-$N=100$ : $H(Y)$ = { 1.941 1% } $bit$
+$N=10^2\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.941 1% } $\ \rm bit$
-$N=10$ :  $H(Y)$ = { 1.6855 1%  } $bit$
+$N=10^1\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.6855 1%  } $\ \rm bit$
-{Berechnen Sie die folgenden Kullback–Leibler–Distanzen.
+{Calculate the following Kullback-Leibler distances.
 |type="{}"}
-$N=1000$ :  $D( P_X || P_Y)$ = { 3.28 1% } . 10 ( { -3 } )$bit$
+$N=10^3\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $ { 0.00328 1% } $\ \rm bit$
-$N=100$ : $D( P_X || P_Y)$=  { 4.42 1% } . 10 ( { -2 } )$bit$
+$N=10^2\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $  { 0.0442 1% } $\ \rm bit$
-$N=10$ :  $D( P_X || P_Y)$=  { 3.45 1% } . 10 ( { -1 } )$bit$
+$N=10^1\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)  \ = \ $  { 0.345 1% } $\ \rm bit$
-{Liefert $D(P_Y||P_X)$ jeweils exakt das gleiche Ergebnis?
+{Does&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; give exactly the same result in each case?
-|type="[]"}
+|type="()"}
-+ Falsch
+- Yes.
-- Richtig
++ No.
-{Welche Aussagen gelten für die Kullback–Leibler–Distanzen bei $N = 4$?
+{Which statements are true for the Kullback-Leibler distances with&nbsp; $N = 4$?
 |type="[]"}
-- Es gilt $D(P_X||P_Y) = 0$.
+- &nbsp; $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0$&nbsp; is true.
-- Es gilt $D(P_X||P_Y) = 0.5 bit$
+- &nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.5 \ \rm  bit$&nbsp; is true.
-+ $D(P_X||P_Y)$ ist unendlich groß
++ &nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; is infinitely large.
--  Es gilt $D(P_Y||P_X) = 0$.
+- &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0$&nbsp; holds.
-+ Es gilt $D(P_Y||P_X) = 0.5 bit$.
++ &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.5 \ \rm bit$&nbsp; holds.
--  $D(P_Y||P_X)$ ist unendlich groß.
+- &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; is infinitely large.
-{Ändern sich $H(Y)$ und $D(P_X||P_Y)$monoton mit $N$?
+{Do both&nbsp; $H(Y)$&nbsp; and&nbsp;  $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; change monotonically with&nbsp; $N$?
-|type="[]"}
+|type="()"}
-+Falsch
+- Yes,
--Richtig
++ No.
 </quiz>
-===Musterlösung===
+===Solution===
 {{ML-Kopf}}
-'''1.'''Bei gleichen Wahrscheinlichkeiten gilt mit $M = 4$ :
+'''(1)'''&nbsp; With equal probabilities, and with&nbsp; $M = 4$:
+:$$H(X) = {\rm log}_2 \hspace{0.1cm} M
-$H(X) = log_2   M = 2 (bit)$
+\hspace{0.15cm} \underline {= 2\,{\rm (bit)}}  \hspace{0.05cm}.$$
-'''2.''' Die Wahrscheinlichkeiten für die empirisch ermittelten Zufallsgrößen $Y$ weichen im Allgemeinen (nicht immer!) von der Gleichverteilung um so mehr ab, je kleiner der Parameter $N$ ist. Man erhält
-:* $N = 1000 \Rightarrow  P_Y(Y) =  [0.225, 0.253, 0.250, 0.272]$ :
-$H(Y) = 0.225 . log_2 \frac{1}{0.225} +0.253. log_2 \frac{1}{0.253} + 0.250 . log_2 \frac{1}{0.250}+ 0.272 . log_2 \frac{1}{0.272} = 1.9968 (bit)$
-:* $N = 100\Rightarrow  P_Y(Y) = [0.24, 0.16, 0.30, 0.30]$ :
-$H(Y) =$......$= 1.9410$
-:* $N = 10 \Rightarrow  P_Y(Y) =  [0.5, 0.1, 0.3, 0.1]$:
-$H(Y) =$......$= 1.6855$
-'''3.'''  Die Gleichung für die gesuchte Kullback–Leibler–Distanz lautet:
-$$D(P_X||P_Y) = \sum\limits_{\mu=1}^4 P_X(\mu) . log_2 \frac{P_X(\mu)}{P_Y(\mu)} =$$
-$$= \frac{1/4}{lg(2)} .[lg \frac{0.25}{P_Y(1)}+\frac{0.25}{P_Y(2)}+\frac{0.25}{P_Y(3)} + \frac{0.25}{P_Y(4)}] =$$
+'''(2)'''&nbsp; The probabilities for the empirically determined random variables&nbsp; $Y$&nbsp; generally&nbsp; (not always!)&nbsp; deviate from the uniform distribution the more the parameter&nbsp; $N$&nbsp; is smaller.&nbsp; One obtains for
+* $N = 1000 \ \ \Rightarrow \ \ P_Y(Y) =  \big [0.225, \ 0.253, \ 0.250, \ 0.272 \big ]$:
+:$$H(Y) =
+.225 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.225} +
+.253 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.253} +
+.250 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.250} +
+.272 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.272}
+\hspace{0.15cm} \underline {= 1.9968\ {\rm (bit)}}  \hspace{0.05cm},$$
+* $N = 100 \ \ \Rightarrow \ \  P_Y(Y) = \big[0.24, \ 0.16, \ 0.30,  \ 0.30\big]$:
+:$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.9410\ {\rm (bit)}}  \hspace{0.05cm},$$
+* $N = 10 \ \ \Rightarrow \ \  P_Y(Y) =  \big[0.5, \ 0.1, \ 0.3, \ 0.1 \big]$:
+:$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.6855\ {\rm (bit)}}  \hspace{0.05cm}.$$
-$$=\frac{1}{4 . lg(2)} . [lg \frac{0.25^4}{P_Y(1) . P_Y(2) . P_Y(3) . P_Y(4)}]$$
-Der Logarithmus zur Basis 2  $\Rightarrow log_2(.)$ wurde zur einfachen Nutzung des Taschenrechners durch den Zehnerlogarithmus  $\Rightarrow lg(.)$  ersetzt. Man erhält die folgenden numerischen Ergebnisse:
-:* $N=1000$ :
-$$D(P_X||P_Y)=\frac{1}{4 . lg(2)} . [lg \frac{0.25^4}{0,225 . 0,253 . 0,250 . 0,272}] = 3,28 . 10^{-3} (bit)$$
-:* $N=100$ :
-$$D(P_X||P_Y)=\frac{1}{4 . lg(2)} . [lg \frac{0.25^4}{0,24 . 0,16 . 0,30 . 0,30}] = 4,42 . 10^{-2} (bit)$$
+'''(3)'''&nbsp; The equation for the Kullback-Leibler distance we are looking for is:
-::* $N=100$ :
-$$D(P_X||P_Y)=\frac{1}{4 . lg(2)} . [lg \frac{0.25^4}{0,5 . 0,1. 0,3 . 0,1}] = 3,45. 10^{-1} (bit)$$
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \sum_{\mu = 1}^{4}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)}
+=  \frac{1/4}{{\rm lg} \hspace{0.1cm}(2)} \cdot
+\left [ {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(1)} + \frac{0.25}{P_Y(2)} + \frac{0.25}{P_Y(3)} + \frac{0.25}{P_Y(4)}
+\right ] $$
+:$$\Rightarrow \hspace{0.3cm} D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)  =   \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
+\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{P_Y(1) \cdot P_Y(2)\cdot P_Y(3)\cdot P_Y(4)}
+\right ] \hspace{0.05cm}.$$
+The logarithm to the base&nbsp; $ 2$&nbsp; &rArr;  &nbsp; $\log_2(.)$&nbsp; was replaced by the logarithm to the base&nbsp; $ 10$ &nbsp; &rArr;  &nbsp; $\lg(.)$ for easy use of the calculator.
-'''5.'''  Richtig ist Nein, wie am Beispiel $N = 100$ gezeigt werden soll:
+The following numerical results are obtained:
+* for $N=1000$:
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
+\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.225 \cdot 0.253\cdot 0.250\cdot 0.272}
+\right ] \hspace{0.15cm} \underline {= 0.00328 \,{\rm (bit)}}  \hspace{0.05cm},$$
+* for $N=100$:
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
+\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.24 \cdot 0.16\cdot 0.30\cdot 0.30}
+\right ] \hspace{0.15cm} \underline {= 0.0442 \,{\rm (bit)}}  \hspace{0.05cm},$$
+* for $N=10$:
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
+\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.5 \cdot 0.1\cdot 0.3\cdot 0.1}
+\right ] \hspace{0.15cm} \underline {= 0.345 \,{\rm (bit)}}  \hspace{0.05cm}.$$
-$$D(P_X||P_Y) = \sum\limits_{\mu=1}^M P_X(\mu) . log_2 \frac{P_X(\mu)}{P_Y(\mu)} =$$
-$$ = 0.24 . log_2 \frac{0.24}{0.25} +0.16. log_2 \frac{16}{0.25} +2 .  0,30  . log_2 \frac{0.30}{0.25} = 0.0407 (bit)$$
-In der Teilaufgabe (c) haben wir stattdessen $D(P_X||P_Y)$ = 0.0442 erhalten. Das bedeutet auch: Der Name „Distanz” ist etwas irreführend. Danach würde man eigentlich $D(P_Y||P_X)$ = $D(P_X||P_Y)$ erwarten.
-'''6.'''  Mit $P_Y(X) = [0, 0.25, 0.5, 0.25]$ erhält man:
+'''(4)'''&nbsp; Correct is&nbsp; <u>'''No'''</u>,&nbsp; as will be shown by the example&nbsp; $N = 100$&nbsp;:
+:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) =   \sum_{\mu = 1}^M  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} = 0.24\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.24}{0.25} + 0.16\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.16}{0.25} +2 \cdot 0.30\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.30}{0.25}  = 0.0407\ {\rm (bit)}\hspace{0.05cm}.$$
-$$D(P_X||P_Y) = 0,25 . log_2 \frac{0.25}{0} +2 . 0,25 . log_2 \frac{0.25}{0.25} +0,25 . log_2 \frac{0.25}{0.50}$$
+*In subtask&nbsp; '''(3)'''&nbsp; we got&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.0442$&nbsp; instead.
+*This also means: &nbsp; The designation „distance” is somewhat misleading.
+*According to this, one would actually expect&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp;.
-Aufgrund des ersten Terms ergibt sich für $D(P_X||P_Y)$ ein unendlich großer Wert. Für die zweite Kullback–Leibler–Distanz gilt:
-$$D(P_Y||P_X) = 0 . log_2 \frac{0}{0.25} +2 . 0,25 . log_2 \frac{0.25}{0.25} +0,25 . log_2 \frac{0.5}{0.25}$$
-Nach einer Grenzwertbetrachtung erkennt man, dass der erste Term das Ergebnis $0$ liefert. Auch der zweite Term ergibt sich zu $0$, und man erhält als Endergebnis:
+[[File:P_ID2763__Inf_Z_3_4e.png|right|frame|Probability function, entropy and Kullback-Leibler distance]]
+'''(5)'''&nbsp; With&nbsp; $P_Y(X) = \big [0, \ 0.25, \ 0.5, \ 0.25 \big ]$&nbsp; one obtains:
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.50}\hspace{0.05cm}.$$
-$D(P_Y||P_X) = 0,50 . log_2(2) = 0.5 (bit)$
+*Because of the first term, the value of&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp; is infinitely large.
+*For the second Kullback-Leibler distance holds:
+:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0\cdot {\rm log}_2 \hspace{0.1cm} \frac{0}{0.25} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+
+.50\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.5}{0.25}
+	\hspace{0.05cm}.$$
-Richtig sind somit die Aussagen $3$ und $5$. Auch aus diesem Extrembeispiel wird deutlich, dass sich $D(P_Y||P_X)$ stets von $D(P_X||P_Y)$ unterscheidet. Nur für den Sonderfall P_Y = P_X sind beide Kullback–Leibler–Distanzen gleich, nämlich $0$. Die folgende Tabelle zeigt das vollständige Ergebnis dieser Aufgabe.
+*After looking at the limits, one can see that the first term yields the result&nbsp; $0$&nbsp;.&nbsp; The second term also yields zero, and one obtains as the final result:
+:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.50\cdot {\rm log}_2 \hspace{0.1cm} (2) \hspace{0.15cm} \underline {= 0.5\,{\rm (bit)}} 	\hspace{0.05cm}.$$
+&nbsp; <u>Statements 3 and 5</u> are therefore correct:
+*From this extreme example it is clear that&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; is always different from&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp;.
+*Only for the special case&nbsp; $P_Y \equiv P_X$&nbsp; are both Kullback-Leibler distances equal, namely zero.
+*The adjacent table shows the complete result of this task.
-[[File:P_ID2763__Inf_Z_3_4e.png|right|]]
-'''7.''' Die richtige Antwort ist Nein. Die Tendenz ist zwar eindeutig: Je größer $N$ ist,
-:* desto mehr nähert sich $H(Y)$ im Prinzip dem Endwert $H(X) = 2$ bit an.
-:* um so kleiner werden die Distanzen $D(P_X||P_Y)$ und $D(P_Y||P_X)$.
-Man erkennt aus obiger Tabelle aber auch, dass es Ausnahmen gibt:
-:* Die Entropie $H(Y)$ ist für $N = 1000$ kleiner als für $N = 400$,
+'''(6)'''&nbsp; Correct is again&nbsp; <u>'''No'''</u>. &nbsp; Although the tendency is clear: &nbsp; The larger&nbsp; $N$&nbsp; is,
+* the more&nbsp; $H(Y)$&nbsp; approaches in principle the final value&nbsp; $H(X) = 2 \ \rm bit$,
+* the smaller the distances&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; and&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ become.
-:* Die Distanz $D(P_X||P_Y)$ ist für $N = 1000$ größer als für $N = 400$.
-Der Grund hierfür ist, dass das hier dokumentierte empirische Experiment mit $N = 400$ eher zu einer Gleichverteilung geführt hat als das Experiment mit $N = 1000$.
-Würde man dagegen sehr (unendlich) viele Versuche mit $N = 400$ und $N = 1000$ starten und über diese mitteln, ergäbe sich tatsächlich der eigentlich erwartete monotone Verlauf.
+However, one can also see from the table that there are exceptions:
+* The entropy&nbsp; $H(Y)$&nbsp; is smaller for&nbsp; $N = 1000$&nbsp; than for&nbsp; $N = 400$.
+* The distance&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp; is greater for&nbsp; $N = 1000$&nbsp; than for&nbsp; $N = 400$.
+* The reason for this is that the experiment documented here with&nbsp; $N = 400$&nbsp; was more likely to lead to a uniform distribution than the experiment with&nbsp; $N = 1000$.
+*If, on the other hand, one were to start a very (infinitely) large number of experiments with&nbsp; $N = 400$&nbsp; and&nbsp; $N = 1000$&nbsp; and average over all of them, the actually expected monotonic course would actually result.
 {{ML-Fuß}}
@@ Line 155: / Line 186: @@
-[[Category:Aufgaben zu Informationstheorie|^3.1 Vorbemerkungen zu 2D-Zufallsgrößen^]]
+[[Category:Information Theory: Exercises|^3.1 General Information on 2D Random Variables^]]

Latest revision as of 10:14, 24 September 2021

Return to book

Determined probability mass functions

The probability mass function is:

$$P_X(X) = \big[\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.15cm},\hspace{0.15cm} 0.25 \hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.03cm}\big]\hspace{0.05cm}.$$

The random variable $X$ is thus characterised by

the symbol set size $M=4$,
equal probabilities $P_X(1) = P_X(2) = P_X(3) = P_X(4) = 1/4$ .

The random variable $Y$ is always an approximation for $X$:

It was obtained by simulation from a uniform distribution, whereby only $N$ random numbers were evaluated in each case. This means:
$P_Y(1)$, ... , $P_Y(4)$ are not probabilities in the conventional sense. Rather, they describe relative frequencies.

The result of the sixth test series (with $N=1000)$ is thus summarised by the following probability function:

$$P_Y(X) = \big [\hspace{0.05cm}0.225\hspace{0.15cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.15cm} 0.250 \hspace{0.05cm}, \hspace{0.15cm} 0.272\hspace{0.05cm}\big] \hspace{0.05cm}.$$

This notation already takes into account that the random variables $X$ and $Y$ are based on the same alphabet $X = \{1,\ 2,\ 3,\ 4\}$.

With these preconditions, the informational divergence between the two probability functions $P_X(.)$ and $P_Y(.)$ :

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = {\rm E}_X \hspace{-0.1cm}\left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{M} P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.$$

One calls $D( P_X\hspace{0.05cm} || \hspace{0.05cm}P_Y)$ the (first) Kullback-Leibler distance.

This is a measure of the similarity between the two probability mass functions $P_X(.)$ and $P_Y(.)$.
The expected value formation occurs here with regard to the (actually equally distributed) random variable $X$. This is indicated by the nomenclature ${\rm E}_X\big[.\big]$.

A second form of Kullback-Leibler distance results from the formation of expected values with respect to the random variable $Y$ ⇒ ${\rm E}_Y\big [.\big ]$:

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = {\rm E}_Y \hspace{-0.1cm} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(X)}{P_X(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^M P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.$$

Hints:

The exercise belongs to the chapter Some preliminary remarks on two-dimensional random variables.
In particular, reference is made to the page Relative entropy – Kullback-Leibler distance.
The entropy $H(Y)$ and the Kullback-Leibler distance $D( P_X \hspace{0.05cm}|| \hspace{0.05cm}P_Y)$ in the above graph are to be understood in "bit".
The fields marked with "???" in the graph are to be completed by you in this task.

Questions

What is the entropy of the random variable $X$ ?

$H(X)\ = \ $

$\ \rm bit$

What are the entropies of the random variables $Y$ $($approximations for $X)$?

$N=10^3\text{:} \hspace{0.5cm} H(Y) \ = \ $

$\ \rm bit$

$N=10^2\text{:} \hspace{0.5cm} H(Y) \ = \ $

$\ \rm bit$

$N=10^1\text{:} \hspace{0.5cm} H(Y) \ = \ $

$\ \rm bit$

Calculate the following Kullback-Leibler distances.

$N=10^3\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ $

$\ \rm bit$

$N=10^2\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ $

$\ \rm bit$

$N=10^1\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ $

$\ \rm bit$

Does $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ give exactly the same result in each case?

	Yes.
	No.

Which statements are true for the Kullback-Leibler distances with $N = 4$?

	$D(P_X \hspace{0.05cm}\|\| \hspace{0.05cm} P_Y) = 0$ is true.
	$D(P_X\hspace{0.05cm}\|\| \hspace{0.05cm} P_Y) = 0.5 \ \rm bit$ is true.
	$D(P_X\hspace{0.05cm}\|\| \hspace{0.05cm} P_Y)$ is infinitely large.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X) = 0$ holds.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X) = 0.5 \ \rm bit$ holds.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X)$ is infinitely large.

Do both $H(Y)$ and $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ change monotonically with $N$?

	Yes,
	No.

Solution

(1) With equal probabilities, and with $M = 4$:

$$H(X) = {\rm log}_2 \hspace{0.1cm} M \hspace{0.15cm} \underline {= 2\,{\rm (bit)}} \hspace{0.05cm}.$$

(2) The probabilities for the empirically determined random variables $Y$ generally (not always!) deviate from the uniform distribution the more the parameter $N$ is smaller. One obtains for

$N = 1000 \ \ \Rightarrow \ \ P_Y(Y) = \big [0.225, \ 0.253, \ 0.250, \ 0.272 \big ]$:

$$H(Y) = 0.225 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.225} + 0.253 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.253} + 0.250 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.250} + 0.272 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.272} \hspace{0.15cm} \underline {= 1.9968\ {\rm (bit)}} \hspace{0.05cm},$$

$N = 100 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.24, \ 0.16, \ 0.30, \ 0.30\big]$:

$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.9410\ {\rm (bit)}} \hspace{0.05cm},$$

$N = 10 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.5, \ 0.1, \ 0.3, \ 0.1 \big]$:

$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.6855\ {\rm (bit)}} \hspace{0.05cm}.$$

(3) The equation for the Kullback-Leibler distance we are looking for is:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \sum_{\mu = 1}^{4} P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} = \frac{1/4}{{\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(1)} + \frac{0.25}{P_Y(2)} + \frac{0.25}{P_Y(3)} + \frac{0.25}{P_Y(4)} \right ] $$

$$\Rightarrow \hspace{0.3cm} D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{P_Y(1) \cdot P_Y(2)\cdot P_Y(3)\cdot P_Y(4)} \right ] \hspace{0.05cm}.$$

The logarithm to the base $ 2$ ⇒ $\log_2(.)$ was replaced by the logarithm to the base $ 10$ ⇒ $\lg(.)$ for easy use of the calculator.

The following numerical results are obtained:

for $N=1000$:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.225 \cdot 0.253\cdot 0.250\cdot 0.272} \right ] \hspace{0.15cm} \underline {= 0.00328 \,{\rm (bit)}} \hspace{0.05cm},$$

for $N=100$:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.24 \cdot 0.16\cdot 0.30\cdot 0.30} \right ] \hspace{0.15cm} \underline {= 0.0442 \,{\rm (bit)}} \hspace{0.05cm},$$

for $N=10$:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.5 \cdot 0.1\cdot 0.3\cdot 0.1} \right ] \hspace{0.15cm} \underline {= 0.345 \,{\rm (bit)}} \hspace{0.05cm}.$$

(4) Correct is No, as will be shown by the example $N = 100$ :

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = \sum_{\mu = 1}^M P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} = 0.24\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.24}{0.25} + 0.16\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.16}{0.25} +2 \cdot 0.30\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.30}{0.25} = 0.0407\ {\rm (bit)}\hspace{0.05cm}.$$

In subtask (3) we got $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.0442$ instead.
This also means: The designation „distance” is somewhat misleading.
According to this, one would actually expect $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ .

Probability function, entropy and Kullback-Leibler distance

(5) With $P_Y(X) = \big [0, \ 0.25, \ 0.5, \ 0.25 \big ]$ one obtains:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.50}\hspace{0.05cm}.$$

Because of the first term, the value of $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$ is infinitely large.
For the second Kullback-Leibler distance holds:

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0\cdot {\rm log}_2 \hspace{0.1cm} \frac{0}{0.25} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+ 0.50\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.5}{0.25} \hspace{0.05cm}.$$

After looking at the limits, one can see that the first term yields the result $0$ . The second term also yields zero, and one obtains as the final result:

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.50\cdot {\rm log}_2 \hspace{0.1cm} (2) \hspace{0.15cm} \underline {= 0.5\,{\rm (bit)}} \hspace{0.05cm}.$$

Statements 3 and 5 are therefore correct:

From this extreme example it is clear that $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ is always different from $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ .
Only for the special case $P_Y \equiv P_X$ are both Kullback-Leibler distances equal, namely zero.
The adjacent table shows the complete result of this task.

(6) Correct is again No. Although the tendency is clear: The larger $N$ is,

the more $H(Y)$ approaches in principle the final value $H(X) = 2 \ \rm bit$,
the smaller the distances $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ and $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ become.

However, one can also see from the table that there are exceptions:

The entropy $H(Y)$ is smaller for $N = 1000$ than for $N = 400$.
The distance $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$ is greater for $N = 1000$ than for $N = 400$.
The reason for this is that the experiment documented here with $N = 400$ was more likely to lead to a uniform distribution than the experiment with $N = 1000$.
If, on the other hand, one were to start a very (infinitely) large number of experiments with $N = 400$ and $N = 1000$ and average over all of them, the actually expected monotonic course would actually result.

Retrieved from "http://en.lntwww.de/index.php?title=Aufgaben:Exercise_3.5Z:_Kullback-Leibler_Distance_again&oldid=41387"

Category:

Information Theory: Exercises

	$D(P_X \hspace{0.05cm}\|\| \hspace{0.05cm} P_Y) = 0$ is true.
	$D(P_X\hspace{0.05cm}\|\| \hspace{0.05cm} P_Y) = 0.5 \ \rm bit$ is true.
	$D(P_X\hspace{0.05cm}\|\| \hspace{0.05cm} P_Y)$ is infinitely large.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X) = 0$ holds.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X) = 0.5 \ \rm bit$ holds.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X)$ is infinitely large.