Difference between revisions of "Aufgaben:Exercise 3.5Z: Kullback-Leibler Distance again"

From LNTwww

@@ Line 1: / Line 1: @@
-{{quiz-Header|Buchseite=Informationstheorie/Einige Vorbemerkungen zu zweidimensionalen Zufallsgrößen
+{{quiz-Header|Buchseite=Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables
 }}
-[[File:P_ID2762__Inf_Z_3_4.png|right|Empirisch ermittelte Wahrscheinlichkeitsfunktionen]]
+[[File:P_ID2762__Inf_Z_3_4.png|right|frame|Determined probability mass functions]]
-Die Wahrscheinlichkeitsfunktion lautet:
+The probability mass function is:
-:$$P_X(X) = [\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.03cm} 0.25\hspace{0.03cm},\hspace{0.03cm} 0.25 \hspace{0.03cm}, \hspace{0.03cm} 0.25\hspace{0.03cm}]\hspace{0.05cm}$$
+:$$P_X(X) = \big[\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.15cm},\hspace{0.15cm} 0.25 \hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.03cm}\big]\hspace{0.05cm}.$$
-Die Zufallsgröße $X$ ist also gekennzeichnet durch
+The random variable&nbsp; $X$&nbsp; is thus characterised by
-* den Symbolumfang $M=4$,
+* the symbol set size&nbsp; $M=4$,
-* gleiche Wahrscheinlichkeiten $P_X(1) = P_X(2) = P_X(3) = P_X(4) = 1/4$ .
+* equal probabilities&nbsp; $P_X(1) = P_X(2) = P_X(3) = P_X(4) = 1/4$ .
-Die Zufallsgröße $Y$ ist stets eine Näherung für $X$. Sie wurde per Simulation aus einer Gleichverteilung gewonnen, wobei jeweils nur $N$ Zufallswerte ausgewertet wurden. Das heißt:
+The random variable&nbsp; $Y$&nbsp; is always an approximation for&nbsp; $X$:
-$P_Y(1)$, ... ,$P_Y(4)$ sind im herkömmlichen Sinn keine Wahrscheinlichkeiten. Sie beschreiben vielmehr [[Stochastische_Signaltheorie/Wahrscheinlichkeit_und_relative_H%C3%A4ufigkeit#Bernoullisches_Gesetz_der_gro.C3.9Fen_Zahlen| relative Häufigkeiten]].
+*It was obtained by simulation from a uniform distribution, whereby only&nbsp; $N$&nbsp; random numbers were evaluated in each case.&nbsp; This means: &nbsp;
+*$P_Y(1)$, ... , $P_Y(4)$&nbsp; are not probabilities in the conventional sense.&nbsp; Rather, they describe&nbsp; [[Theory_of_Stochastic_Signals/Wahrscheinlichkeit_und_relative_H%C3%A4ufigkeit#Bernoullisches_Gesetz_der_gro.C3.9Fen_Zahlen| relative frequencies]].
-Das Ergebnis der sechsten Versuchsreihe (mit  $N=1000$) wird demnach durch die folgende Wahrscheinlichkeitsfunktion zusammengefasst:
-:$$P_Y(X) = [\hspace{0.05cm}0.225\hspace{0.05cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.05cm} 0.250 \hspace{0.05cm}, \hspace{0.05cm} 0.272\hspace{0.05cm}]
+The result of the sixth test series&nbsp; (with&nbsp;  $N=1000)$&nbsp; is thus summarised by the following probability function:
-\hspace{0.05cm}$$
-Bei dieser Schreibweise ist bereits berücksichtigt, dass die Zufallsgrößen $X$ und $Y$ auf dem gleichen Alphabet $X = \{1, 2, 3, 4\}$ basieren.
-Mit diesen Voraussetzungen gilt für die ''relative Entropie'' (englisch: ''Informational Divergence'') zwischen den beiden Wahrscheinlichkeitsfunktionen  $P_X(.)$ und $P_Y(.)$ :
+:$$P_Y(X) = \big [\hspace{0.05cm}0.225\hspace{0.15cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.15cm} 0.250 \hspace{0.05cm}, \hspace{0.15cm} 0.272\hspace{0.05cm}\big]
+\hspace{0.05cm}.$$
+This notation already takes into account that the random variables&nbsp; $X$&nbsp; and&nbsp; $Y$&nbsp; are based on the same alphabet&nbsp; $X = \{1,\ 2,\ 3,\ 4\}$.
+With these preconditions, the&nbsp; '''informational divergence'''&nbsp; between the two probability functions&nbsp;  $P_X(.)$&nbsp; and&nbsp; $P_Y(.)$ :
 :$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) =  {\rm E}_X \hspace{-0.1cm}\left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{M}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.$$
-Man bezeichnet  $D( P_X\hspace{0.05cm} || \hspace{0.05cm}P_Y)$  als (erste) Kullback–Leibler–Distanz.
+One calls&nbsp;  $D( P_X\hspace{0.05cm} || \hspace{0.05cm}P_Y)$&nbsp;  the (first)&nbsp; '''Kullback-Leibler distance'''.
-*Diese ist ein Maß für die Ähnlichkeit zwischen den beiden Wahrscheinlichkeitsfunktionen $P_X(.)$ und $P_Y(.)$.
+*This is a measure of the similarity between the two probability mass functions&nbsp; $P_X(.)$&nbsp; and&nbsp; $P_Y(.)$.
-*Die Erwartungswertbildung geschieht hier hinsichtlich der (tatsächlich gleichverteilten) Zufallsgröße $X$.  Dies wird durch die Nomenklatur  $E_X[.]$ angedeutet.
+*The expected value formation occurs here with regard to the&nbsp; (actually equally distributed)&nbsp; random variable&nbsp; $X$.&nbsp; This is indicated by the nomenclature&nbsp;  ${\rm E}_X\big[.\big]$.
-Eine zweite Form der Kullback–Leibler–Distanz ergibt sich durch die Erwartungswertbildung  hinsichtlich der Zufallsgröße $Y \Rightarrow E_Y[.]$:
+A second form of Kullback-Leibler distance results from the formation of expected values with respect to the random variable&nbsp; $Y$ &nbsp; &rArr; &nbsp;  ${\rm E}_Y\big [.\big ]$:
 :$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) =  {\rm E}_Y \hspace{-0.1cm} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(X)}{P_X(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^M  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.$$
-''Hinweise:''
-*Die Aufgabe gehört zum  Kapitel [[Informationstheorie/Einige_Vorbemerkungen_zu_zweidimensionalen_Zufallsgrößen|Einige Vorbemerkungen zu den 2D-Zufallsgrößen]].
-*Insbesondere wird Bezug genommen auf die Seite [[Informationstheorie/Einige_Vorbemerkungen_zu_zweidimensionalen_Zufallsgrößen#Relative_Entropie_.E2.80.93_Kullback.E2.80.93Leibler.E2.80.93Distanz|Relative Entropie &ndash; Kullback-Leibler-Distanz]].
-*Die Angaben der Entropie  $H(Y)$ und der Kullback–Leibler–Distanz  $D( P_X \hspace{0.05cm}|| \hspace{0.05cm}P_Y)$  in obiger Grafik sind in „bit” zu verstehen.
-* Die in der Grafik  mit „???"  versehenen Felder sollen von Ihnen in dieser Aufgabe ergänzt werden.
-*Sollte die Eingabe des Zahlenwertes &bdquo;0&rdquo; erforderlich sein, so geben Sie bitte &bdquo;0.&rdquo; ein.
-===Fragebogen===
+Hints:
+*The exercise belongs to the chapter&nbsp; [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables|Some preliminary remarks on two-dimensional random variables]].
+*In particular, reference is made to the page&nbsp; [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables#Informational_divergence_-_Kullback-Leibler_distance|Relative entropy &ndash; Kullback-Leibler distance]].
+*The entropy&nbsp;  $H(Y)$&nbsp; and the Kullback-Leibler distance&nbsp;  $D( P_X \hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp;  in the above graph are to be understood in&nbsp; "bit".
+* The fields marked with&nbsp; "???"&nbsp; in the graph are to be completed by you in this task.
+===Questions===
 <quiz display=simple>
-{Welche Entropie besitzt die Zufallsgröße $X$ ?
+{What is the entropy of the random variable&nbsp; $X$ ?
 |type="{}"}
 $H(X)\ = \ $ { 2 1% } $\ \rm bit$
-{Wie groß sind die Entropien der Zufallsgrößen $Y$ (Näherungen für $X$)?
+{What are the entropies of the random variables&nbsp; $Y$&nbsp; $($approximations for&nbsp; $X)$?
 |type="{}"}
-$N=10^3\text{:} \ H(Y) \ = \ $ { 1.9968 1% } $\ \rm bit$
+$N=10^3\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.9968 1% } $\ \rm bit$
-$N=10^2\text{:} \ H(Y) \ = \ $ { 1.941 1% } $\ \rm bit$
+$N=10^2\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.941 1% } $\ \rm bit$
-$N=10^1\text{:} \ H(Y) \ = \ $ { 1.6855 1%  } $\ \rm bit$
+$N=10^1\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.6855 1%  } $\ \rm bit$
-{Berechnen Sie die folgenden Kullback–Leibler–Distanzen.
+{Calculate the following Kullback-Leibler distances.
 |type="{}"}
-$N=10^3\text{:} \ D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $ { 0.00328 1% } $\ \rm bit$
+$N=10^3\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $ { 0.00328 1% } $\ \rm bit$
-$N=10^2\text{:} \ D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $  { 0.0442 1% } $\ \rm bit$
+$N=10^2\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $  { 0.0442 1% } $\ \rm bit$
-$N=10^1\text{:} \ D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)  \ = \ $  { 0.345 1% } $\ \rm bit$
+$N=10^1\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)  \ = \ $  { 0.345 1% } $\ \rm bit$
-{Liefert $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ jeweils exakt das gleiche Ergebnis?
+{Does&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; give exactly the same result in each case?
-|type="[]"}
+|type="()"}
-- Ja.
+- Yes.
-+ Nein.
++ No.
-{Welche Aussagen gelten für die Kullback–Leibler–Distanzen bei $N = 4$?
+{Which statements are true for the Kullback-Leibler distances with&nbsp; $N = 4$?
 |type="[]"}
-- Es gilt $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0$.
+- &nbsp; $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0$&nbsp; is true.
-- Es gilt $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.5 \ \rm  bit$
+- &nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.5 \ \rm  bit$&nbsp; is true.
-+ $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ ist unendlich groß
++ &nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; is infinitely large.
--  Es gilt $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0$.
+- &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0$&nbsp; holds.
-+ Es gilt $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.5 \ \rm bit$.
++ &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.5 \ \rm bit$&nbsp; holds.
--  $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ ist unendlich groß.
+- &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; is infinitely large.
-{Ändern sich sowohl $H(Y)$ als auch  $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ monoton mit $N$?
+{Do both&nbsp; $H(Y)$&nbsp; and&nbsp;  $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; change monotonically with&nbsp; $N$?
-|type="[]"}
+|type="()"}
-- Ja,
+- Yes,
-+ Nein.
++ No.
 </quiz>
-===Musterlösung===
+===Solution===
 {{ML-Kopf}}
-'''(1)'''&nbsp; Bei gleichen Wahrscheinlichkeiten gilt mit $M = 4$: &nbsp;  $H(X) = {\rm log}_2 \hspace{0.1cm} M
+'''(1)'''&nbsp; With equal probabilities, and with&nbsp; $M = 4$:
-\hspace{0.15cm} \underline {= 2\,{\rm (bit)}}  \hspace{0.05cm}.$
+:$$H(X) = {\rm log}_2 \hspace{0.1cm} M
+\hspace{0.15cm} \underline {= 2\,{\rm (bit)}}  \hspace{0.05cm}.$$
-'''(2)'''&nbsp; Die Wahrscheinlichkeiten für die empirisch ermittelten Zufallsgrößen $Y$ weichen im Allgemeinen (nicht immer!) von der Gleichverteilung um so mehr ab, je kleiner der Parameter $N$ ist. Man erhält
+'''(2)'''&nbsp; The probabilities for the empirically determined random variables&nbsp; $Y$&nbsp; generally&nbsp; (not always!)&nbsp; deviate from the uniform distribution the more the parameter&nbsp; $N$&nbsp; is smaller.&nbsp; One obtains for
-* $N = 1000 \Rightarrow  P_Y(Y) =  [0.225, 0.253, 0.250, 0.272]$:
+* $N = 1000 \ \ \Rightarrow \ \ P_Y(Y) =  \big [0.225, \ 0.253, \ 0.250, \ 0.272 \big ]$:
-:$$H(Y) \hspace{-0.15cm} = \hspace{-0.15cm}
+:$$H(Y) =
 .225 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.225} +
 .253 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.253} +
 .250 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.250} +
 .272 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.272}
-\hspace{0.15cm} \underline {= 1.9968\,{\rm (bit)}}  \hspace{0.05cm},$$
+\hspace{0.15cm} \underline {= 1.9968\ {\rm (bit)}}  \hspace{0.05cm},$$
-* $N = 100\Rightarrow  P_Y(Y) = [0.24, 0.16, 0.30, 0.30]$:
+* $N = 100 \ \ \Rightarrow \ \  P_Y(Y) = \big[0.24, \ 0.16, \ 0.30,  \ 0.30\big]$:
-:$$H(Y) = ... \hspace{0.15cm} \underline {= 1.9410\,{\rm (bit)}}  \hspace{0.05cm},$$
+:$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.9410\ {\rm (bit)}}  \hspace{0.05cm},$$
-* $N = 10 \Rightarrow  P_Y(Y) =  [0.5, 0.1, 0.3, 0.1]$:
+* $N = 10 \ \ \Rightarrow \ \  P_Y(Y) =  \big[0.5, \ 0.1, \ 0.3, \ 0.1 \big]$:
-:$$H(Y) = ... \hspace{0.15cm} \underline {= 1.6855\,{\rm (bit)}}  \hspace{0.05cm}.$$
+:$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.6855\ {\rm (bit)}}  \hspace{0.05cm}.$$
-'''(3)'''&nbsp; Die Gleichung für die gesuchte Kullback–Leibler–Distanz lautet:
+'''(3)'''&nbsp; The equation for the Kullback-Leibler distance we are looking for is:
 :$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \sum_{\mu = 1}^{4}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)}
@@ Line 114: / Line 123: @@
 \right ] \hspace{0.05cm}.$$
-Der Logarithmus zur Basis 2 &nbsp; &rArr;  &nbsp; $\log_2(.)$ wurde zur einfachen Nutzung des Taschenrechners durch den Zehnerlogarithmus  2 &nbsp; &rArr;  &nbsp; $\lg(.)$  ersetzt. Man erhält die folgenden numerischen Ergebnisse:
+The logarithm to the base&nbsp; $ 2$&nbsp; &rArr;  &nbsp; $\log_2(.)$&nbsp; was replaced by the logarithm to the base&nbsp; $ 10$ &nbsp; &rArr;  &nbsp; $\lg(.)$ for easy use of the calculator.
-* für $N=1000$:
+The following numerical results are obtained:
+* for $N=1000$:
 :$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
 \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.225 \cdot 0.253\cdot 0.250\cdot 0.272}
-\right ] \hspace{0.15cm} \underline {= 3.28 \cdot 10^{-3}\,{\rm (bit)}}  \hspace{0.05cm},$$
+\right ] \hspace{0.15cm} \underline {= 0.00328 \,{\rm (bit)}}  \hspace{0.05cm},$$
-* für $N=100$:
+* for $N=100$:
 :$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
 \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.24 \cdot 0.16\cdot 0.30\cdot 0.30}
-\right ] \hspace{0.15cm} \underline {= 4.42 \cdot 10^{-2}\,{\rm (bit)}}  \hspace{0.05cm},$$
+\right ] \hspace{0.15cm} \underline {= 0.0442 \,{\rm (bit)}}  \hspace{0.05cm},$$
-* für $N=10$:
+* for $N=10$:
 :$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
 \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.5 \cdot 0.1\cdot 0.3\cdot 0.1}
-\right ] \hspace{0.15cm} \underline {= 3.45 \cdot 10^{-1}\,{\rm (bit)}}  \hspace{0.05cm}.$$
+\right ] \hspace{0.15cm} \underline {= 0.345 \,{\rm (bit)}}  \hspace{0.05cm}.$$
-'''(4)'''&nbsp; Richtig ist <u>Nein</u>, wie am Beispiel $N = 100$ gezeigt werden soll:
-:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) =   \sum_{\mu = 1}^M  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} = 0.24\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.24}{0.25} + 0.16\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.16}{0.25} +2 \cdot 0.30\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.30}{0.25}  = 0.0407\,{\rm (bit)}\hspace{0.05cm}.$$
-In der Teilaufgabe (c) haben wir stattdessen $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.0442$ erhalten. Das bedeutet auch: Der Name „Distanz” ist etwas irreführend. Danach würde man eigentlich $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ = $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ erwarten.
+'''(4)'''&nbsp; Correct is&nbsp; <u>'''No'''</u>,&nbsp; as will be shown by the example&nbsp; $N = 100$&nbsp;:
+:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) =   \sum_{\mu = 1}^M  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} = 0.24\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.24}{0.25} + 0.16\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.16}{0.25} +2 \cdot 0.30\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.30}{0.25}  = 0.0407\ {\rm (bit)}\hspace{0.05cm}.$$
+*In subtask&nbsp; '''(3)'''&nbsp; we got&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.0442$&nbsp; instead.
+*This also means: &nbsp; The designation „distance” is somewhat misleading.
+*According to this, one would actually expect&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp;.
-'''(5)'''&nbsp; Mit $P_Y(X) = [0, 0.25, 0.5, 0.25]$ erhält man:
+[[File:P_ID2763__Inf_Z_3_4e.png|right|frame|Probability function, entropy and Kullback-Leibler distance]]
+'''(5)'''&nbsp; With&nbsp; $P_Y(X) = \big [0, \ 0.25, \ 0.5, \ 0.25 \big ]$&nbsp; one obtains:
 :$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.50}\hspace{0.05cm}.$$
-Aufgrund des ersten Terms ergibt sich für $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$ ein unendlich großer Wert. Für die zweite Kullback–Leibler–Distanz gilt:
+*Because of the first term, the value of&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp; is infinitely large.
+*For the second Kullback-Leibler distance holds:
 :$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0\cdot {\rm log}_2 \hspace{0.1cm} \frac{0}{0.25} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+
 .50\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.5}{0.25}
 	\hspace{0.05cm}.$$
-[[File:P_ID2763__Inf_Z_3_4e.png|right|Wahrscheinlichkeitsfunktion, Entropie und Kullback–Leibler–Distanz]]
+*After looking at the limits, one can see that the first term yields the result&nbsp; $0$&nbsp;.&nbsp; The second term also yields zero, and one obtains as the final result:
-Nach einer Grenzwertbetrachtung erkennt man, dass der erste Term das Ergebnis $0$ liefert. Auch der zweite Term ergibt sich zu $0$, und man erhält als Endergebnis:
 :$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.50\cdot {\rm log}_2 \hspace{0.1cm} (2) \hspace{0.15cm} \underline {= 0.5\,{\rm (bit)}} 	\hspace{0.05cm}.$$
-Richtig sind somit die <u>Aussagen 3 und 5</u>:
+&nbsp; <u>Statements 3 and 5</u> are therefore correct:
-*Auch aus diesem Extrembeispiel wird deutlich, dass sich $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ stets von $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ unterscheidet.
+*From this extreme example it is clear that&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; is always different from&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp;.
-*Nur für den Sonderfall $P_Y = P_X$ sind beide Kullback–Leibler–Distanzen gleich, nämlich Null.
+*Only for the special case&nbsp; $P_Y \equiv P_X$&nbsp; are both Kullback-Leibler distances equal, namely zero.
-*Die nebenstehende Tabelle zeigt das vollständige Ergebnis dieser Aufgabe.
+*The adjacent table shows the complete result of this task.
-'''(6)'''&nbsp; Richtig ist <u>Nein</u>. Die Tendenz ist zwar eindeutig: Je größer $N$ ist,
-* desto mehr nähert sich $H(Y)$ im Prinzip dem Endwert $H(X) = 2 \ \rm bit$ an.
-* um so kleiner werden die Distanzen $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ und $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$.
-Man erkennt aus der Tabelle aber auch, dass es Ausnahmen gibt:
-* Die Entropie $H(Y)$ ist für $N = 1000$ kleiner als für $N = 400$,
-* Die Distanz $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$ ist für $N = 1000$ größer als für $N = 400$.
+'''(6)'''&nbsp; Correct is again&nbsp; <u>'''No'''</u>. &nbsp; Although the tendency is clear: &nbsp; The larger&nbsp; $N$&nbsp; is,
+* the more&nbsp; $H(Y)$&nbsp; approaches in principle the final value&nbsp; $H(X) = 2 \ \rm bit$,
+* the smaller the distances&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; and&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ become.
-Der Grund hierfür ist, dass das hier dokumentierte empirische Experiment mit $N = 400$ eher zu einer Gleichverteilung geführt hat als das Experiment mit $N = 1000$.
-Würde man dagegen sehr (unendlich) viele Versuche mit $N = 400$ und $N = 1000$ starten und über diese mitteln, ergäbe sich tatsächlich der eigentlich erwartete monotone Verlauf.
+However, one can also see from the table that there are exceptions:
+* The entropy&nbsp; $H(Y)$&nbsp; is smaller for&nbsp; $N = 1000$&nbsp; than for&nbsp; $N = 400$.
+* The distance&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp; is greater for&nbsp; $N = 1000$&nbsp; than for&nbsp; $N = 400$.
+* The reason for this is that the experiment documented here with&nbsp; $N = 400$&nbsp; was more likely to lead to a uniform distribution than the experiment with&nbsp; $N = 1000$.
+*If, on the other hand, one were to start a very (infinitely) large number of experiments with&nbsp; $N = 400$&nbsp; and&nbsp; $N = 1000$&nbsp; and average over all of them, the actually expected monotonic course would actually result.
 {{ML-Fuß}}
@@ Line 171: / Line 186: @@
-[[Category:Aufgaben zu Informationstheorie|^3.1 Vorbemerkungen zu 2D-Zufallsgrößen^]]
+[[Category:Information Theory: Exercises|^3.1 General Information on 2D Random Variables^]]

Latest revision as of 10:14, 24 September 2021

Empirically determined probability mass functions

The probability mass function is:

$$P_X(X) = \big[\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.15cm},\hspace{0.15cm} 0.25 \hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.03cm}\big]\hspace{0.05cm}.$$

The random variable $X$ is thus characterised by

the symbol set size $M=4$,
equal probabilities $P_X(1) = P_X(2) = P_X(3) = P_X(4) = 1/4$ .

The random variable $Y$ is always an approximation for $X$:

It was obtained by simulation from a uniform distribution, whereby only $N$ random numbers were evaluated in each case. This means:
$P_Y(1)$, ... , $P_Y(4)$ are not probabilities in the conventional sense. Rather, they describe relative frequencies.

The result of the sixth test series (with $N=1000$) is thus summarised by the following probability mass function:

$$P_Y(X) = \big [\hspace{0.05cm}0.225\hspace{0.15cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.15cm} 0.250 \hspace{0.05cm}, \hspace{0.15cm} 0.272\hspace{0.05cm}\big] \hspace{0.05cm}.$$

This notation already takes into account that the random variables $X$ and $Y$ are based on the same alphabet $X = \{1,\ 2,\ 3,\ 4\}$.

Under these conditions, the informational divergence (relative entropy) between the two probability mass functions $P_X(.)$ and $P_Y(.)$ is:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = {\rm E}_X \hspace{-0.1cm}\left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{M} P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.$$

One calls $D( P_X\hspace{0.05cm} || \hspace{0.05cm}P_Y)$ the (first) Kullback-Leibler distance.

This is a measure of the similarity between the two probability mass functions $P_X(.)$ and $P_Y(.)$.
The expectation is taken here with respect to the (actually uniformly distributed) random variable $X$. This is indicated by the notation ${\rm E}_X\big[.\big]$.

A second form of the Kullback-Leibler distance results from taking the expectation with respect to the random variable $Y$ ⇒ ${\rm E}_Y\big [.\big ]$:

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = {\rm E}_Y \hspace{-0.1cm} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(X)}{P_X(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^M P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.$$
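
The two definitions above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `kl_distance` is an assumption made here, and the plain sum assumes that all probabilities are strictly positive.

```python
import math


def kl_distance(p, q):
    """Return D(p || q) in bit: the expectation over p of log2(p/q).

    Hypothetical helper for illustration; assumes all entries of p and q
    are strictly positive and that both lists share the same alphabet.
    """
    return sum(p_mu * math.log2(p_mu / q_mu) for p_mu, q_mu in zip(p, q))


P_X = [0.25, 0.25, 0.25, 0.25]        # uniform PMF, M = 4
P_Y = [0.225, 0.253, 0.250, 0.272]    # sixth test series, N = 1000

d_xy = kl_distance(P_X, P_Y)  # first form: expectation with respect to X
d_yx = kl_distance(P_Y, P_X)  # second form: expectation with respect to Y
print(f"D(P_X||P_Y) = {d_xy:.5f} bit")
print(f"D(P_Y||P_X) = {d_yx:.5f} bit")
```

Swapping the two arguments is all that distinguishes the first form from the second.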

Hints:

The exercise belongs to the chapter Some preliminary remarks on two-dimensional random variables.
In particular, reference is made to the page Relative entropy – Kullback-Leibler distance.
The entropy $H(Y)$ and the Kullback-Leibler distance $D( P_X \hspace{0.05cm}|| \hspace{0.05cm}P_Y)$ in the above graph are to be understood in "bit".
The fields marked with "???" in the graph are to be completed by you in this task.

Questions

What is the entropy of the random variable $X$ ?

$H(X)\ = \ $

$\ \rm bit$

What are the entropies of the random variables $Y$ $($approximations for $X)$?

$N=10^3\text{:} \hspace{0.5cm} H(Y) \ = \ $

$\ \rm bit$

$N=10^2\text{:} \hspace{0.5cm} H(Y) \ = \ $

$\ \rm bit$

$N=10^1\text{:} \hspace{0.5cm} H(Y) \ = \ $

$\ \rm bit$

Calculate the following Kullback-Leibler distances.

$N=10^3\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ $

$\ \rm bit$

$N=10^2\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ $

$\ \rm bit$

$N=10^1\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ $

$\ \rm bit$

Does $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ give exactly the same result in each case?

	Yes.
	No.

Which statements are true for the Kullback-Leibler distances with $N = 4$?

	$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0$ is true.
	$D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.5 \ \rm bit$ is true.
	$D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ is infinitely large.
	$D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0$ holds.
	$D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.5 \ \rm bit$ holds.
	$D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ is infinitely large.

Do both $H(Y)$ and $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ change monotonically with $N$?

	Yes.
	No.

Solution

(1) With equal probabilities, and with $M = 4$:

$$H(X) = {\rm log}_2 \hspace{0.1cm} M \hspace{0.15cm} \underline {= 2\,{\rm (bit)}} \hspace{0.05cm}.$$

(2) The probabilities for the empirically determined random variables $Y$ generally (but not always!) deviate more from the uniform distribution the smaller the parameter $N$ is. One obtains for

$N = 1000 \ \ \Rightarrow \ \ P_Y(Y) = \big [0.225, \ 0.253, \ 0.250, \ 0.272 \big ]$:

$$H(Y) = 0.225 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.225} + 0.253 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.253} + 0.250 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.250} + 0.272 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.272} \hspace{0.15cm} \underline {= 1.9968\ {\rm (bit)}} \hspace{0.05cm},$$

$N = 100 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.24, \ 0.16, \ 0.30, \ 0.30\big]$:

$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.9593\ {\rm (bit)}} \hspace{0.05cm},$$

$N = 10 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.5, \ 0.1, \ 0.3, \ 0.1 \big]$:

$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.6855\ {\rm (bit)}} \hspace{0.05cm}.$$
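
The entropy sums of subtask (2) can be evaluated with a short script. The following is a minimal sketch; the helper name `entropy` and the dictionary `series` are assumptions introduced only for this illustration, and the three probability mass functions are the ones listed above.

```python
import math


def entropy(p):
    """Entropy in bit of a PMF given as a list (hypothetical helper).

    Terms with probability 0 are skipped, since p * log2(1/p) -> 0 for p -> 0.
    """
    return sum(p_mu * math.log2(1.0 / p_mu) for p_mu in p if p_mu > 0.0)


# Relative frequencies of the three test series considered in subtask (2)
series = {
    1000: [0.225, 0.253, 0.250, 0.272],
    100: [0.24, 0.16, 0.30, 0.30],
    10: [0.5, 0.1, 0.3, 0.1],
}

print(f"H(X) = {entropy([0.25] * 4):.4f} bit")  # uniform reference, log2(4)
for n, p_y in series.items():
    print(f"N = {n:4d}:  H(Y) = {entropy(p_y):.4f} bit")
```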

(3) The equation for the Kullback-Leibler distance we are looking for is:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \sum_{\mu = 1}^{4} P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} = \frac{1/4}{{\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(1)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(2)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(3)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(4)} \right ] $$

$$\Rightarrow \hspace{0.3cm} D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{P_Y(1) \cdot P_Y(2)\cdot P_Y(3)\cdot P_Y(4)} \right ] \hspace{0.05cm}.$$

The logarithm to the base $ 2$ ⇒ $\log_2(.)$ was replaced by the logarithm to the base $ 10$ ⇒ $\lg(.)$ for easy use of the calculator.

The following numerical results are obtained:

for $N=1000$:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.225 \cdot 0.253\cdot 0.250\cdot 0.272} \right ] \hspace{0.15cm} \underline {= 0.00328 \,{\rm (bit)}} \hspace{0.05cm},$$

for $N=100$:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.24 \cdot 0.16\cdot 0.30\cdot 0.30} \right ] \hspace{0.15cm} \underline {= 0.0442 \,{\rm (bit)}} \hspace{0.05cm},$$

for $N=10$:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.5 \cdot 0.1\cdot 0.3\cdot 0.1} \right ] \hspace{0.15cm} \underline {= 0.345 \,{\rm (bit)}} \hspace{0.05cm}.$$
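
As a cross-check of subtask (3), the direct base-2 sum and the calculator form with base-10 logarithms can be compared numerically. This is a hedged sketch: the function names `kl_sum` and `kl_lg` and the dictionary `series` are assumptions made for the illustration, and `math.prod` requires Python 3.8 or newer.

```python
import math

# Relative frequencies of the three test series from subtask (3)
series = {
    1000: [0.225, 0.253, 0.250, 0.272],
    100: [0.24, 0.16, 0.30, 0.30],
    10: [0.5, 0.1, 0.3, 0.1],
}


def kl_sum(p_y):
    """D(P_X||P_Y) for uniform P_X (M = 4) as a direct sum with log2."""
    return sum(0.25 * math.log2(0.25 / q) for q in p_y)


def kl_lg(p_y):
    """Same quantity in the calculator form: one lg of the product ratio."""
    return math.log10(0.25 ** 4 / math.prod(p_y)) / (4 * math.log10(2))


for n, p_y in series.items():
    print(f"N = {n:4d}:  sum form {kl_sum(p_y):.5f} bit,  lg form {kl_lg(p_y):.5f} bit")
```

Both forms agree up to floating-point rounding, because the sum of logarithms equals the logarithm of the product and the factor $1/{\rm lg}(2)$ converts base 10 to base 2.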

(4) The correct answer is No, as the example $N = 100$ shows:

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = \sum_{\mu = 1}^M P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} = 0.24\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.24}{0.25} + 0.16\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.16}{0.25} +2 \cdot 0.30\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.30}{0.25} = 0.0407\ {\rm (bit)}\hspace{0.05cm}.$$

In subtask (3) we instead obtained $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.0442$ bit.
This also means that the designation "distance" is somewhat misleading.
For a true distance, one would expect $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$.
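
The asymmetry for $N = 100$ can be reproduced numerically. The sketch below also uses an observation that is not stated in the exercise but follows directly from the definition: for uniform $P_X$, $D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = {\rm log}_2 \hspace{0.1cm} M - H(Y)$. All function and variable names are assumptions chosen for this illustration.

```python
import math


def kl_distance(p, q):
    """D(p || q) in bit; hypothetical helper, assumes strictly positive entries."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q))


def entropy(p):
    """Entropy in bit of a PMF with strictly positive entries."""
    return sum(a * math.log2(1.0 / a) for a in p)


P_X = [0.25] * 4                 # uniform PMF, M = 4
P_Y = [0.24, 0.16, 0.30, 0.30]   # test series with N = 100

d_xy = kl_distance(P_X, P_Y)
d_yx = kl_distance(P_Y, P_X)
# For uniform P_X the second form reduces to log2(M) - H(Y)
d_yx_via_entropy = math.log2(len(P_X)) - entropy(P_Y)

print(f"D(P_X||P_Y)   = {d_xy:.4f} bit")
print(f"D(P_Y||P_X)   = {d_yx:.4f} bit")
print(f"log2(M)-H(Y)  = {d_yx_via_entropy:.4f} bit")
```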

Probability function, entropy and Kullback-Leibler distance

(5) With $P_Y(X) = \big [0, \ 0.25, \ 0.5, \ 0.25 \big ]$ one obtains:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.50}\hspace{0.05cm}.$$

Because of the first term, the value of $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$ is infinitely large.
For the second Kullback-Leibler distance, the following holds:

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0\cdot {\rm log}_2 \hspace{0.1cm} \frac{0}{0.25} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+ 0.50\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.5}{0.25} \hspace{0.05cm}.$$

Taking the limit $p \to 0$ of $p \cdot {\rm log}_2 \hspace{0.1cm} (p/0.25)$ shows that the first term yields the result $0$. The second term is also zero because ${\rm log}_2 \hspace{0.1cm} (1) = 0$, and one obtains as the final result:

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.50\cdot {\rm log}_2 \hspace{0.1cm} (2) \hspace{0.15cm} \underline {= 0.5\,{\rm (bit)}} \hspace{0.05cm}.$$

Statements 3 and 5 are therefore correct:

This extreme example also makes clear that $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ generally differs from $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$.
For the special case $P_Y \equiv P_X$, both Kullback-Leibler distances are equal, namely zero.
The adjacent table shows the complete result of this task.
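
The limiting cases of subtask (5) need explicit handling in code, because a direct evaluation would divide by zero or take the logarithm of zero. The following sketch encodes the two conventions used in the solution: a term with $P(\mu) = 0$ in front of the logarithm contributes $0$, and a term with a zero in the denominator (and non-zero weight) makes the distance infinite. The function name `kl_distance` is an assumption for this illustration.

```python
import math


def kl_distance(p, q):
    """D(p || q) in bit with explicit handling of zero probabilities.

    Hypothetical helper: terms with p_mu == 0 contribute 0 (limit of
    p * log2(p/q) for p -> 0); p_mu > 0 with q_mu == 0 gives infinity.
    """
    total = 0.0
    for p_mu, q_mu in zip(p, q):
        if p_mu == 0.0:
            continue
        if q_mu == 0.0:
            return math.inf
        total += p_mu * math.log2(p_mu / q_mu)
    return total


P_X = [0.25, 0.25, 0.25, 0.25]
P_Y = [0.0, 0.25, 0.5, 0.25]   # test series with N = 4

d_xy = kl_distance(P_X, P_Y)
d_yx = kl_distance(P_Y, P_X)
print(f"D(P_X||P_Y) = {d_xy} bit")
print(f"D(P_Y||P_X) = {d_yx} bit")
```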

(6) The correct answer is again No. The tendency is nevertheless clear: The larger $N$ is,

the more $H(Y)$ approaches in principle the final value $H(X) = 2 \ \rm bit$,
the smaller the distances $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ and $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ become.

However, one can also see from the table that there are exceptions:

The entropy $H(Y)$ is smaller for $N = 1000$ than for $N = 400$.
The distance $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$ is greater for $N = 1000$ than for $N = 400$.
The reason for this is that the experiment documented here with $N = 400$ happened to come closer to a uniform distribution than the experiment with $N = 1000$.
If, on the other hand, one were to run a very large (ideally infinite) number of experiments with $N = 400$ and $N = 1000$ and average over all of them, the expected monotonic behaviour would indeed result.
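
The averaging argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not the procedure used for the table: Python's `random` module, the fixed seed, the 200 repetitions per value of $N$ and all function names are choices made here. Only $H(Y)$ is averaged, because $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ becomes infinite whenever a symbol does not occur at all, which is not unlikely for $N = 10$.

```python
import math
import random

M = 4  # symbol set size


def empirical_pmf(n, rng):
    """Relative frequencies of M equally likely symbols after n draws."""
    counts = [0] * M
    for _ in range(n):
        counts[rng.randrange(M)] += 1
    return [c / n for c in counts]


def entropy(p):
    """Entropy in bit; terms with probability 0 contribute 0."""
    return sum(x * math.log2(1.0 / x) for x in p if x > 0.0)


def mean_entropy(n, trials, rng):
    """Average of H(Y) over several independent test series of length n."""
    return sum(entropy(empirical_pmf(n, rng)) for _ in range(trials)) / trials


rng = random.Random(2021)  # fixed seed (assumption) for reproducibility
averages = {n: mean_entropy(n, 200, rng) for n in (10, 100, 400, 1000)}

for n, h in averages.items():
    print(f"N = {n:4d}:  mean H(Y) = {h:.4f} bit")
```

A single test series can deviate from the trend, as the table shows, but the averaged entropy should increase with $N$ towards $H(X) = 2$ bit.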
