Exercise 3.5Z: Kullback-Leibler Distance again

{{quiz-Header|Buchseite=Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables
}}

[[File:P_ID2762__Inf_Z_3_4.png|right|frame|Determined probability mass functions]]
The probability mass function is:

:$$P_X(X) = \big[\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.15cm},\hspace{0.15cm} 0.25 \hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.03cm}\big]\hspace{0.05cm}.$$

The random variable  $X$  is thus characterised by
* the symbol set size  $M=4$,
* equal probabilities  $P_X(1) = P_X(2) = P_X(3) = P_X(4) = 1/4$.
  
  
The random variable  $Y$  is always an approximation for  $X$:
*It was obtained by simulation from a uniform distribution, where in each case only  $N$  random numbers were evaluated.
*This means:   $P_Y(1)$, ... , $P_Y(4)$  are not probabilities in the conventional sense.  Rather, they describe  [[Theory_of_Stochastic_Signals/Wahrscheinlichkeit_und_relative_H%C3%A4ufigkeit#Bernoullisches_Gesetz_der_gro.C3.9Fen_Zahlen| relative frequencies]].
  
  
The result of the sixth test series  (with  $N=1000)$  is thus summarised by the following probability mass function:

:$$P_Y(X) = \big [\hspace{0.05cm}0.225\hspace{0.15cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.15cm} 0.250 \hspace{0.05cm}, \hspace{0.15cm} 0.272\hspace{0.05cm}\big]
\hspace{0.05cm}.$$

This notation already takes into account that the random variables  $X$  and  $Y$  are based on the same alphabet  $X = \{1,\ 2,\ 3,\ 4\}$.
  
With these preconditions, the following holds for the  '''informational divergence'''  (or  "relative entropy")  between the two probability mass functions   $P_X(.)$  and  $P_Y(.)$:

:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) =  {\rm E}_X \hspace{-0.1cm}\left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{M}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.$$

One calls   $D( P_X\hspace{0.05cm} || \hspace{0.05cm}P_Y)$   the (first)  '''Kullback-Leibler distance'''.
*It is a measure of the similarity between the two probability mass functions  $P_X(.)$  and  $P_Y(.)$.
*The expectation is formed here with respect to the  (actually uniformly distributed)  random variable  $X$.  This is indicated by the notation   ${\rm E}_X\big[.\big]$.


A second form of the Kullback-Leibler distance results from forming the expectation with respect to the random variable  $Y$   ⇒    ${\rm E}_Y\big [.\big ]$:

:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) =  {\rm E}_Y \hspace{-0.1cm} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(X)}{P_X(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^M  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.$$
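
Both definitions can be evaluated directly.  The following Python sketch (added here for illustration; it is not part of the original exercise) computes both directions for the  $N=1000$  approximation given above:

<pre>
import math

def kl_distance(P, Q):
    # D(P || Q) in bit; terms with P(mu) = 0 contribute nothing by convention.
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

P_X = [0.25, 0.25, 0.25, 0.25]          # uniform distribution
P_Y = [0.225, 0.253, 0.250, 0.272]      # relative frequencies for N = 1000

print(round(kl_distance(P_X, P_Y), 5))  # 0.00328 bit
print(round(kl_distance(P_Y, P_X), 5))  # 0.00325 bit
</pre>

Even for this nearly uniform approximation the two directions give slightly different values; this asymmetry is taken up again in questions&nbsp; '''(4)'''&nbsp; and&nbsp; '''(5)'''.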


Hints:
*The exercise belongs to the chapter  [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables|Some preliminary remarks on two-dimensional random variables]].
*In particular, reference is made to the page  [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables#Informational_divergence_-_Kullback-Leibler_distance|Relative entropy – Kullback-Leibler distance]].
*The entropy  $H(Y)$  and the Kullback-Leibler distance  $D( P_X \hspace{0.05cm}|| \hspace{0.05cm}P_Y)$  in the above graph are to be understood in  "bit".
*The fields marked with  "???"  in the graph are to be completed by you in this task.
 
 
   
 
   
  
  
===Questions===

<quiz display=simple>

{What is the entropy of the random variable&nbsp; $X$?
|type="{}"}
$H(X)\ = \ $ { 2 1% } $\ \rm bit$
  
{What are the entropies of the random variables&nbsp; $Y$&nbsp; $($approximations for&nbsp; $X)$?
|type="{}"}
$N=10^3\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.9968 1% } $\ \rm bit$
$N=10^2\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.9410 1% } $\ \rm bit$
$N=10^1\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.6855 1% } $\ \rm bit$
  
{Calculate the following Kullback-Leibler distances.
|type="{}"}
$N=10^3\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $ { 0.00328 1% } $\ \rm bit$
$N=10^2\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $ { 0.0442 1% } $\ \rm bit$
$N=10^1\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ $ { 0.345 1% } $\ \rm bit$
  
{Does&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; give exactly the same result in each case?
|type="()"}
- Yes.
+ No.
  
{Which statements are true for the Kullback-Leibler distances with&nbsp; $N = 4$?
|type="[]"}
- &nbsp; $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0$&nbsp; holds.
- &nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.5 \ \rm  bit$&nbsp; holds.
+ &nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; is infinitely large.
- &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0$&nbsp; holds.
+ &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.5 \ \rm bit$&nbsp; holds.
- &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; is infinitely large.
  
{Do both&nbsp; $H(Y)$&nbsp; and&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; change monotonically with&nbsp; $N$?
|type="()"}
- Yes.
+ No.


</quiz>
  
===Solution===
{{ML-Kopf}}
'''(1)'''&nbsp; With equal probabilities and&nbsp; $M = 4$, the entropy is:

:$$H(X) = {\rm log}_2 \hspace{0.1cm} M
\hspace{0.15cm} \underline {= 2\,{\rm (bit)}}  \hspace{0.05cm}.$$


'''(2)'''&nbsp; The probabilities of the empirically determined random variables&nbsp; $Y$&nbsp; generally&nbsp; (but not always!)&nbsp; deviate more from the uniform distribution, the smaller the parameter&nbsp; $N$&nbsp; is.&nbsp; One obtains for
* $N = 1000 \ \ \Rightarrow \ \ P_Y(Y) =  \big [0.225, \ 0.253, \ 0.250, \ 0.272 \big ]$:
:$$H(Y) = 0.225 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.225} + 0.253 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.253} + 0.250 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.250} + 0.272 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.272} \hspace{0.15cm} \underline {= 1.9968\ {\rm (bit)}} \hspace{0.05cm},$$
* $N = 100 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.24, \ 0.16, \ 0.30, \ 0.30\big]$:
:$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.9410\ {\rm (bit)}} \hspace{0.05cm},$$
* $N = 10 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.5, \ 0.1, \ 0.3, \ 0.1 \big]$:
:$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.6855\ {\rm (bit)}} \hspace{0.05cm}.$$
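
The entropy values can be reproduced with a minimal Python sketch (added for illustration, not part of the original solution); it is shown here for the first and the last test series:

<pre>
import math

def entropy(P):
    # Entropy of a PMF in bit; terms with probability 0 contribute nothing.
    return sum(p * math.log2(1 / p) for p in P if p > 0)

print(round(entropy([0.225, 0.253, 0.250, 0.272]), 4))  # 1.9968 bit (N = 1000)
print(round(entropy([0.5, 0.1, 0.3, 0.1]), 4))          # 1.6855 bit (N = 10)
</pre>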
  
  
'''(3)'''&nbsp; The equation for the Kullback-Leibler distance we are looking for is:

:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \sum_{\mu = 1}^{4}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} = \frac{1/4}{{\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(1)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(2)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(3)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(4)} \right ]$$
:$$\Rightarrow \hspace{0.3cm} D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{P_Y(1) \cdot P_Y(2)\cdot P_Y(3)\cdot P_Y(4)} \right ] \hspace{0.05cm}.$$

The logarithm to base&nbsp; $2$ &nbsp; &rArr; &nbsp; $\log_2(.)$&nbsp; has been replaced by the logarithm to base&nbsp; $10$ &nbsp; &rArr; &nbsp; $\lg(.)$&nbsp; for easier use of a pocket calculator.

The following numerical results are obtained:
* for $N=1000$:
:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.225 \cdot 0.253\cdot 0.250\cdot 0.272}
\right ] \hspace{0.15cm} \underline {= 0.00328 \,{\rm (bit)}}  \hspace{0.05cm},$$
* for $N=100$:
:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.24 \cdot 0.16\cdot 0.30\cdot 0.30}
\right ] \hspace{0.15cm} \underline {= 0.0442 \,{\rm (bit)}}  \hspace{0.05cm},$$
* for $N=10$:
:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.5 \cdot 0.1\cdot 0.3\cdot 0.1}
\right ] \hspace{0.15cm} \underline {= 0.345 \,{\rm (bit)}}  \hspace{0.05cm}.$$
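
The closed-form expression above translates directly into a few lines of Python (again an illustrative sketch added to this text, not part of the original solution):

<pre>
import math

def kl_uniform4(P_Y):
    # D(P_X || P_Y) for the uniform PMF P_X = [0.25]*4, using the lg-based closed form above.
    prod = P_Y[0] * P_Y[1] * P_Y[2] * P_Y[3]
    return 1 / (4 * math.log10(2)) * math.log10(0.25**4 / prod)

print(round(kl_uniform4([0.225, 0.253, 0.250, 0.272]), 5))  # 0.00328 bit (N = 1000)
print(round(kl_uniform4([0.24, 0.16, 0.30, 0.30]), 4))      # 0.0442 bit  (N = 100)
print(round(kl_uniform4([0.5, 0.1, 0.3, 0.1]), 3))          # 0.345 bit   (N = 10)
</pre>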
  
  
'''(4)'''&nbsp; The correct answer is&nbsp; <u>'''No'''</u>,&nbsp; as the example&nbsp; $N = 100$&nbsp; shows:

:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) =  \sum_{\mu = 1}^M  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} = 0.24\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.24}{0.25} + 0.16\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.16}{0.25} +2 \cdot 0.30\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.30}{0.25}  = 0.0407\ {\rm (bit)}\hspace{0.05cm}.$$

*In subtask&nbsp; '''(3)'''&nbsp; we obtained&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.0442$&nbsp; instead.
*This also means that the term&nbsp; "distance"&nbsp; is somewhat misleading.
*For a genuine distance one would actually expect&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$.
 
 
  
  
  
[[File:P_ID2763__Inf_Z_3_4e.png|right|frame|Probability function, entropy and Kullback-Leibler distance]]
'''(5)'''&nbsp; With&nbsp; $P_Y(X) = \big [0, \ 0.25, \ 0.5, \ 0.25 \big ]$&nbsp; one obtains:
 
:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.50}\hspace{0.05cm}.$$

*Because of the first term, the value of&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp; is infinitely large.
*For the second Kullback-Leibler distance, the following holds:

:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0\cdot {\rm log}_2 \hspace{0.1cm} \frac{0}{0.25} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+
0.50\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.5}{0.25}
\hspace{0.05cm}.$$

*A limit consideration shows that the first term yields&nbsp; $0$;&nbsp; the second term is also zero, and one obtains as the final result:

:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.50\cdot {\rm log}_2 \hspace{0.1cm} (2) \hspace{0.15cm} \underline {= 0.5\,{\rm (bit)}} \hspace{0.05cm}.$$

<u>Statements 3 and 5</u>&nbsp; are therefore correct:
*From this extreme example it is clear that&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; always differs from&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$.
*Only in the special case&nbsp; $P_Y \equiv P_X$&nbsp; are both Kullback-Leibler distances equal, namely zero.
*The adjacent table shows the complete result of this task.
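
The conventions used in&nbsp; '''(5)''', namely that a product of the form&nbsp; $0 \cdot {\rm log}_2(0/q)$&nbsp; contributes nothing while a term with a vanishing denominator makes the distance infinite, can be made explicit in a small Python sketch (added for illustration, not part of the original solution):

<pre>
import math

def kl_distance(P, Q):
    # Kullback-Leibler distance D(P || Q) in bit, with the limit conventions from (5).
    D = 0.0
    for p, q in zip(P, Q):
        if p == 0:
            continue            # 0 * log2(0/q) -> 0
        if q == 0:
            return math.inf     # p * log2(p/0) -> infinity
        D += p * math.log2(p / q)
    return D

P_X = [0.25, 0.25, 0.25, 0.25]
P_Y = [0, 0.25, 0.5, 0.25]      # empirical PMF for N = 4

print(kl_distance(P_X, P_Y))    # inf
print(kl_distance(P_Y, P_X))    # 0.5  (bit)
</pre>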
  
  
  
  
'''(6)'''&nbsp; The correct answer is again&nbsp; <u>'''No'''</u>.&nbsp; The tendency is clear: &nbsp; The larger&nbsp; $N$&nbsp; is,
* the closer&nbsp; $H(Y)$&nbsp; in principle approaches the final value&nbsp; $H(X) = 2 \ \rm bit$,
* the smaller the distances&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; and&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; become.


However, the table also shows that there are exceptions:
* The entropy&nbsp; $H(Y)$&nbsp; is smaller for&nbsp; $N = 1000$&nbsp; than for&nbsp; $N = 400$.
* The distance&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp; is greater for&nbsp; $N = 1000$&nbsp; than for&nbsp; $N = 400$.
* The reason is that the empirical experiment documented here happened to come closer to a uniform distribution with&nbsp; $N = 400$&nbsp; than with&nbsp; $N = 1000$.
* If, on the other hand, one were to run a very large (in the limit, infinite) number of experiments with&nbsp; $N = 400$&nbsp; and&nbsp; $N = 1000$&nbsp; and average over all of them, the expected monotonic behaviour would indeed result.
  
 
{{ML-Fuß}}


[[Category:Information Theory: Exercises|^3.1 General Information on 2D Random Variables^]]
