Difference between revisions of "Aufgaben:Exercise 3.5Z: Kullback-Leibler Distance again"

From LNTwww

@@ Line 1: / Line 1: @@
-{{quiz-Header|Buchseite=Informationstheorie/Einige Vorbemerkungen zu zweidimensionalen Zufallsgrößen
+{{quiz-Header|Buchseite=Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables
 }}
-[[File:P_ID2762__Inf_Z_3_4.png|right|]]
+[[File:P_ID2762__Inf_Z_3_4.png|right|frame|Determined probability mass functions]]
-Die Wahrscheinlichkeitsfunktion lautet:
+The probability mass function is:
+:$$P_X(X) = \big[\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.15cm},\hspace{0.15cm} 0.25 \hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.03cm}\big]\hspace{0.05cm}.$$
+The random variable&nbsp; $X$&nbsp; is thus characterised by
+* the symbol set size&nbsp; $M=4$,
+* equal probabilities&nbsp; $P_X(1) = P_X(2) = P_X(3) = P_X(4) = 1/4$ .
-$$P_Y(X) = [\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.03cm} 0.25\hspace{0.03cm},\hspace{0.03cm} 0.25 \hspace{0.03cm}, \hspace{0.03cm} 0.25\hspace{0.03cm}]\hspace{0.05cm}$$
-Die Zufallsgröße $X$ ist also gekennzeichnet
-:* durch den Symbolumfang $M=4$,
+The random variable&nbsp; $Y$&nbsp; is always an approximation for&nbsp; $X$:
-:* mit gleichen Wahrscheinlichkeiten.
+*It was obtained by simulation from a uniform distribution, whereby only&nbsp; $N$&nbsp; random numbers were evaluated in each case.&nbsp; This means: &nbsp;
+*$P_Y(1)$, ... , $P_Y(4)$&nbsp; are not probabilities in the conventional sense.&nbsp; Rather, they describe&nbsp; [[Theory_of_Stochastic_Signals/Wahrscheinlichkeit_und_relative_H%C3%A4ufigkeit#Bernoullisches_Gesetz_der_gro.C3.9Fen_Zahlen| relative frequencies]].
-Die Zufallsgröße $Y$ ist stets eine Näherung für $X$. Sie wurde per Simulation aus einer Gleichverteilung gewonnen, wobei jeweils nur $N$ Zufallswerte ausgewertet wurden. Das heißt:
-$P_Y(1)$,...,$P_Y(4)$ sind im herkömmlichen Sinn keine Wahrscheinlichkeiten. Sie beschreiben vielmehr [http://en.lntwww.de/Stochastische_Signaltheorie/Wahrscheinlichkeit_und_relative_H%C3%A4ufigkeit#Bernoullisches_Gesetz_der_gro.C3.9Fen_Zahlen relative Häufigkeiten].
-Das Ergebnis der sechsten Versuchsreihe (mit  $N=1000$) ird demnach durch die folgende Wahrscheinlichkeitsfunktion zusammengefasst:
+The result of the sixth test series&nbsp; (with&nbsp;  $N=1000)$&nbsp; is thus summarised by the following probability function:
-$$P_Y(X) = [\hspace{0.05cm}0.225\hspace{0.05cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.05cm} 0.250 \hspace{0.05cm}, \hspace{0.05cm} 0.272\hspace{0.05cm}]
+:$$P_Y(X) = \big [\hspace{0.05cm}0.225\hspace{0.15cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.15cm} 0.250 \hspace{0.05cm}, \hspace{0.15cm} 0.272\hspace{0.05cm}\big]
-\hspace{0.05cm}$$
+\hspace{0.05cm}.$$
-Bei dieser Schreibweise ist bereits berücksichtigt, dass die Zufallsgrößen $X$ und $Y$ auf dem gleichen Alphabet $X =$ {1, 2, 3, 4} basieren.
+This notation already takes into account that the random variables&nbsp; $X$&nbsp; and&nbsp; $Y$&nbsp; are based on the same alphabet&nbsp; $X = \{1,\ 2,\ 3,\ 4\}$.
-Mit diesen Voraussetzungen gilt für die relative Entropie (englisch: Informational Divergence) zwischen den Wahrscheinlichkeitsfunktionen  $P_X(.)$ und $P_Y(.)$ :
+With these preconditions, the&nbsp; '''informational divergence'''&nbsp; between the two probability functions&nbsp;  $P_X(.)$&nbsp; and&nbsp; $P_Y(.)$ :
-$D( P_X || P_Y) = E_X [ log_2 \frac{P_X(X)}{P_Y(Y)}] = \sum\limits_{\mu=1}^M P_X(\mu) . log_2 \frac{P_X(\mu)}{P_Y(\mu)}$
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) =  {\rm E}_X \hspace{-0.1cm}\left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{M}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.$$
-Man bezeichnet  $D( P_X || P_Y)$  als Kullback–Leibler–Distanz. Diese ist ein Maß für die Ähnlichkeit zwischen den beiden Wahrscheinlichkeitsfunktionen
+One calls&nbsp;  $D( P_X\hspace{0.05cm} || \hspace{0.05cm}P_Y)$&nbsp;  the (first)&nbsp; '''Kullback-Leibler distance'''.
-$P_X(.)$ und $P_Y(.)$.  Die Erwartungswertbildung geschieht hier hinsichtlich der (tatsächlich gleichverteilten) Zufallsgröße $X$.  Dies wird durch die Nomenklatur  $E_X[.]$ angedeutet.
+*This is a measure of the similarity between the two probability mass functions&nbsp; $P_X(.)$&nbsp; and&nbsp; $P_Y(.)$.
+*The expected value formation occurs here with regard to the&nbsp; (actually equally distributed)&nbsp; random variable&nbsp; $X$.&nbsp; This is indicated by the nomenclature&nbsp;  ${\rm E}_X\big[.\big]$.
-Eine zweite Form der Kullback–Leibler–Distanz ergibt sich durch die Erwartungswertbildung  hinsichtlich der Zufallsgröße $Y \Rightarrow E_Y[.]$:
-$D( P_Y || P_X) = E_Y [ log_2 \frac{P_Y(Y)}{P_Y(Y)}] = \sum\limits_{\mu=1}^M P_Y(\mu) . log_2 \frac{P_Y(\mu)}{P_X(\mu)}$
+A second form of Kullback-Leibler distance results from the formation of expected values with respect to the random variable&nbsp; $Y$ &nbsp; &rArr; &nbsp;  ${\rm E}_Y\big [.\big ]$:
-'''Hinweis:''' Die Aufgabe bezieht sich auf das [http://en.lntwww.de/Informationstheorie/Einige_Vorbemerkungen_zu_zweidimensionalen_Zufallsgr%C3%B6%C3%9Fen Kapitel 3.1 ] dieses Buches. Die Angaben der Entropie  $H(Y)$ und der Kullback–Leibler–Distanz  $D( P_X || P_Y)$  in obiger Grafik sind in „bit” zu verstehen. die mit „???"  versehenen Felder sollen von Ihnen in dieser Aufgabe ergänzt werden.
+:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) =  {\rm E}_Y \hspace{-0.1cm} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(X)}{P_X(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^M  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.$$
-===Fragebogen===
+Hints:
+*The exercise belongs to the chapter&nbsp; [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables|Some preliminary remarks on two-dimensional random variables]].
+*In particular, reference is made to the page&nbsp; [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables#Informational_divergence_-_Kullback-Leibler_distance|Relative entropy &ndash; Kullback-Leibler distance]].
+*The entropy&nbsp;  $H(Y)$&nbsp; and the Kullback-Leibler distance&nbsp;  $D( P_X \hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp;  in the above graph are to be understood in&nbsp; "bit".
+* The fields marked with&nbsp; "???"&nbsp; in the graph are to be completed by you in this task.
+===Questions===
 <quiz display=simple>
-{Welche Entropie besitzt die Zufallsgröße $X$ ?
+{What is the entropy of the random variable&nbsp; $X$ ?
 |type="{}"}
-$H(X)$ = { 2 3% } $bit$
+$H(X)\ = \ $ { 2 1% } $\ \rm bit$
-{Wie groß sind die Entropien der Zufallsgrößen $Y$ (Näherungen für $X$)?
+{What are the entropies of the random variables&nbsp; $Y$&nbsp; $($approximations for&nbsp; $X)$?
+|type="{}"}
+$N=10^3\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.9968 1% } $\ \rm bit$
+$N=10^2\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.941 1% } $\ \rm bit$
+$N=10^1\text{:} \hspace{0.5cm} H(Y) \ = \ $ { 1.6855 1%  } $\ \rm bit$
+{Calculate the following Kullback-Leibler distances.
 |type="{}"}
-$N=1000$  $H(Y)$ = { 1.9968 1% } $bit$
+$N=10^3\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $ { 0.00328 1% } $\ \rm bit$
-$N=100$  $H(Y)$ = { 1.941 1% } $bit$
+$N=10^2\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm}  P_Y) \ = \ $  { 0.0442 1% } $\ \rm bit$
-$N=10$  $H(Y)$ = { 1.6855 1%  } $bit$
+$N=10^1\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)  \ = \ $  { 0.345 1% } $\ \rm bit$
-{Berechnen Sie die folgenden Kullback–Leibler–Distanzen.
+{Does&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; give exactly the same result in each case?
-|type="{}"}
+|type="()"}
-$N=1000$  $D( P_X || P_Y)$ = { 3.28 1% } . 10 ( { -3 })$bit$
+- Yes.
-$N=100$    $D( P_X || P_Y)$ = { 4.42  1% } . 10 ( { -2 })$bit$
++ No.
-$N=10$      $D( P_X || P_Y)$ = { 3.45 1% } . 10 ( { -1 })$bit$
+{Which statements are true for the Kullback-Leibler distances with&nbsp; $N = 4$?
+|type="[]"}
+- &nbsp; $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0$&nbsp; is true.
+- &nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.5 \ \rm  bit$&nbsp; is true.
++ &nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; is infinitely large.
+- &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0$&nbsp; holds.
++ &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.5 \ \rm bit$&nbsp; holds.
+- &nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; is infinitely large.
+{Do both&nbsp; $H(Y)$&nbsp; and&nbsp;  $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; change monotonically with&nbsp; $N$?
+|type="()"}
+- Yes,
++ No.
-{Multiple-Choice Frage
-|type="[]"}
-- Falsch
-+ Richtig
 </quiz>
-===Musterlösung===
+===Solution===
 {{ML-Kopf}}
-'''1.'''
-'''2.'''
+'''(1)'''&nbsp; With equal probabilities, and with&nbsp; $M = 4$:
-'''3.'''
+:$$H(X) = {\rm log}_2 \hspace{0.1cm} M
-'''4.'''
+\hspace{0.15cm} \underline {= 2\,{\rm (bit)}}  \hspace{0.05cm}.$$
-'''5.'''
-'''6.'''
-'''7.'''
+'''(2)'''&nbsp; The probabilities for the empirically determined random variables&nbsp; $Y$&nbsp; generally&nbsp; (not always!)&nbsp; deviate from the uniform distribution the more the parameter&nbsp; $N$&nbsp; is smaller.&nbsp; One obtains for
+* $N = 1000 \ \ \Rightarrow \ \ P_Y(Y) =  \big [0.225, \ 0.253, \ 0.250, \ 0.272 \big ]$:
+:$$H(Y) =
+.225 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.225} +
+.253 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.253} +
+.250 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.250} +
+.272 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.272}
+\hspace{0.15cm} \underline {= 1.9968\ {\rm (bit)}}  \hspace{0.05cm},$$
+* $N = 100 \ \ \Rightarrow \ \  P_Y(Y) = \big[0.24, \ 0.16, \ 0.30,  \ 0.30\big]$:
+:$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.9410\ {\rm (bit)}}  \hspace{0.05cm},$$
+* $N = 10 \ \ \Rightarrow \ \  P_Y(Y) =  \big[0.5, \ 0.1, \ 0.3, \ 0.1 \big]$:
+:$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.6855\ {\rm (bit)}}  \hspace{0.05cm}.$$
+'''(3)'''&nbsp; The equation for the Kullback-Leibler distance we are looking for is:
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \sum_{\mu = 1}^{4}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)}
+=  \frac{1/4}{{\rm lg} \hspace{0.1cm}(2)} \cdot
+\left [ {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(1)} + \frac{0.25}{P_Y(2)} + \frac{0.25}{P_Y(3)} + \frac{0.25}{P_Y(4)}
+\right ] $$
+:$$\Rightarrow \hspace{0.3cm} D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)  =   \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
+\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{P_Y(1) \cdot P_Y(2)\cdot P_Y(3)\cdot P_Y(4)}
+\right ] \hspace{0.05cm}.$$
+The logarithm to the base&nbsp; $ 2$&nbsp; &rArr;  &nbsp; $\log_2(.)$&nbsp; was replaced by the logarithm to the base&nbsp; $ 10$ &nbsp; &rArr;  &nbsp; $\lg(.)$ for easy use of the calculator.
+The following numerical results are obtained:
+* for $N=1000$:
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
+\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.225 \cdot 0.253\cdot 0.250\cdot 0.272}
+\right ] \hspace{0.15cm} \underline {= 0.00328 \,{\rm (bit)}}  \hspace{0.05cm},$$
+* for $N=100$:
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
+\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.24 \cdot 0.16\cdot 0.30\cdot 0.30}
+\right ] \hspace{0.15cm} \underline {= 0.0442 \,{\rm (bit)}}  \hspace{0.05cm},$$
+* for $N=10$:
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot
+\left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.5 \cdot 0.1\cdot 0.3\cdot 0.1}
+\right ] \hspace{0.15cm} \underline {= 0.345 \,{\rm (bit)}}  \hspace{0.05cm}.$$
+'''(4)'''&nbsp; Correct is&nbsp; <u>'''No'''</u>,&nbsp; as will be shown by the example&nbsp; $N = 100$&nbsp;:
+:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) =   \sum_{\mu = 1}^M  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} = 0.24\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.24}{0.25} + 0.16\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.16}{0.25} +2 \cdot 0.30\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.30}{0.25}  = 0.0407\ {\rm (bit)}\hspace{0.05cm}.$$
+*In subtask&nbsp; '''(3)'''&nbsp; we got&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.0442$&nbsp; instead.
+*This also means: &nbsp; The designation „distance” is somewhat misleading.
+*According to this, one would actually expect&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp;.
+[[File:P_ID2763__Inf_Z_3_4e.png|right|frame|Probability function, entropy and Kullback-Leibler distance]]
+'''(5)'''&nbsp; With&nbsp; $P_Y(X) = \big [0, \ 0.25, \ 0.5, \ 0.25 \big ]$&nbsp; one obtains:
+:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.50}\hspace{0.05cm}.$$
+*Because of the first term, the value of&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp; is infinitely large.
+*For the second Kullback-Leibler distance holds:
+:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0\cdot {\rm log}_2 \hspace{0.1cm} \frac{0}{0.25} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+
+.50\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.5}{0.25}
+	\hspace{0.05cm}.$$
+*After looking at the limits, one can see that the first term yields the result&nbsp; $0$&nbsp;.&nbsp; The second term also yields zero, and one obtains as the final result:
+:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.50\cdot {\rm log}_2 \hspace{0.1cm} (2) \hspace{0.15cm} \underline {= 0.5\,{\rm (bit)}} 	\hspace{0.05cm}.$$
+&nbsp; <u>Statements 3 and 5</u> are therefore correct:
+*From this extreme example it is clear that&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; is always different from&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp;.
+*Only for the special case&nbsp; $P_Y \equiv P_X$&nbsp; are both Kullback-Leibler distances equal, namely zero.
+*The adjacent table shows the complete result of this task.
+'''(6)'''&nbsp; Correct is again&nbsp; <u>'''No'''</u>. &nbsp; Although the tendency is clear: &nbsp; The larger&nbsp; $N$&nbsp; is,
+* the more&nbsp; $H(Y)$&nbsp; approaches in principle the final value&nbsp; $H(X) = 2 \ \rm bit$,
+* the smaller the distances&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; and&nbsp; $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ become.
+However, one can also see from the table that there are exceptions:
+* The entropy&nbsp; $H(Y)$&nbsp; is smaller for&nbsp; $N = 1000$&nbsp; than for&nbsp; $N = 400$.
+* The distance&nbsp; $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$&nbsp; is greater for&nbsp; $N = 1000$&nbsp; than for&nbsp; $N = 400$.
+* The reason for this is that the experiment documented here with&nbsp; $N = 400$&nbsp; was more likely to lead to a uniform distribution than the experiment with&nbsp; $N = 1000$.
+*If, on the other hand, one were to start a very (infinitely) large number of experiments with&nbsp; $N = 400$&nbsp; and&nbsp; $N = 1000$&nbsp; and average over all of them, the actually expected monotonic course would actually result.
 {{ML-Fuß}}
-[[Category:Aufgaben zu Informationstheorie|^3.1 Einige Vorbemerkungen zu zweidimensionalen Zufallsgrößen^]]
+[[Category:Information Theory: Exercises|^3.1 General Information on 2D Random Variables^]]

Latest revision as of 10:14, 24 September 2021

Return to book

Determined probability mass functions

The probability mass function is:

$$P_X(X) = \big[\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.15cm},\hspace{0.15cm} 0.25 \hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.03cm}\big]\hspace{0.05cm}.$$

The random variable $X$ is thus characterised by

the symbol set size $M=4$,
equal probabilities $P_X(1) = P_X(2) = P_X(3) = P_X(4) = 1/4$ .

The random variable $Y$ is always an approximation for $X$:

It was obtained by simulation from a uniform distribution, whereby only $N$ random numbers were evaluated in each case. This means:
$P_Y(1)$, ... , $P_Y(4)$ are not probabilities in the conventional sense. Rather, they describe relative frequencies.

The result of the sixth test series (with $N=1000)$ is thus summarised by the following probability function:

$$P_Y(X) = \big [\hspace{0.05cm}0.225\hspace{0.15cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.15cm} 0.250 \hspace{0.05cm}, \hspace{0.15cm} 0.272\hspace{0.05cm}\big] \hspace{0.05cm}.$$

This notation already takes into account that the random variables $X$ and $Y$ are based on the same alphabet $X = \{1,\ 2,\ 3,\ 4\}$.

With these preconditions, the informational divergence between the two probability functions $P_X(.)$ and $P_Y(.)$ :

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = {\rm E}_X \hspace{-0.1cm}\left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{M} P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.$$

One calls $D( P_X\hspace{0.05cm} || \hspace{0.05cm}P_Y)$ the (first) Kullback-Leibler distance.

This is a measure of the similarity between the two probability mass functions $P_X(.)$ and $P_Y(.)$.
The expected value formation occurs here with regard to the (actually equally distributed) random variable $X$. This is indicated by the nomenclature ${\rm E}_X\big[.\big]$.

A second form of Kullback-Leibler distance results from the formation of expected values with respect to the random variable $Y$ ⇒ ${\rm E}_Y\big [.\big ]$:

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = {\rm E}_Y \hspace{-0.1cm} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(X)}{P_X(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^M P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.$$

Hints:

The exercise belongs to the chapter Some preliminary remarks on two-dimensional random variables.
In particular, reference is made to the page Relative entropy – Kullback-Leibler distance.
The entropy $H(Y)$ and the Kullback-Leibler distance $D( P_X \hspace{0.05cm}|| \hspace{0.05cm}P_Y)$ in the above graph are to be understood in "bit".
The fields marked with "???" in the graph are to be completed by you in this task.

Questions

What is the entropy of the random variable $X$ ?

$H(X)\ = \ $

$\ \rm bit$

What are the entropies of the random variables $Y$ $($approximations for $X)$?

$N=10^3\text{:} \hspace{0.5cm} H(Y) \ = \ $

$\ \rm bit$

$N=10^2\text{:} \hspace{0.5cm} H(Y) \ = \ $

$\ \rm bit$

$N=10^1\text{:} \hspace{0.5cm} H(Y) \ = \ $

$\ \rm bit$

Calculate the following Kullback-Leibler distances.

$N=10^3\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ $

$\ \rm bit$

$N=10^2\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ $

$\ \rm bit$

$N=10^1\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ $

$\ \rm bit$

Does $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ give exactly the same result in each case?

	Yes.
	No.

Which statements are true for the Kullback-Leibler distances with $N = 4$?

	$D(P_X \hspace{0.05cm}\|\| \hspace{0.05cm} P_Y) = 0$ is true.
	$D(P_X\hspace{0.05cm}\|\| \hspace{0.05cm} P_Y) = 0.5 \ \rm bit$ is true.
	$D(P_X\hspace{0.05cm}\|\| \hspace{0.05cm} P_Y)$ is infinitely large.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X) = 0$ holds.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X) = 0.5 \ \rm bit$ holds.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X)$ is infinitely large.

Do both $H(Y)$ and $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ change monotonically with $N$?

	Yes,
	No.

Solution

(1) With equal probabilities, and with $M = 4$:

$$H(X) = {\rm log}_2 \hspace{0.1cm} M \hspace{0.15cm} \underline {= 2\,{\rm (bit)}} \hspace{0.05cm}.$$

(2) The probabilities for the empirically determined random variables $Y$ generally (not always!) deviate from the uniform distribution the more the parameter $N$ is smaller. One obtains for

$N = 1000 \ \ \Rightarrow \ \ P_Y(Y) = \big [0.225, \ 0.253, \ 0.250, \ 0.272 \big ]$:

$$H(Y) = 0.225 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.225} + 0.253 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.253} + 0.250 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.250} + 0.272 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.272} \hspace{0.15cm} \underline {= 1.9968\ {\rm (bit)}} \hspace{0.05cm},$$

$N = 100 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.24, \ 0.16, \ 0.30, \ 0.30\big]$:

$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.9410\ {\rm (bit)}} \hspace{0.05cm},$$

$N = 10 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.5, \ 0.1, \ 0.3, \ 0.1 \big]$:

$$H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.6855\ {\rm (bit)}} \hspace{0.05cm}.$$

(3) The equation for the Kullback-Leibler distance we are looking for is:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \sum_{\mu = 1}^{4} P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} = \frac{1/4}{{\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(1)} + \frac{0.25}{P_Y(2)} + \frac{0.25}{P_Y(3)} + \frac{0.25}{P_Y(4)} \right ] $$

$$\Rightarrow \hspace{0.3cm} D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{P_Y(1) \cdot P_Y(2)\cdot P_Y(3)\cdot P_Y(4)} \right ] \hspace{0.05cm}.$$

The logarithm to the base $ 2$ ⇒ $\log_2(.)$ was replaced by the logarithm to the base $ 10$ ⇒ $\lg(.)$ for easy use of the calculator.

The following numerical results are obtained:

for $N=1000$:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.225 \cdot 0.253\cdot 0.250\cdot 0.272} \right ] \hspace{0.15cm} \underline {= 0.00328 \,{\rm (bit)}} \hspace{0.05cm},$$

for $N=100$:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.24 \cdot 0.16\cdot 0.30\cdot 0.30} \right ] \hspace{0.15cm} \underline {= 0.0442 \,{\rm (bit)}} \hspace{0.05cm},$$

for $N=10$:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.5 \cdot 0.1\cdot 0.3\cdot 0.1} \right ] \hspace{0.15cm} \underline {= 0.345 \,{\rm (bit)}} \hspace{0.05cm}.$$

(4) Correct is No, as will be shown by the example $N = 100$ :

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = \sum_{\mu = 1}^M P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} = 0.24\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.24}{0.25} + 0.16\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.16}{0.25} +2 \cdot 0.30\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.30}{0.25} = 0.0407\ {\rm (bit)}\hspace{0.05cm}.$$

In subtask (3) we got $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.0442$ instead.
This also means: The designation „distance” is somewhat misleading.
According to this, one would actually expect $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ .

Probability function, entropy and Kullback-Leibler distance

(5) With $P_Y(X) = \big [0, \ 0.25, \ 0.5, \ 0.25 \big ]$ one obtains:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.50}\hspace{0.05cm}.$$

Because of the first term, the value of $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$ is infinitely large.
For the second Kullback-Leibler distance holds:

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0\cdot {\rm log}_2 \hspace{0.1cm} \frac{0}{0.25} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+ 0.50\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.5}{0.25} \hspace{0.05cm}.$$

After looking at the limits, one can see that the first term yields the result $0$ . The second term also yields zero, and one obtains as the final result:

$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.50\cdot {\rm log}_2 \hspace{0.1cm} (2) \hspace{0.15cm} \underline {= 0.5\,{\rm (bit)}} \hspace{0.05cm}.$$

Statements 3 and 5 are therefore correct:

From this extreme example it is clear that $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ is always different from $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ .
Only for the special case $P_Y \equiv P_X$ are both Kullback-Leibler distances equal, namely zero.
The adjacent table shows the complete result of this task.

(6) Correct is again No. Although the tendency is clear: The larger $N$ is,

the more $H(Y)$ approaches in principle the final value $H(X) = 2 \ \rm bit$,
the smaller the distances $D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)$ and $D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)$ become.

However, one can also see from the table that there are exceptions:

The entropy $H(Y)$ is smaller for $N = 1000$ than for $N = 400$.
The distance $D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)$ is greater for $N = 1000$ than for $N = 400$.
The reason for this is that the experiment documented here with $N = 400$ was more likely to lead to a uniform distribution than the experiment with $N = 1000$.
If, on the other hand, one were to start a very (infinitely) large number of experiments with $N = 400$ and $N = 1000$ and average over all of them, the actually expected monotonic course would actually result.

Retrieved from "http://en.lntwww.de/index.php?title=Aufgaben:Exercise_3.5Z:_Kullback-Leibler_Distance_again&oldid=41387"

Category:

Information Theory: Exercises

	$D(P_X \hspace{0.05cm}\|\| \hspace{0.05cm} P_Y) = 0$ is true.
	$D(P_X\hspace{0.05cm}\|\| \hspace{0.05cm} P_Y) = 0.5 \ \rm bit$ is true.
	$D(P_X\hspace{0.05cm}\|\| \hspace{0.05cm} P_Y)$ is infinitely large.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X) = 0$ holds.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X) = 0.5 \ \rm bit$ holds.
	$D(P_Y\hspace{0.05cm}\|\| \hspace{0.05cm} P_X)$ is infinitely large.