
Exercise 3.5Z: Kullback-Leibler Distance again

[Figure: Determined probability mass functions]

The probability mass function is:

P_X(X) = \big[\hspace{0.03cm}0.25\hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.15cm},\hspace{0.15cm} 0.25 \hspace{0.03cm}, \hspace{0.15cm} 0.25\hspace{0.03cm}\big]\hspace{0.05cm}.

The random variable  X  is thus characterised by

  • the symbol set size  M=4,
  • equal probabilities  P_X(1) = P_X(2) = P_X(3) = P_X(4) = 1/4 .


The random variable  Y  is always an approximation for  X:

  • It was obtained by simulation from a uniform distribution, where only  N  random numbers were evaluated in each case.  This means:  
  • P_Y(1), ... , P_Y(4)  are not probabilities in the conventional sense.  Rather, they describe  relative frequencies  (see the simulation sketch below).
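
The following short Python sketch is an editorial illustration (not part of the original exercise; the function name and the fixed symbol set are arbitrary choices). It shows how one such test series could be simulated and how the resulting relative frequencies scatter around the probability 1/4:

import random
from collections import Counter

def relative_frequencies(n, symbols=(1, 2, 3, 4), seed=None):
    """Draw n equally likely symbols and return their relative frequencies."""
    rng = random.Random(seed)
    counts = Counter(rng.choice(symbols) for _ in range(n))
    return [counts[s] / n for s in symbols]

# One test series each with N = 1000, 100 and 10 random numbers:
for n in (1000, 100, 10):
    print(n, relative_frequencies(n))   # entries scatter around 0.25, the more so the smaller n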


The result of the sixth test series  (with  N=1000)  is thus summarised by the following probability function:

P_Y(X) = \big [\hspace{0.05cm}0.225\hspace{0.15cm}, \hspace{0.05cm} 0.253\hspace{0.05cm},\hspace{0.15cm} 0.250 \hspace{0.05cm}, \hspace{0.15cm} 0.272\hspace{0.05cm}\big] \hspace{0.05cm}.

This notation already takes into account that the random variables  X  and  Y  are based on the same alphabet  X = \{1,\ 2,\ 3,\ 4\}.

With these preconditions, the  informational divergence  between the two probability functions  P_X(.)  and  P_Y(.)  is:

D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = {\rm E}_X \hspace{-0.1cm}\left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^{M} P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.

One calls  D( P_X\hspace{0.05cm} || \hspace{0.05cm}P_Y)  the (first)  Kullback-Leibler distance.

  • This is a measure of the similarity between the two probability mass functions  P_X(.)  and  P_Y(.).
  • The expectation is formed here with respect to the  (actually uniformly distributed)  random variable  X.  This is indicated by the notation  {\rm E}_X\big[.\big].


A second form of the Kullback-Leibler distance results from forming the expectation with respect to the random variable  Y   ⇒   {\rm E}_Y\big [.\big ]:

D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = {\rm E}_Y \hspace{-0.1cm} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(Y)}{P_X(Y)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 1}^M P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.
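
As a minimal numerical sketch (an editorial addition assuming Python; the helper name kl_distance does not appear in the original text), both directions of the Kullback-Leibler distance can be evaluated with the usual conventions  0 · log2(0/q) = 0  and  p · log2(p/0) = ∞:

import math

def kl_distance(p, q):
    """D(P || Q) = sum over mu of P(mu) * log2(P(mu)/Q(mu)), in bit."""
    total = 0.0
    for p_mu, q_mu in zip(p, q):
        if p_mu == 0.0:
            continue                 # convention: 0 * log2(0/q) = 0
        if q_mu == 0.0:
            return math.inf          # P puts probability mass where Q has none
        total += p_mu * math.log2(p_mu / q_mu)
    return total

P_X = [0.25, 0.25, 0.25, 0.25]
P_Y = [0.225, 0.253, 0.250, 0.272]   # test series with N = 1000

print(kl_distance(P_X, P_Y))         # D(P_X || P_Y)
print(kl_distance(P_Y, P_X))         # D(P_Y || P_X) -- in general a different value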



Hints:

  • The exercise belongs to the chapter  Some Preliminary Remarks on Two-Dimensional Random Variables.
  • In particular, reference is made to the page  Relative entropy – Kullback-Leibler distance.
  • The entropy  H(Y)  and the Kullback-Leibler distance  D( P_X \hspace{0.05cm}|| \hspace{0.05cm}P_Y)  in the above graph are to be understood in  "bit".
  • The fields marked with  "???"  in the graph are to be completed by you in this task.

Questions

1

What is the entropy of the random variable  X ?

H(X) \ = \ ??? \ \rm bit

2

What are the entropies of the random variables  Y  (approximations for  X)?

N=10^3\text{:} \hspace{0.5cm} H(Y) \ = \ ??? \ \rm bit
N=10^2\text{:} \hspace{0.5cm} H(Y) \ = \ ??? \ \rm bit
N=10^1\text{:} \hspace{0.5cm} H(Y) \ = \ ??? \ \rm bit

3

Calculate the following Kullback-Leibler distances.

N=10^3\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ ??? \ \rm bit
N=10^2\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ ??? \ \rm bit
N=10^1\text{:} \hspace{0.5cm} D( P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \ = \ ??? \ \rm bit

4

Does  D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)  give exactly the same result in each case?

Yes.
No.

5

Which statements are true for the Kullback-Leibler distances with  N = 4?

  D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0  is true.
  D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.5 \ \rm bit  is true.
  D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)  is infinitely large.
  D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0  holds.
  D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.5 \ \rm bit  holds.
  D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)  is infinitely large.

6

Do both  H(Y)  and  D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)  change monotonically with  N?

Yes.
No.


Solution

(1)  With equal probabilities, and with  M = 4:

H(X) = {\rm log}_2 \hspace{0.1cm} M \hspace{0.15cm} \underline {= 2\,{\rm (bit)}} \hspace{0.05cm}.


(2)  The probabilities of the empirically determined random variables  Y  generally  (but not always!)  deviate more strongly from the uniform distribution, the smaller the parameter  N  is.  One obtains for

  • N = 1000 \ \ \Rightarrow \ \ P_Y(Y) = \big [0.225, \ 0.253, \ 0.250, \ 0.272 \big ]:
H(Y) = 0.225 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.225} + 0.253 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.253} + 0.250 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.250} + 0.272 \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{0.272} \hspace{0.15cm} \underline {= 1.9968\ {\rm (bit)}} \hspace{0.05cm},
  • N = 100 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.24, \ 0.16, \ 0.30, \ 0.30\big]:
H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.9410\ {\rm (bit)}} \hspace{0.05cm},
  • N = 10 \ \ \Rightarrow \ \ P_Y(Y) = \big[0.5, \ 0.1, \ 0.3, \ 0.1 \big]:
H(Y) = \hspace{0.05cm}\text{...} \hspace{0.15cm} \underline {= 1.6855\ {\rm (bit)}} \hspace{0.05cm}.
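
As a cross-check, the entropy of a given probability mass function can also be evaluated numerically. The following sketch is an editorial addition (Python assumed, the function name entropy is arbitrary); it reproduces the values for  H(X)  and for the series with  N = 1000  and  N = 10:

import math

def entropy(pmf):
    """Entropy in bit: sum of p * log2(1/p) over all nonzero entries."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))       # H(X)           = 2.0 bit
print(entropy([0.225, 0.253, 0.250, 0.272]))   # N = 1000: H(Y) ≈ 1.9968 bit
print(entropy([0.5, 0.1, 0.3, 0.1]))           # N = 10:   H(Y) ≈ 1.6855 bit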


(3)  The equation for the Kullback-Leibler distance we are looking for is:

D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \sum_{\mu = 1}^{4} P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} = \frac{1/4}{{\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(1)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(2)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(3)} + {\rm lg} \hspace{0.1cm} \frac{0.25}{P_Y(4)} \right ]
\Rightarrow \hspace{0.3cm} D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{P_Y(1) \cdot P_Y(2)\cdot P_Y(3)\cdot P_Y(4)} \right ] \hspace{0.05cm}.

The logarithm to base  2  ⇒   \log_2(.)  was replaced by the logarithm to base  10   ⇒   \lg(.),  so that the evaluation can easily be done with a pocket calculator.

The following numerical results are obtained:

  • for N=1000:
D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.225 \cdot 0.253\cdot 0.250\cdot 0.272} \right ] \hspace{0.15cm} \underline {= 0.00328 \,{\rm (bit)}} \hspace{0.05cm},
  • for N=100:
D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.24 \cdot 0.16\cdot 0.30\cdot 0.30} \right ] \hspace{0.15cm} \underline {= 0.0442 \,{\rm (bit)}} \hspace{0.05cm},
  • for N=10:
D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{1}{4 \cdot {\rm lg} \hspace{0.1cm}(2)} \cdot \left [ {\rm lg} \hspace{0.1cm} \frac{0.25^4}{0.5 \cdot 0.1\cdot 0.3\cdot 0.1} \right ] \hspace{0.15cm} \underline {= 0.345 \,{\rm (bit)}} \hspace{0.05cm}.
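
These three distances can be reproduced with the short Python sketch below (an editorial addition; here the base-2 logarithm is used directly instead of the base-10 detour, and the helper name is arbitrary):

import math

P_X = [0.25, 0.25, 0.25, 0.25]

def kl_from_uniform(p_y):
    """D(P_X || P_Y) in bit for the uniform P_X defined above."""
    return sum(p_x * math.log2(p_x / q) for p_x, q in zip(P_X, p_y))

print(kl_from_uniform([0.225, 0.253, 0.250, 0.272]))   # N = 1000: ≈ 0.00328 bit
print(kl_from_uniform([0.24, 0.16, 0.30, 0.30]))       # N = 100:  ≈ 0.0442 bit
print(kl_from_uniform([0.5, 0.1, 0.3, 0.1]))           # N = 10:   ≈ 0.345 bit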


(4)  The correct answer is  No,  as the example  N = 100  shows:

D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = \sum_{\mu = 1}^M P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} = 0.24\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.24}{0.25} + 0.16\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.16}{0.25} +2 \cdot 0.30\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.30}{0.25} = 0.0407\ {\rm (bit)}\hspace{0.05cm}.
  • In subtask  (3)  we obtained  D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.0442  instead.
  • This also means:   the designation "distance" is somewhat misleading.
  • For a true distance one would actually expect  D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) = D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y).
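
The asymmetry for  N = 100  can be checked directly with a few lines of Python (an editorial sketch; the generic helper kl_distance is not part of the original solution and assumes that no entry of the second argument is zero):

import math

def kl_distance(p, q):
    """D(P || Q) in bit; assumes q has no zero entry where p is nonzero."""
    return sum(p_mu * math.log2(p_mu / q_mu) for p_mu, q_mu in zip(p, q) if p_mu > 0)

P_X = [0.25, 0.25, 0.25, 0.25]
P_Y = [0.24, 0.16, 0.30, 0.30]       # test series with N = 100

print(kl_distance(P_X, P_Y))         # ≈ 0.0442 bit
print(kl_distance(P_Y, P_X))         # ≈ 0.0407 bit -- not the same value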


[Table: Probability function, entropy and Kullback-Leibler distance]

(5)  With  P_Y(X) = \big [0, \ 0.25, \ 0.5, \ 0.25 \big ]  one obtains:

D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.50}\hspace{0.05cm}.
  • Because of the first term, the value of  D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)  is infinitely large.
  • For the second Kullback-Leibler distance holds:
D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0\cdot {\rm log}_2 \hspace{0.1cm} \frac{0}{0.25} + 2 \cdot 0.25\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.25}{0.25}+ 0.50\cdot {\rm log}_2 \hspace{0.1cm} \frac{0.5}{0.25} \hspace{0.05cm}.
  • Considering the limit  p \cdot {\rm log}_2 \hspace{0.1cm} (p) \to 0  for  p \to 0,  one can see that the first term yields  0.  The second term is also zero, and one obtains as the final result:
D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) = 0.50\cdot {\rm log}_2 \hspace{0.1cm} (2) \hspace{0.15cm} \underline {= 0.5\,{\rm (bit)}} \hspace{0.05cm}.

  Statements 3 and 5 are therefore correct:

  • From this extreme example it is clear that  D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X)  generally differs from  D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y).
  • Only for the special case  P_Y \equiv P_X  are both Kullback-Leibler distances equal, namely zero.
  • The adjacent table shows the complete result of this task.
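
How these limiting cases can be handled numerically is shown by the following editorial sketch (Python assumed); the conventions  0 · log2(0/q) = 0  and  p · log2(p/0) = ∞  are made explicit in the helper:

import math

def kl_distance(p, q):
    """D(P || Q) in bit with the conventions 0*log2(0/q) = 0 and p*log2(p/0) = infinity."""
    total = 0.0
    for p_mu, q_mu in zip(p, q):
        if p_mu == 0.0:
            continue
        if q_mu == 0.0:
            return math.inf
        total += p_mu * math.log2(p_mu / q_mu)
    return total

P_X = [0.25, 0.25, 0.25, 0.25]
P_Y = [0.0, 0.25, 0.5, 0.25]      # test series with N = 4

print(kl_distance(P_X, P_Y))      # inf      (statement 3)
print(kl_distance(P_Y, P_X))      # 0.5 bit  (statement 5)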



(6)  The correct answer is again  No.   Admittedly, the tendency is clear:   the larger  N  is,

  • the more  H(Y)  approaches in principle the final value  H(X) = 2 \ \rm bit,
  • the smaller the distances  D(P_X\hspace{0.05cm}|| \hspace{0.05cm} P_Y)  and  D(P_Y\hspace{0.05cm}|| \hspace{0.05cm} P_X) become.


However, one can also see from the table that there are exceptions:

  • The entropy  H(Y)  is smaller for  N = 1000  than for  N = 400.
  • The distance  D(P_X\hspace{0.05cm}|| \hspace{0.05cm}P_Y)  is greater for  N = 1000  than for  N = 400.
  • The reason for this is that the particular experiment documented here with  N = 400  happened to come closer to a uniform distribution than the experiment with  N = 1000.
  • If, on the other hand, one were to run a very large (in the limit, infinite) number of experiments with  N = 400  and with  N = 1000  and average over all of them, the expected monotonic behaviour would indeed result  (see the simulation sketch below).
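
This averaging argument can be illustrated with a small simulation. The sketch below is an editorial addition (Python assumed; the number of trials and the seed are arbitrary): averaged over many test series, the distance  D(P_X || P_Y)  is indeed smaller for  N = 1000  than for  N = 400,  even though a single pair of experiments may behave differently.

import math
import random
from collections import Counter

M = 4
P_X = [1.0 / M] * M

def simulated_distance(n, rng):
    """One test series of length n; returns D(P_X || P_Y) or None if a symbol never occurs."""
    counts = Counter(rng.randrange(M) for _ in range(n))
    if len(counts) < M:
        return None                       # a zero entry would make the distance infinite
    p_y = [counts[mu] / n for mu in range(M)]
    return sum(p_x * math.log2(p_x / q) for p_x, q in zip(P_X, p_y))

rng = random.Random(0)
for n in (400, 1000):
    values = [d for d in (simulated_distance(n, rng) for _ in range(5000)) if d is not None]
    print(n, sum(values) / len(values))   # the average distance decreases as n grows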