Difference between revisions of "Exercise 3.5: Kullback-Leibler Distance and Binomial Distribution"

From LNTwww
m (Text replacement - "„" to """)
 
(13 intermediate revisions by 3 users not shown)
Line 1: Line 1:
  
{{quiz-Header|Buchseite=Informationstheorie/Einige Vorbemerkungen zu zweidimensionalen Zufallsgrößen
+
{{quiz-Header|Buchseite=Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables
 
}}
 
}}
  
[[File:P_ID2759__Inf_A_3_4_A.png|right|frame|Vorgegebene Wahrscheinlichkeiten]]
+
[[File:EN_Inf_A_3_4_A.png|right|frame|Given probabilities]]
Wir gehen hier von der&nbsp; [[Theory_of_Stochastic_Signals/Binomialverteilung|Binomialverteilung]]&nbsp; aus, die durch die Parameter &nbsp;$I$&nbsp; und &nbsp;$p$&nbsp; gekennzeichnet ist &nbsp; <br>&#8658; &nbsp; siehe  Buch "Stochastische Signaltheorie":
+
We assume here the&nbsp; [[Theory_of_Stochastic_Signals/Binomialverteilung|binomial distribution]],&nbsp; which is characterised by the parameters &nbsp;$I$&nbsp; and &nbsp;$p$&nbsp; &nbsp; <br>&#8658; &nbsp; see the book&nbsp; [[Theory_of_Stochastic_Signals|"Theory of Stochastic Signals"]]:
  
* Wertebereich:
+
* Range of Values:
 
:$$X = \{\hspace{0.05cm}0\hspace{0.05cm}, \hspace{0.15cm} 1\hspace{0.05cm},\hspace{0.15cm}
 
:$$X = \{\hspace{0.05cm}0\hspace{0.05cm}, \hspace{0.15cm} 1\hspace{0.05cm},\hspace{0.15cm}
 
2\hspace{0.05cm},\hspace{0.15cm}  \text{...}\hspace{0.1cm} ,\hspace{0.15cm} {\mu}\hspace{0.05cm}, \hspace{0.05cm}\text{...}\hspace{0.1cm} , \hspace{0.15cm} I\hspace{0.05cm}\}\hspace{0.05cm},$$
 
2\hspace{0.05cm},\hspace{0.15cm}  \text{...}\hspace{0.1cm} ,\hspace{0.15cm} {\mu}\hspace{0.05cm}, \hspace{0.05cm}\text{...}\hspace{0.1cm} , \hspace{0.15cm} I\hspace{0.05cm}\}\hspace{0.05cm},$$
* Wahrscheinlichkeiten:
+
* Probabilities:
 
:$$P_X (X = \mu) = {I \choose \mu} \cdot p^{\mu} \cdot (1-p)^{I-\mu} \hspace{0.05cm},$$
 
:$$P_X (X = \mu) = {I \choose \mu} \cdot p^{\mu} \cdot (1-p)^{I-\mu} \hspace{0.05cm},$$
* linearer Mittelwert:
+
* Linear mean:
 
:$$m_X = I  \cdot p \hspace{0.05cm},$$
 
:$$m_X = I  \cdot p \hspace{0.05cm},$$
* Varianz:
+
* Variance:
 
:$$\sigma_X^2 = I  \cdot p \cdot (1-p)\hspace{0.05cm}.$$
 
:$$\sigma_X^2 = I  \cdot p \cdot (1-p)\hspace{0.05cm}.$$
Im rot hinterlegten Teil der Tabelle sind die Wahrscheinlichkeiten&nbsp; $P_X(X = \mu$)&nbsp; der betrachteten Binomialverteilung angegeben.&nbsp; In der Teilaufgabe&nbsp; '''(1)'''&nbsp; sollen Sie die dazugehörigen Verteilungsparameter &nbsp;$I$&nbsp; und &nbsp;$p$&nbsp; bestimmen.
+
In the part of the table highlighted in red, the probabilities&nbsp; $P_X(X = \mu$)&nbsp; of the binomial distribution under consideration are given.&nbsp; In subtask&nbsp; '''(1)'''&nbsp; you are to determine the corresponding distribution parameters &nbsp;$I$&nbsp; and &nbsp;$p$.
  
  
Diese vorgegebene Binomialverteilung soll hier durch eine&nbsp; [[Theory_of_Stochastic_Signals/Poissonverteilung|Poissonverteilung]]&nbsp; $Y$&nbsp; approximiert werden, gekennzeichnet durch die Rate &nbsp;$\lambda$:
+
This given binomial distribution is to be approximated here by a&nbsp; [[Theory_of_Stochastic_Signals/Poissonverteilung|Poisson distribution]]&nbsp; $Y$,&nbsp; characterised by the rate &nbsp;$\lambda$:
  
* Wertebereich:
+
* Range of values:
 
:$$Y = \{\hspace{0.05cm}0\hspace{0.05cm}, \hspace{0.15cm} 1\hspace{0.05cm},\hspace{0.05cm}
 
:$$Y = \{\hspace{0.05cm}0\hspace{0.05cm}, \hspace{0.15cm} 1\hspace{0.05cm},\hspace{0.05cm}
 
2\hspace{0.05cm},\hspace{0.15cm}  \text{...}\hspace{0.1cm} ,\hspace{0.15cm} {\mu}\hspace{0.05cm}, \hspace{0.05cm}\text{...}\hspace{0.1cm}\}\hspace{0.05cm},$$
 
2\hspace{0.05cm},\hspace{0.15cm}  \text{...}\hspace{0.1cm} ,\hspace{0.15cm} {\mu}\hspace{0.05cm}, \hspace{0.05cm}\text{...}\hspace{0.1cm}\}\hspace{0.05cm},$$
* Wahrscheinlichkeiten:
+
* Probabilites:
:$$P_Y (Y = \mu) = \frac{\lambda^{\mu}}{\mu !} \cdot {\rm e}^{\lambda} \hspace{0.05cm},$$
+
:$$P_Y (Y = \mu) = \frac{\lambda^{\mu}}{\mu !} \cdot {\rm e}^{-\lambda} \hspace{0.05cm},$$
* Erwartungswerte:
+
* Expected values:
 
:$$m_Y = \sigma_Y^2 = \lambda\hspace{0.05cm}.$$
 
:$$m_Y = \sigma_Y^2 = \lambda\hspace{0.05cm}.$$
  
Um abschätzen zu können, ob die Wahrscheinlichkeitsfunktion&nbsp; $P_X(X)$&nbsp; ausreichend gut durch&nbsp; $P_Y(Y)$&nbsp; approximiert wird, kann man auf die so genannten&nbsp; <i>Kullback&ndash;Leibler&ndash;Distanzen</i>&nbsp; ('''KLD''')&nbsp; zurückgreifen, in der Literatur teilweise  auch&nbsp;  "relative Entropien"&nbsp; genannt.  
+
In order to assess whether the probability mass function&nbsp; $P_X(X)$&nbsp; is sufficiently well approximated by&nbsp; $P_Y(Y)$,&nbsp; one can resort to the so-called&nbsp; <b>Kullback&ndash;Leibler distances</b>&nbsp; $\rm (KLD)$,&nbsp; sometimes also called&nbsp;  "relative entropies"&nbsp; in the literature.
  
Angepasst an das vorliegende Beispiel lauten diese:
+
Adapted to the present example, these are:
 
:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \hspace{0.15cm}  =  \hspace{0.15cm}  {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 0}^{I}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm},$$
 
:$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \hspace{0.15cm}  =  \hspace{0.15cm}  {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 0}^{I}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm},$$
[[File:P_ID2760__Inf_A_3_4_B.png|right|frame|Beiliegende Ergebnistabelle]]
+
[[File:EN_Inf_A_3_4_B.png|right|frame|Enclosed results table]]
 
:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) \hspace{0.15cm}  =  \hspace{0.15cm}  {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(X)}{P_X(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 0}^{\infty}  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.$$
 
:$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) \hspace{0.15cm}  =  \hspace{0.15cm}  {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(X)}{P_X(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 0}^{\infty}  P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.$$
Bei Verwendung von&nbsp; $\log_2$&nbsp; ist dem Zahlenwert die Pseudo&ndash;Einheit "bit" hinzuzufügen.
+
If&nbsp; $\log_2$&nbsp; is used, the pseudo&ndash;unit&nbsp; "bit"&nbsp; must be added to the numerical value.
  
In nebenstehender  Tabelle ist die Kullback&ndash;Leibler&ndash;Distanz&nbsp;  $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp;  (in "bit") zwischen der Binomial&ndash;PMF&nbsp;  $P_X(\cdot)$&nbsp;  und einigen Poisson&ndash;Näherungen&nbsp;  $P_Y(\cdot)$&nbsp;    $($mit fünf verschiedenen Raten $\lambda)$&nbsp; eingetragen.
+
In the adjacent table, the Kullback&ndash;Leibler&ndash;distance&nbsp;  $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp;  (in&nbsp; "bit")&nbsp; between the binomial PMF&nbsp;  $P_X(\cdot)$&nbsp;  and some Poisson approximations&nbsp;  $P_Y(\cdot)$&nbsp;    $($with five different rates $\lambda)$&nbsp; is entered.  
*Die jeweilige Entropie &nbsp;$H(Y)$, die ebenfalls von der Rate &nbsp;$\lambda$&nbsp; abhängt, ist in der ersten Zeile angegeben.
+
*The respective entropy &nbsp;$H(Y)$, which also depends on the rate &nbsp;$\lambda$,&nbsp; is given in the first row.
 +
*The columns for&nbsp; $\lambda = 1$&nbsp; are to be completed in subtasks&nbsp; '''(3)'''&nbsp; and '''(4)'''&nbsp;.
 +
*In subtask&nbsp; '''(6)'''&nbsp; these results are to be interpreted.
  
*Die Spalten für&nbsp; $\lambda = 1$&nbsp; sind in den Teilaufgaben&nbsp; '''(3)'''&nbsp; und '''(4)'''&nbsp; zu ergänzen.
 
*In der Teilaufgabe&nbsp; '''(6)'''&nbsp; sollen diese Ergebnisse interpretiert werden.
 
  
  
  
 
+
Hints:
''Hinweise:''
+
*The exercise belongs to the chapter&nbsp; [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables|Some preliminary remarks on two-dimensional random variables]].
*Die Aufgabe gehört zum  Kapitel&nbsp; [[Information_Theory/Einige_Vorbemerkungen_zu_zweidimensionalen_Zufallsgrößen|Einige Vorbemerkungen zu den 2D-Zufallsgrößen]].
+
*In particular, reference is made to the section&nbsp; [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables#Informational_divergence_-_Kullback-Leibler_distance|Relative entropy &ndash; Kullback-Leibler distance]].
*Insbesondere wird Bezug genommen auf die Seite&nbsp; [[Information_Theory/Einige_Vorbemerkungen_zu_zweidimensionalen_Zufallsgrößen#Relative_Entropie_.E2.80.93_Kullback.E2.80.93Leibler.E2.80.93Distanz|Relative Entropie &ndash; Kullback-Leibler-Distanz]].
+
*To keep the numerical calculations within limits, the following auxiliary quantities are given;&nbsp;  here&nbsp; $\rm \lg$&nbsp; denotes the logarithm to base&nbsp; $10$:
+
:$$A\hspace{0.05cm}' =
*Um die numerischen Berechnungen in Grenzen zu halten, werden folgende Hilfsgrößen vorgegeben;&nbsp;  hierbei bezeichnet&nbsp; $\rm \lg$&nbsp; den Logarithmus zur Basis&nbsp; $10$:
 
:$$A' =
 
 
0.4096 \cdot {\rm lg} \hspace{0.1cm} \frac{0.4096}{0.3679} +
 
0.4096 \cdot {\rm lg} \hspace{0.1cm} \frac{0.4096}{0.3679} +
 
0.2048 \cdot {\rm lg} \hspace{0.1cm} \frac{0.2048}{0.1839} +
 
0.2048 \cdot {\rm lg} \hspace{0.1cm} \frac{0.2048}{0.1839} +
Line 56: Line 54:
 
0.0064 \cdot {\rm lg} \hspace{0.1cm} \frac{0.0064}{0.0153} +
 
0.0064 \cdot {\rm lg} \hspace{0.1cm} \frac{0.0064}{0.0153} +
 
0.0003 \cdot {\rm lg} \hspace{0.1cm} \frac{0.0003}{0.0031} \hspace{0.05cm},$$
 
0.0003 \cdot {\rm lg} \hspace{0.1cm} \frac{0.0003}{0.0031} \hspace{0.05cm},$$
:$$B' =
+
:$$B\hspace{0.05cm}' =
 
0.1839 \cdot {\rm lg} \hspace{0.1cm} (0.1839) +
 
0.1839 \cdot {\rm lg} \hspace{0.1cm} (0.1839) +
 
0.0613 \cdot {\rm lg} \hspace{0.1cm} (0.0613) +
 
0.0613 \cdot {\rm lg} \hspace{0.1cm} (0.0613) +
Line 63: Line 61:
 
0.0005 \cdot {\rm lg} \hspace{0.1cm} (0.0005) +
 
0.0005 \cdot {\rm lg} \hspace{0.1cm} (0.0005) +
 
0.0001 \cdot {\rm lg} \hspace{0.1cm} (0.0001)$$
 
0.0001 \cdot {\rm lg} \hspace{0.1cm} (0.0001)$$
:$$\Rightarrow \hspace{0.3cm} A'  \hspace{0.15cm} \underline {= 0.021944}  \hspace{0.05cm},\hspace{0.5cm}
+
:$$\Rightarrow \hspace{0.3cm} A\hspace{0.05cm}'  \hspace{0.15cm} \underline {= 0.021944}  \hspace{0.05cm},\hspace{0.5cm}
B'  \hspace{0.15cm} \underline {= -0.24717}  \hspace{0.05cm}.$$
+
B\hspace{0.05cm}'  \hspace{0.15cm} \underline {= -0.24717}  \hspace{0.05cm}.$$
  
  
===Fragebogen===
+
===Questions===
  
 
<quiz display=simple>
 
<quiz display=simple>
{Wie lauten die Kenngrößen der vorliegenden Binomialverteilung?&nbsp; <i>Hinweis:</i> Geben Sie (maximal) eine Nachkommastelle ein.
+
{What are the characteristics of the present binomial distribution? &nbsp; Hint:&nbsp; Enter (at most) one decimal place.
 
|type="{}"}
 
|type="{}"}
 
$I \hspace{0.47cm}  = \ $ { 5 3% }
 
$I \hspace{0.47cm}  = \ $ { 5 3% }
Line 78: Line 76:
  
  
{Welche Kullback&ndash;Leibler&ndash;Distanz sollte man hier verwenden?
+
{Which Kullback&ndash;Leibler distance should be used here?
 
|type="[]"}
 
|type="[]"}
- Keine der beiden Distanzen ist anwendbar.
+
- Neither of the two distances is applicable.
+ $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; ist besser geeignet.
+
+ $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; is more suitable
- $D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; ist besser geeignet.
+
- $D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; is more suitable
- Beide Kullback&ndash;Leibler&ndash;Distanzen sind anwendbar.
+
- Both Kullback&ndash;Leibler distances are applicable.
  
  
{Berechnen Sie die geeignete Kullback&ndash;Leibler&ndash;Distanz&nbsp;  $($hier mit&nbsp; $D$&nbsp; abgekürzt$)$&nbsp; für &nbsp;$\lambda = 1$.&nbsp; Berücksichtigen Sie die Hilfsgröße &nbsp;$A\hspace{0.05cm}'$.
+
{Calculate the appropriate Kullback&ndash;Leibler distance&nbsp;  $($abbreviated here as&nbsp; $D)$&nbsp; for &nbsp;$\lambda = 1$.&nbsp; Consider the auxiliary quantity &nbsp;$A\hspace{0.05cm}'$.
 
|type="{}"}
 
|type="{}"}
 
$D \ = \ $ { 0.0182 3% } $\ \rm bit$
 
$D \ = \ $ { 0.0182 3% } $\ \rm bit$
  
  
{Berechnen Sie die Entropie &nbsp;$H(Y)$&nbsp; der Poisson&ndash;Näherung mit der Rate &nbsp;$\lambda = 1$.&nbsp; Berücksichtigen Sie die Hilfsgröße &nbsp;$B\hspace{0.05cm}'$.
+
{Calculate the entropy &nbsp;$H(Y)$&nbsp; of the Poisson approximation with rate &nbsp;$\lambda = 1$.&nbsp; Consider the auxiliary quantity &nbsp;$B\hspace{0.05cm}'$.
 
|type="{}"}
 
|type="{}"}
 
$H(Y) \ = \ $ { 1.864 3% } $\ \rm bit$
 
$H(Y) \ = \ $ { 1.864 3% } $\ \rm bit$
  
  
{Welche der folgenden Aussagen sind zutreffend?
+
{Which of the following statements are true?
 
|type="[]"}
 
|type="[]"}
+ Bei der &nbsp;$H(Y)$&ndash;Berechnung haben alle Terme gleiches Vorzeichen.
+
+ In the &nbsp;$H(Y)$&nbsp; calculation, all terms have the same sign.
- Bei der&nbsp; $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&ndash;Berechnung haben alle Terme gleiches Vorzeichen.
+
- In the &nbsp; $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)$&nbsp; calculation all terms have the same sign.
  
  
{Wie interpretieren Sie die vervollständigte Ergebnistabelle?
+
{How do you interpret the completed results table?
 
|type="[]"}
 
|type="[]"}
+ Nach der Kullback&ndash;Leibler&ndash;Distanz sollte man &nbsp;$\lambda = 1$&nbsp; wählen.
+
+ According Kullback&ndash;Leibler distance, you should choose &nbsp;$\lambda = 1$.
- $\lambda = 1$&nbsp; garantiert die beste Approximation &nbsp;$H(Y) &asymp; H(X)$.
+
- $\lambda = 1$&nbsp; guarantees the best approximation &nbsp;$H(Y) &asymp; H(X)$.
  
  
Line 111: Line 109:
 
</quiz>
 
</quiz>
  
===Musterlösung===
+
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
'''(1)'''&nbsp; Bei der Binomialverteilung sind alle Wahrscheinlichkeiten&nbsp; ${\rm Pr}(X > I) = 0$ &nbsp; &#8658; &nbsp; $\underline{I = 5}$.&nbsp; Damit ergibt sich für die Wahrscheinlichkeit, dass&nbsp; $X =I = 5$&nbsp; ist:
+
'''(1)'''&nbsp; With the binomial distribution, all probabilities are&nbsp; ${\rm Pr}(X > I) = 0$ &nbsp; &#8658; &nbsp; $\underline{I = 5}$.&nbsp; Thus, for the probability that&nbsp; $X =I = 5$,&nbsp; we get:
 
:$${\rm Pr} (X = 5) = {5 \choose 5} \cdot p^{5} =  p^{5}  \approx 0.0003 \hspace{0.05cm}.$$
 
:$${\rm Pr} (X = 5) = {5 \choose 5} \cdot p^{5} =  p^{5}  \approx 0.0003 \hspace{0.05cm}.$$
Somit erhält man für
+
Thus one obtains for
  
* die charakteristische Wahrscheinlichkeit: &nbsp; $p= (0.0003)^{1/5} = 0.1974 \hspace{0.15cm} \underline {\approx 0.2}\hspace{0.05cm},$
+
* the characteristic probability: &nbsp; $p= (0.0003)^{1/5} = 0.1974 \hspace{0.15cm} \underline {\approx 0.2}\hspace{0.05cm},$
* den linearen Mittelwert (Erwartungswert): &nbsp; $m_X = I  \cdot p  \hspace{0.15cm} \underline {= 1}\hspace{0.05cm},$
+
* the linear mean (expected value): &nbsp; $m_X = I  \cdot p  \hspace{0.15cm} \underline {= 1}\hspace{0.05cm},$
* die Varianz: &nbsp; $\sigma_X^2 = I  \cdot p \cdot (1-p)  \hspace{0.15cm} \underline {= 0.8}\hspace{0.05cm}.$
+
* the variance: &nbsp; $\sigma_X^2 = I  \cdot p \cdot (1-p)  \hspace{0.15cm} \underline {= 0.8}\hspace{0.05cm}.$
  
  
  
  
'''(2)'''&nbsp; Richtig ist der <u>Lösungsvorschlag 2</u>:
+
'''(2)'''&nbsp; <u>Proposed solution 2</u> is correct:
*Bei Verwendung von&nbsp; $D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; würde sich unabhängig von&nbsp; $&lambda;$&nbsp; stets ein unendlicher Wert ergeben, da für&nbsp; $\mu &#8805; 6$&nbsp; gilt:
+
*Using&nbsp; $D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X)$&nbsp; would always result in an infinite value regardless of&nbsp; $&lambda;$&nbsp; since for&nbsp; $\mu &#8805; 6$&nbsp;:
 
:$$P_X (X = \mu) = 0 \hspace{0.05cm},\hspace{0.3cm}P_Y (Y = \mu) \ne 0 \hspace{0.05cm}.$$
 
:$$P_X (X = \mu) = 0 \hspace{0.05cm},\hspace{0.3cm}P_Y (Y = \mu) \ne 0 \hspace{0.05cm}.$$
*Auch wenn die Wahrscheinlichkeiten&nbsp; $P_Y (Y = \mu)$&nbsp; für große&nbsp; $&mu;$&nbsp; sehr klein werden, sind sie doch "unendlich viel größer" als&nbsp; $P_X (X = \mu)$.
+
*Even though the probabilities&nbsp; $P_Y (Y = \mu)$&nbsp; become very small for large&nbsp; $&mu;$&nbsp; they are still&nbsp; "infinitely larger"&nbsp; than&nbsp; $P_X (X = \mu)$.
  
  
  
  
'''(3)'''&nbsp; Wir verwenden die erste Kullback&ndash;Leibler&ndash;Distanz:
+
'''(3)'''&nbsp; We use the first Kullback&ndash;Leibler distance:
 
:$$D = D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) =\hspace{0.2cm} \sum_{\mu = 0}^{5}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.$$
 
:$$D = D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) =\hspace{0.2cm} \sum_{\mu = 0}^{5}  P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.$$
*Bei Verwendung des Zehnerlogarithmus&nbsp; $(\lg)$&nbsp; erhalten wir für die Poisson&ndash;Näherung mit &nbsp;$\lambda = 1$:
+
*Using the logarithm of base ten&nbsp; $(\lg)$,&nbsp; we obtain for the Poisson approximation with &nbsp;$\lambda = 1$:
 
:$$D \hspace{0.05cm}' = 0.3277 \cdot {\rm lg} \hspace{0.1cm} \frac{0.3277}{0.3679} + A \hspace{0.05cm}' =  
 
:$$D \hspace{0.05cm}' = 0.3277 \cdot {\rm lg} \hspace{0.1cm} \frac{0.3277}{0.3679} + A \hspace{0.05cm}' =  
 
-0.016468 + 0.021944 = 0.005476 \hspace{0.05cm}.$$
 
-0.016468 + 0.021944 = 0.005476 \hspace{0.05cm}.$$
*Nach Umrechnung auf den Zweierlogarithmus&nbsp; $(\log_2)$&nbsp; erhält man schließlich:
+
*After converting to the logarithm of base two&nbsp; $(\log_2)$&nbsp;, we finally obtain:
 
:$$D = D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{0.005476}{{\rm lg} \hspace{0.1cm}(2)} \hspace{0.15cm} \underline {\approx 0.0182\ {\rm (bit)}}\hspace{0.05cm}.$$
 
:$$D = D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{0.005476}{{\rm lg} \hspace{0.1cm}(2)} \hspace{0.15cm} \underline {\approx 0.0182\ {\rm (bit)}}\hspace{0.05cm}.$$
  
  
'''(4)'''&nbsp; Unter Verwendung des Zehnerlogarithmus lautet die Entropie der Poisson&ndash;Näherung&nbsp; $(\lambda = 1)$:
+
'''(4)'''&nbsp; Using the logarithm of base ten, the entropy of the Poisson approximation&nbsp; $(\lambda = 1)$:
 
:$$H\hspace{0.05cm}'(Y) = -{\rm E} \left [{\rm lg} \hspace{0.1cm} {P_Y(Y)} \right ]
 
:$$H\hspace{0.05cm}'(Y) = -{\rm E} \left [{\rm lg} \hspace{0.1cm} {P_Y(Y)} \right ]
 
= -2 \cdot 0.3679 \cdot {\rm lg} \hspace{0.1cm} (0.3679) - B\hspace{0.05cm}' = 0.31954 + 0.24717 = 0.56126.$$
 
= -2 \cdot 0.3679 \cdot {\rm lg} \hspace{0.1cm} (0.3679) - B\hspace{0.05cm}' = 0.31954 + 0.24717 = 0.56126.$$
*Die Umrechnung in "bit" liefert das gesuchte Ergebnis:
+
*Converting to&nbsp; "bit"&nbsp; gives the result we are looking for:
 
:$$H(Y) = \frac{0.56126}{{\rm lg} \hspace{0.1cm}(2)}
 
:$$H(Y) = \frac{0.56126}{{\rm lg} \hspace{0.1cm}(2)}
 
\hspace{0.15cm} \underline {= 1.864\ {\rm (bit)}} \hspace{0.05cm}.$$
 
\hspace{0.15cm} \underline {= 1.864\ {\rm (bit)}} \hspace{0.05cm}.$$
Line 150: Line 148:
  
  
'''(5)'''&nbsp; Richtig ist die <u>Aussage 1</u>.&nbsp; Bei der numerischen Berechnung der Kullback&ndash;Leibler&ndash;Distanz ist
+
'''(5)'''&nbsp; Correct is <u>statement 1</u>.&nbsp; In the numerical calculation of the Kullback&ndash;Leibler distance
* der Beitrag des&nbsp; $&mu;$&ndash;ten Terms positiv, falls&nbsp; $P_Y(\mu) > P_X(\mu)$,
+
* the contribution of the&nbsp; $&mu;$&ndash;th term is positive,&nbsp; if&nbsp; $P_Y(\mu) > P_X(\mu)$,
* der Beitrag des&nbsp; $&mu;$&ndash;ten Terms negativ, falls&nbsp; $P_Y(\mu) < P_X(\mu)$.
+
* the contribution of the&nbsp; $&mu;$&ndash;th term is negative,&nbsp; if&nbsp; $P_Y(\mu) < P_X(\mu)$.
  
  
[[File:P_ID2761__Inf_A_3_4_C.png|right|frame|Kullback–Leibler–Distanz und Entropie]]
+
[[File:P_ID2761__Inf_A_3_4_C.png|right|frame|Kullback–Leibler distance and entropy]]
  
  
  
'''(6)'''&nbsp; Zutreffend ist der <u>Lösungsvorschlag 1</u>:  
+
'''(6)'''&nbsp; <u>Proposed solution 1</u> is correct:  
*Auch aus der Grafik ist ersichtlich, dass&nbsp; $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) =0.0182$&nbsp; bit&nbsp; von keinem anderen &nbsp;$&lambda;$&ndash;Wert als &nbsp;$&lambda; = 1$&nbsp; unterschritten wird (grüne Kreuze).
+
*It can also be seen from the graph that&nbsp; $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) =0.0182$&nbsp; bit&nbsp; is not undercut by any &nbsp;$&lambda;$&ndash;value other than &nbsp;$&lambda; = 1$&nbsp; (green crosses).
*Weiter erkennt man aus dieser Darstellung, dass man mit &nbsp;$&lambda; = 0.9$&nbsp; eine bessere Entropie&ndash;Approximation als mit &nbsp;$&lambda; = 1$&nbsp; erreicht  (blaue Kreise):
+
*Furthermore, one can see from this plot that a better entropy approximation is achieved with &nbsp;$&lambda; = 0.9$&nbsp; than with &nbsp;$&lambda; = 1$&nbsp; (blue circles):
 
:$$H(Y) = 1.795\ {\rm bit} \hspace{0.15cm}\approx \hspace{0.15cm} H(X) = 1.793\ {\rm bit}\hspace{0.05cm}.$$
 
:$$H(Y) = 1.795\ {\rm bit} \hspace{0.15cm}\approx \hspace{0.15cm} H(X) = 1.793\ {\rm bit}\hspace{0.05cm}.$$
:Der zweite Lösungsvorschlag ist also falsch.  
+
:The second proposed solution is therefore wrong.
  
* Mit &nbsp;$&lambda; = 1$&nbsp; stimmen die&nbsp; <u>linearen Mittelwerte</u>&nbsp; der beiden Zufallsgrößen  überein:  
+
* With &nbsp;$&lambda; = 1$&nbsp;, the&nbsp; <u>linear means</u>&nbsp; of the two random variables coincide:  
 
:$$m_X = m_Y= 1.$$
 
:$$m_X = m_Y= 1.$$
* Mit &nbsp;$&lambda; = 0.9$ stimmen die&nbsp; <u>quadratischen Mittelwerte</u>&nbsp; überein:  
+
* With &nbsp;$&lambda; = 0.9$ the&nbsp; <u>second moments</u>&nbsp; agree:  
 
:$$m_X + \sigma_X^2 = m_Y + \sigma_Y^2= 1.8.$$
 
:$$m_X + \sigma_X^2 = m_Y + \sigma_Y^2= 1.8.$$
  
Ob diese Aussage relevant ist, lassen wir dahingestellt.&nbsp; Denn: &nbsp; Aufgrund der stetigen Zunahme von&nbsp; $H(Y)$&nbsp; mit zunehmendem&nbsp; $&lambda;$&nbsp; ist klar, dass für irgendeinen&nbsp; $&lambda;$&ndash;Wert tatsächlich&nbsp; $H(Y) = H(X)$&nbsp; gelten muss.
+
Whether this statement is relevant, we leave undecided.&nbsp;  
 +
 
 +
Because: &nbsp; Due to the steady increase of &nbsp; $H(Y)$&nbsp; with increasing&nbsp; $&lambda;$&nbsp;, it is clear that for some&nbsp; $&lambda;$&ndash;value &nbsp; $H(Y) = H(X)$&nbsp; must indeed hold.
 
{{ML-Fuß}}
 
{{ML-Fuß}}
  
  
  
[[Category:Information Theory: Exercises|^3.1 Allgemeines zu 2D-Zufallsgrößen^]]
+
[[Category:Information Theory: Exercises|^3.1 General Information on 2D Random Variables^]]

Latest revision as of 14:19, 18 January 2023

Given probabilities

We assume here the  binomial distribution,  which is characterised by the parameters  $I$  and  $p$   
⇒   see the book  "Theory of Stochastic Signals":

  • Range of Values:
$$X = \{\hspace{0.05cm}0\hspace{0.05cm}, \hspace{0.15cm} 1\hspace{0.05cm},\hspace{0.15cm} 2\hspace{0.05cm},\hspace{0.15cm} \text{...}\hspace{0.1cm} ,\hspace{0.15cm} {\mu}\hspace{0.05cm}, \hspace{0.05cm}\text{...}\hspace{0.1cm} , \hspace{0.15cm} I\hspace{0.05cm}\}\hspace{0.05cm},$$
  • Probabilities:
$$P_X (X = \mu) = {I \choose \mu} \cdot p^{\mu} \cdot (1-p)^{I-\mu} \hspace{0.05cm},$$
  • Linear mean:
$$m_X = I \cdot p \hspace{0.05cm},$$
  • Variance:
$$\sigma_X^2 = I \cdot p \cdot (1-p)\hspace{0.05cm}.$$

In the part of the table highlighted in red, the probabilities  $P_X(X = \mu$)  of the binomial distribution under consideration are given.  In subtask  (1)  you are to determine the corresponding distribution parameters  $I$  and  $p$.


This given binomial distribution is to be approximated here by a  Poisson distribution  $Y$,  characterised by the rate  $\lambda$:

  • Range of values:
$$Y = \{\hspace{0.05cm}0\hspace{0.05cm}, \hspace{0.15cm} 1\hspace{0.05cm},\hspace{0.05cm} 2\hspace{0.05cm},\hspace{0.15cm} \text{...}\hspace{0.1cm} ,\hspace{0.15cm} {\mu}\hspace{0.05cm}, \hspace{0.05cm}\text{...}\hspace{0.1cm}\}\hspace{0.05cm},$$
  • Probabilites:
$$P_Y (Y = \mu) = \frac{\lambda^{\mu}}{\mu !} \cdot {\rm e}^{-\lambda} \hspace{0.05cm},$$
  • Expected values:
$$m_Y = \sigma_Y^2 = \lambda\hspace{0.05cm}.$$

In order to assess whether the probability mass function  $P_X(X)$  is sufficiently well approximated by  $P_Y(Y)$,  one can resort to the so-called  Kullback–Leibler distances  $\rm (KLD)$,  sometimes also called  "relative entropies"  in the literature.

Adapted to the present example, these are:

$$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) \hspace{0.15cm} = \hspace{0.15cm} {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 0}^{I} P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm},$$
Enclosed results table
$$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X) \hspace{0.15cm} = \hspace{0.15cm} {\rm E} \left [ {\rm log}_2 \hspace{0.1cm} \frac{P_Y(X)}{P_X(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{\mu = 0}^{\infty} P_Y(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_Y(\mu)}{P_X(\mu)} \hspace{0.05cm}.$$

If  $\log_2$  is used, the pseudo–unit  "bit"  must be added to the numerical value.

In the adjacent table, the Kullback–Leibler–distance  $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)$  (in  "bit")  between the binomial PMF  $P_X(\cdot)$  and some Poisson approximations  $P_Y(\cdot)$  $($with five different rates $\lambda)$  is entered.

  • The respective entropy  $H(Y)$, which also depends on the rate  $\lambda$,  is given in the first row.
  • The columns for  $\lambda = 1$  are to be completed in subtasks  (3)  and (4) .
  • In subtask  (6)  these results are to be interpreted.



Hints:

$$A\hspace{0.05cm}' = 0.4096 \cdot {\rm lg} \hspace{0.1cm} \frac{0.4096}{0.3679} + 0.2048 \cdot {\rm lg} \hspace{0.1cm} \frac{0.2048}{0.1839} + 0.0512 \cdot {\rm lg} \hspace{0.1cm} \frac{0.0512}{0.0613} + 0.0064 \cdot {\rm lg} \hspace{0.1cm} \frac{0.0064}{0.0153} + 0.0003 \cdot {\rm lg} \hspace{0.1cm} \frac{0.0003}{0.0031} \hspace{0.05cm},$$
$$B\hspace{0.05cm}' = 0.1839 \cdot {\rm lg} \hspace{0.1cm} (0.1839) + 0.0613 \cdot {\rm lg} \hspace{0.1cm} (0.0613) + 0.0153 \cdot {\rm lg} \hspace{0.1cm} (0.0153) + 0.0031 \cdot {\rm lg} \hspace{0.1cm} (0.0031) + 0.0005 \cdot {\rm lg} \hspace{0.1cm} (0.0005) + 0.0001 \cdot {\rm lg} \hspace{0.1cm} (0.0001)$$
$$\Rightarrow \hspace{0.3cm} A\hspace{0.05cm}' \hspace{0.15cm} \underline {= 0.021944} \hspace{0.05cm},\hspace{0.5cm} B\hspace{0.05cm}' \hspace{0.15cm} \underline {= -0.24717} \hspace{0.05cm}.$$


Questions

1

What are the characteristics of the present binomial distribution?   Hint:  Enter (at most) one decimal place.

$I \hspace{0.47cm} = \ $

$p \hspace{0.47cm} = \ $

$m_x \ = \ $

$\sigma^2_x \hspace{0.25cm} = \ $

2

Which Kullback–Leibler distance should be used here?

Neither of the two distances is applicable.
$D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)$  is more suitable
$D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X)$  is more suitable
Both Kullback–Leibler distances are applicable.

3

Calculate the appropriate Kullback–Leibler distance  $($abbreviated here as  $D)$  for  $\lambda = 1$.  Consider the auxiliary quantity  $A\hspace{0.05cm}'$.

$D \ = \ $

$\ \rm bit$

4

Calculate the entropy  $H(Y)$  of the Poisson approximation with rate  $\lambda = 1$.  Consider the auxiliary quantity  $B\hspace{0.05cm}'$.

$H(Y) \ = \ $

$\ \rm bit$

5

Which of the following statements are true?

In the  $H(Y)$  calculation, all terms have the same sign.
In the   $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y)$  calculation all terms have the same sign.

6

How do you interpret the completed results table?

According Kullback–Leibler distance, you should choose  $\lambda = 1$.
$\lambda = 1$  guarantees the best approximation  $H(Y) ≈ H(X)$.


Solution

(1)  With the binomial distribution, all probabilities are  ${\rm Pr}(X > I) = 0$   ⇒   $\underline{I = 5}$.  Thus, for the probability that  $X =I = 5$,  we get:

$${\rm Pr} (X = 5) = {5 \choose 5} \cdot p^{5} = p^{5} \approx 0.0003 \hspace{0.05cm}.$$

Thus one obtains for

  • the characteristic probability:   $p= (0.0003)^{1/5} = 0.1974 \hspace{0.15cm} \underline {\approx 0.2}\hspace{0.05cm},$
  • the linear mean (expected value):   $m_X = I \cdot p \hspace{0.15cm} \underline {= 1}\hspace{0.05cm},$
  • the variance:   $\sigma_X^2 = I \cdot p \cdot (1-p) \hspace{0.15cm} \underline {= 0.8}\hspace{0.05cm}.$



(2)  Proposed solution 2 is correct:

  • Using  $D(P_Y \hspace{0.05cm}|| \hspace{0.05cm} P_X)$  would always result in an infinite value regardless of  $λ$  since for  $\mu ≥ 6$ :
$$P_X (X = \mu) = 0 \hspace{0.05cm},\hspace{0.3cm}P_Y (Y = \mu) \ne 0 \hspace{0.05cm}.$$
  • Even though the probabilities  $P_Y (Y = \mu)$  become very small for large  $μ$  they are still  "infinitely larger"  than  $P_X (X = \mu)$.



(3)  We use the first Kullback–Leibler distance:

$$D = D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) =\hspace{0.2cm} \sum_{\mu = 0}^{5} P_X(\mu) \cdot {\rm log}_2 \hspace{0.1cm} \frac{P_X(\mu)}{P_Y(\mu)} \hspace{0.05cm}.$$
  • Using the logarithm of base ten  $(\lg)$,  we obtain for the Poisson approximation with  $\lambda = 1$:
$$D \hspace{0.05cm}' = 0.3277 \cdot {\rm lg} \hspace{0.1cm} \frac{0.3277}{0.3679} + A \hspace{0.05cm}' = -0.016468 + 0.021944 = 0.005476 \hspace{0.05cm}.$$
  • After converting to the logarithm of base two  $(\log_2)$ , we finally obtain:
$$D = D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) = \frac{0.005476}{{\rm lg} \hspace{0.1cm}(2)} \hspace{0.15cm} \underline {\approx 0.0182\ {\rm (bit)}}\hspace{0.05cm}.$$


(4)  Using the logarithm of base ten, the entropy of the Poisson approximation  $(\lambda = 1)$:

$$H\hspace{0.05cm}'(Y) = -{\rm E} \left [{\rm lg} \hspace{0.1cm} {P_Y(Y)} \right ] = -2 \cdot 0.3679 \cdot {\rm lg} \hspace{0.1cm} (0.3679) - B\hspace{0.05cm}' = 0.31954 + 0.24717 = 0.56126.$$
  • Converting to  "bit"  gives the result we are looking for:
$$H(Y) = \frac{0.56126}{{\rm lg} \hspace{0.1cm}(2)} \hspace{0.15cm} \underline {= 1.864\ {\rm (bit)}} \hspace{0.05cm}.$$


(5)  Correct is statement 1.  In the numerical calculation of the Kullback–Leibler distance

  • the contribution of the  $μ$–th term is positive,  if  $P_Y(\mu) > P_X(\mu)$,
  • the contribution of the  $μ$–th term is negative,  if  $P_Y(\mu) < P_X(\mu)$.


Kullback–Leibler distance and entropy


(6)  Proposed solution 1 is correct:

  • It can also be seen from the graph that  $D(P_X \hspace{0.05cm}|| \hspace{0.05cm} P_Y) =0.0182$  bit  is not undercut by any  $λ$–value other than  $λ = 1$  (green crosses).
  • Furthermore, one can see from this plot that a better entropy approximation is achieved with  $λ = 0.9$  than with  $λ = 1$  (blue circles):
$$H(Y) = 1.795\ {\rm bit} \hspace{0.15cm}\approx \hspace{0.15cm} H(X) = 1.793\ {\rm bit}\hspace{0.05cm}.$$
The second proposed solution is therefore wrong.
  • With  $λ = 1$ , the  linear means  of the two random variables coincide:
$$m_X = m_Y= 1.$$
  • With  $λ = 0.9$ the  second moments  agree:
$$m_X + \sigma_X^2 = m_Y + \sigma_Y^2= 1.8.$$

Whether this statement is relevant, we leave undecided. 

Because:   Due to the steady increase of   $H(Y)$  with increasing  $λ$ , it is clear that for some  $λ$–value   $H(Y) = H(X)$  must indeed hold.