Difference between revisions of "Information Theory/Differential Entropy"

From LNTwww
 
(64 intermediate revisions by 6 users not shown)
Line 1: Line 1:
 
   
 
   
 
{{Header
 
{{Header
|Untermenü=Wertkontinuierliche Informationstheorie
+
|Untermenü=Information Theory for Continuous Random Variables
 
|Vorherige Seite=Anwendung auf die Digitalsignalübertragung
 
|Vorherige Seite=Anwendung auf die Digitalsignalübertragung
|Nächste Seite=AWGN–Kanalkapazität bei wertkontinuierlichem Eingang
+
|Nächste Seite=AWGN_Channel_Capacity_for_Continuous-Valued_Input
 
}}
 
}}
  
== # ÜBERBLICK ZUM VIERTEN HAUPTKAPITEL # ==
+
== # OVERVIEW OF THE FOURTH MAIN CHAPTER # ==
 
<br>
 
<br>
Im letzten Kapitel dieses Buches werden die bisher für den wertdiskreten Fall definierten informationstheoretischen Größen derart adaptiert, dass sie auch für wertkontinuierliche Zufallsgrößen angewandt werden können.  
+
In the last chapter of this book,&nbsp; the information-theoretic quantities defined so far for the discrete case are adapted so that they can also be applied to continuous random variables.
*Aus der Entropie&nbsp; $H(X)$&nbsp; für die wertdiskrete Zufallsgröße&nbsp; $X$&nbsp; wird so zum Beispiel im wertkontinuierlichen Fall die differentielle Entropie&nbsp; $h(X)$.  
+
*For example,&nbsp; the entropy&nbsp; $H(X)$&nbsp; for the discrete random variable&nbsp; $X$&nbsp; becomes the &nbsp;&raquo;differential entropy&laquo;&nbsp; $h(X)$&nbsp; in the continuous case.
*Während&nbsp; $H(X)$&nbsp; die &bdquo;Unsicherheit&rdquo; hinsichtlich der diskreten Zufallsgröße&nbsp; $X$&nbsp; angibt, kann man im kontinuierlichen Fall&nbsp; $h(X)$&nbsp; nicht in gleicher Weise interpretieren.
+
 +
*While&nbsp; $H(X)$&nbsp; indicates the &nbsp;&raquo;uncertainty&laquo;&nbsp; with regard to the discrete random variable&nbsp; $X$,&nbsp; $h(X)$&nbsp; cannot be interpreted in the same way in the continuous case.
  
  
Viele der im dritten Kapitel &bdquo;Information zwischen zwei wertdiskreten Zufallsgrößen &nbsp; &rArr; &nbsp; siehe&nbsp; [[Informationstheorie|Inhaltsverzeichnis]]&nbsp; für die herkömmliche Entropie hergeleiteten Zusammenhänge gelten auch für die differentielle Entropie.&nbsp; So kann auch für wertkontinuierliche Zufallsgrößen&nbsp; $X$&nbsp; und&nbsp; $Y$&nbsp; die differentielle Verbundentropie&nbsp; $h(XY)$&nbsp; angegeben werden und ebenso die beiden bedingten differentiellen Entropien&nbsp; $h(Y|X)$&nbsp; und&nbsp; $h(X|Y)$.
+
Many of the relationships derived in the third chapter &nbsp;&raquo;Information between two discrete random variables&laquo;&nbsp; for conventional entropy also apply to differential entropy. &nbsp; Thus,&nbsp; the differential joint entropy&nbsp; $h(XY)$&nbsp; can also be given for continuous random variables&nbsp; $X$&nbsp; and&nbsp; $Y$,&nbsp; and likewise the two conditional differential entropies&nbsp; $h(Y|X)$&nbsp; and&nbsp; $h(X|Y)$.
  
  
Im Einzelnen werden in diesem Hauptkapitel behandelt:
+
In detail, this main chapter deals with
*die Besonderheiten wertkontinuierlicher Zufallsgrößen,
+
#the special features of &nbsp;&raquo;continuous random variables&laquo;,
*die Definition und Berechnung der differentiellen Entropie sowie deren Eigenschaften,
+
#the &nbsp;&raquo;definition and calculation of the differential entropy&laquo;&nbsp; as well as its properties,
*die Transinformation zwischen zwei wertkontinuierlichen Zufallsgrößen,
+
#the &nbsp;&raquo;mutual information&laquo;&nbsp; between two continuous random variables,
*die Kapazität des AWGN–Kanals und mehrerer solcher paralleler Gaußkanäle,
+
#the &nbsp;&raquo;capacity of the AWGN channel&laquo;&nbsp; and several such parallel Gaussian channels,
*das Kanalcodierungstheorem, eines der &bdquo;Highlights&rdquo; der Shannonschen Informationstheorie,
+
#the &nbsp;&raquo;channel coding theorem&laquo;,&nbsp; one of the highlights of Shannon's information theory,
*die AWGN–Kanalkapazität für wertdiskrete Eingangssinale (BPSK, QPSK).
+
#the &nbsp;&raquo;AWGN channel capacity&laquo;&nbsp; for discrete input&nbsp; $($BPSK,&nbsp; QPSK$)$.
  
  
  
  
==Eigenschaften wertkontinuierlicher Zufallsgrößen==   
+
==Properties of continuous random variables==   
 
<br>
 
<br>
Bisher wurden stets&nbsp; ''wertdiskrete Zufallsgrößen''&nbsp; der Form&nbsp; $X = \{x_1,\ x_2, \hspace{0.05cm}\text{...}\hspace{0.05cm} , x_μ, \text{...} ,\ x_M\}$&nbsp; betrachtet, die aus informationstheoretischer Sicht vollständig durch ihre&nbsp; [[Informationstheorie/Einige_Vorbemerkungen_zu_zweidimensionalen_Zufallsgrößen#Wahrscheinlichkeitsfunktion_und_Wahrscheinlichkeitsdichtefunktion|Wahrscheinlichkeitsfunktion]]&nbsp; (englisch:&nbsp; ''Probability Mass Function'', PMF)&nbsp; $P_X(X)$&nbsp; charakterisiert werden:
+
Up to now,&nbsp; "discrete random variables"&nbsp; of the form&nbsp; $X = \{x_1,\ x_2, \hspace{0.05cm}\text{...}\hspace{0.05cm} , x_μ, \text{...} ,\ x_M\}$&nbsp; have always been considered, which from an information-theoretical point of view are completely characterized by their&nbsp; [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables#Probability_mass_function_and_probability_density_function|"probability mass function"]]&nbsp; $\rm (PMF)$:
 
   
 
   
 
:$$P_X(X) = \big [ \hspace{0.1cm}  
 
:$$P_X(X) = \big [ \hspace{0.1cm}  
 
p_1, p_2, \hspace{0.05cm}\text{...} \hspace{0.15cm}, p_{\mu},\hspace{0.05cm} \text{...}\hspace{0.15cm}, p_M \hspace{0.1cm}\big ]  
 
p_1, p_2, \hspace{0.05cm}\text{...} \hspace{0.15cm}, p_{\mu},\hspace{0.05cm} \text{...}\hspace{0.15cm}, p_M \hspace{0.1cm}\big ]  
\hspace{0.3cm}{\rm mit} \hspace{0.3cm}  p_{\mu}= P_X(x_{\mu})= {\rm Pr}( X = x_{\mu})
+
\hspace{0.3cm}{\rm with} \hspace{0.3cm}  p_{\mu}= P_X(x_{\mu})= {\rm Pr}( X = x_{\mu})
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
  
Eine&nbsp; '''wertkontinuierliche Zufallsgröße'''&nbsp; kann dagegen zumindest in endlichen Intervallen – jeden beliebigen Wert annehmen:
+
A&nbsp; "continuous random variable",&nbsp; on the other hand, can assume any value at least in finite intervals:
* Aufgrund des nicht abzählbaren Wertevorrats ist in diesem Fall die Beschreibung durch eine Wahrscheinlichkeitsfunktion nicht möglich oder zumindest nicht sinnvoll:  
+
* Because the set of possible values is uncountable, a description by a probability mass function is not possible in this case, or at least not meaningful:
*Es ergäbe sich nämlich&nbsp; $M \to ∞$&nbsp; sowie&nbsp; $p_1 \to 0$,&nbsp; $p_2 \to 0$,&nbsp; usw.
+
*This would result in the symbol set size&nbsp; $M \to ∞$&nbsp; as well as probabilities&nbsp; $p_1 \to 0$,&nbsp; $p_2 \to 0$,&nbsp; etc.
  
  
Man verwendet zur Beschreibung wertkontinuierlicher Zufallsgrößen gemäß den Definitionen im Buch&nbsp; [[Stochastische Signaltheorie]]&nbsp; gleichermaßen:
+
To describe continuous random variables, one equally uses the following quantities, in accordance with the definitions in the book&nbsp; [[Theory of Stochastic Signals|"Theory of Stochastic Signals"]]:
  
[[File:P_ID2850__Inf_T_4_1_S1b.png|right|frame|WDF und VTF einer wertkontinuierlichen Zufallsgröße]]
+
[[File:EN_Inf_T_4_1_S1b.png|right|frame|PDF and CDF of a continuous random variable]]
  
* die&nbsp; [[Stochastische_Signaltheorie/Wahrscheinlichkeitsdichtefunktion_(WDF)|Wahrscheinlichkeitsdichtefunktion]]&nbsp; (WDF,&nbsp; englisch:&nbsp; ''Probability Density Function'', PDF):
+
* the&nbsp; [[Theory_of_Stochastic_Signals/Wahrscheinlichkeitsdichtefunktion_(WDF)|"probability density function"]]&nbsp; $\rm (PDF)$:
 
   
 
   
 
:$$f_X(x_0)= \lim_{{\rm \Delta}  x\to \rm 0}\frac{p_{{\rm \Delta} x}}{{\rm \Delta} x} = \lim_{{\rm \Delta}  x\to \rm 0}\frac{{\rm Pr} \{ x_0- {\rm \Delta} x/\rm 2 \le \it X \le x_{\rm 0} +{\rm \Delta} x/\rm 2\}}{{\rm \Delta}  x};$$
 
:$$f_X(x_0)= \lim_{{\rm \Delta}  x\to \rm 0}\frac{p_{{\rm \Delta} x}}{{\rm \Delta} x} = \lim_{{\rm \Delta}  x\to \rm 0}\frac{{\rm Pr} \{ x_0- {\rm \Delta} x/\rm 2 \le \it X \le x_{\rm 0} +{\rm \Delta} x/\rm 2\}}{{\rm \Delta}  x};$$
  
:In Worten: &nbsp; Der WDF–Wert bei&nbsp; $x_0$&nbsp; gibt die Wahrscheinlichkeit&nbsp; $p_{Δx}$&nbsp; an, dass&nbsp; $X$&nbsp; in einem (unendlich kleinen) Intervall der Breite&nbsp; $Δx$&nbsp; um&nbsp; $x_0$&nbsp; liegt, dividiert durch&nbsp; $Δx$; &nbsp; (beachten Sie  die Einträge in nebenstehender Grafik);
+
:In words: &nbsp; the PDF value at&nbsp; $x_0$&nbsp; is the probability&nbsp; $p_{Δx}$&nbsp; that&nbsp; $X$&nbsp; lies in an (infinitesimally small) interval of width&nbsp; $Δx$&nbsp; around&nbsp; $x_0$,&nbsp; divided by&nbsp; $Δx$ &nbsp; (note the entries in the adjacent graph);
*den&nbsp; [[Stochastische_Signaltheorie/Erwartungswerte_und_Momente#Momentenberechnung_als_Scharmittelwert|Mittelwert]]&nbsp; (Moment erster Ordnung,&nbsp; englisch:&nbsp; ''Mean Value''&nbsp; bzw.&nbsp; ''Expectation Value''):
+
*the&nbsp; [[Theory_of_Stochastic_Signals/Expected_Values_and_Moments#Moment_calculation_as_ensemble_average|"mean value"]]&nbsp; (first moment):
 
   
 
   
 
:$$m_1 =  {\rm E}\big[ X \big]=  \int_{-\infty}^{+\infty} \hspace{-0.1cm} x \cdot f_X(x) \hspace{0.1cm}{\rm d}x  
 
:$$m_1 =  {\rm E}\big[ X \big]=  \int_{-\infty}^{+\infty} \hspace{-0.1cm} x \cdot f_X(x) \hspace{0.1cm}{\rm d}x  
 
\hspace{0.05cm};$$
 
\hspace{0.05cm};$$
  
*die&nbsp; [[Stochastische_Signaltheorie/Erwartungswerte_und_Momente#Einige_h.C3.A4ufig_benutzte_Zentralmomente|Varianz]]&nbsp; (Zentralmoment zweiter Ordnung,&nbsp; englisch:&nbsp; ''Variance''):
+
*the&nbsp; [[Theory_of_Stochastic_Signals/Expected_Values_and_Moments#Some_common_central_moments|"variance"]]&nbsp; (second central moment):
 
   
 
   
 
:$$\sigma^2 =  {\rm E}\big[(X- m_1 )^2 \big]=  \int_{-\infty}^{+\infty} \hspace{-0.1cm} (x- m_1 )^2 \cdot f_X(x) \hspace{0.1cm}{\rm d}x 
\hspace{0.05cm};$$
  
*die&nbsp; [[Stochastische_Signaltheorie/Verteilungsfunktion_(VTF)|Verteilungsfunktion]]&nbsp; (VTF, englisch:&nbsp; ''Cumulative Distribution Function'', CDF):
+
*the&nbsp; [[Theory_of_Stochastic_Signals/Cumulative_Distribution_Function|"cumulative distribution function"]]&nbsp; $\rm (CDF)$:
 
   
 
   
 
:$$F_X(x) = \int_{-\infty}^{x} \hspace{-0.1cm}f_X(\xi) \hspace{0.1cm}{\rm d}\xi 
\hspace{0.2cm} = \hspace{0.2cm}
 {\rm Pr}(X \le x)\hspace{0.05cm}.$$
  
Beachten Sie, dass sowohl die WDF–Fläche als auch der VTF–Endwert stets gleich&nbsp; $1$&nbsp; sind.
+
Note that both the PDF area and the CDF final value are always equal to&nbsp; $1$.
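The limit definition of the PDF can be checked numerically. The following is a minimal sketch that assumes, purely for illustration, a Gaussian random variable with mean&nbsp; $0$&nbsp; and standard deviation&nbsp; $1$&nbsp; (this distribution is not taken from the text above); the helper names `gauss_cdf`, `gauss_pdf` and `pdf_from_cdf` are hypothetical.

```python
import math

def gauss_cdf(x, m1=0.0, sigma=1.0):
    """CDF F_X(x) = Pr(X <= x) of a Gaussian random variable (assumed example)."""
    return 0.5 * (1.0 + math.erf((x - m1) / (sigma * math.sqrt(2.0))))

def gauss_pdf(x, m1=0.0, sigma=1.0):
    """Closed-form Gaussian PDF f_X(x), used only as the reference value."""
    return math.exp(-0.5 * ((x - m1) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def pdf_from_cdf(cdf, x0, dx):
    """p_dx / dx with p_dx = Pr(x0 - dx/2 <= X <= x0 + dx/2)."""
    return (cdf(x0 + dx / 2) - cdf(x0 - dx / 2)) / dx

# The quotient approaches f_X(x0) as the interval width dx shrinks.
for dx in (1.0, 0.1, 0.01):
    print(dx, pdf_from_cdf(gauss_cdf, 0.5, dx), gauss_pdf(0.5))

# The CDF final value is (numerically) 1.
print(gauss_cdf(10.0))
```

For decreasing&nbsp; $Δx$,&nbsp; the printed quotient should move ever closer to the reference PDF value.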
  
 
{{BlaueBox|TEXT=
 
{{BlaueBox|TEXT=
$\text{Nomenklaturhinweise zu WDF und VTF:}$
+
$\text{Nomenclature notes on PDF and CDF:}$
  
Wir verwenden in diesem Kapitel für eine&nbsp; '''Wahrscheinlichkeitsdichtefunktion'''&nbsp; die in der Literatur häufig verwendete Darstellungsform&nbsp; $f_X(x)$, wobei gilt:
+
In this chapter, we denote a&nbsp; &raquo;'''probability density function'''&laquo;&nbsp; $\rm (PDF)$&nbsp; by&nbsp; $f_X(x)$,&nbsp; a notation often used in the literature,&nbsp; where:
*$X$&nbsp; bezeichnet die (wertdiskrete oder wertkontinuierliche) Zufallsgröße,
+
*$X$&nbsp; denotes the (discrete or continuous) random variable,
*$x$&nbsp; ist eine mögliche Realisierung von&nbsp; $X$ &nbsp; ⇒ &nbsp; $x ∈ X$.
 
  
 +
*$x$&nbsp; is a possible realization of&nbsp; $X$ &nbsp; ⇒ &nbsp; $x ∈ X$.
 +
 +
 +
Accordingly, we denote the&nbsp; &raquo;'''cumulative distribution function'''&laquo;&nbsp; $\rm (CDF)$&nbsp; of the random variable&nbsp; $X$&nbsp; by&nbsp; $F_X(x)$,&nbsp; defined as follows:
  
Entsprechend bezeichnen wir die&nbsp; '''Verteilungsfunktion''' (VTF) der Zufallsgröße $X$ mit&nbsp; $F_X(x)$&nbsp; entsprechend folgender Definition:
 
 
:$$F_X(x) = \int_{-\infty}^{x} \hspace{-0.1cm}f_X(\xi) \hspace{0.1cm}{\rm d}\xi  
 
:$$F_X(x) = \int_{-\infty}^{x} \hspace{-0.1cm}f_X(\xi) \hspace{0.1cm}{\rm d}\xi  
 
\hspace{0.2cm} = \hspace{0.2cm}
 
\hspace{0.2cm} = \hspace{0.2cm}
 
  {\rm Pr}(X \le x)\hspace{0.05cm}.$$
 
  {\rm Pr}(X \le x)\hspace{0.05cm}.$$
  
In anderen&nbsp; $\rm LNTwww$–Büchern schreiben wir oft, um nicht für eine Variable zwei Zeichen zu verbrauchen:
+
In other&nbsp; $\rm LNTwww$ books, we often use the following notation in order not to spend two symbols on one variable:
*Für die WDF&nbsp; $f_x(x)$  &nbsp; ⇒ &nbsp; keine Unterscheidung zwischen Zufallsgröße und Realisiering, und
+
*For the PDF&nbsp; $f_x(x)$  &nbsp; ⇒ &nbsp; no distinction between random variable and realization.  
*für die VTF&nbsp; $F_x(r) = {\rm Pr}(x ≤ r)$ &nbsp; ⇒ &nbsp; hier benötigt man auf jeden Fall eine zweite Variable.
 
  
 +
*For the CDF&nbsp; $F_x(r) = {\rm Pr}(x ≤ r)$ &nbsp; ⇒ &nbsp; here one needs a second variable in any case.
  
Wir bitten, diese formale Ungenauigkeit zu entschuldigen.}}  
+
 
 +
We apologize for this formal inaccuracy.}}  
  
  
 
{{GraueBox|TEXT=
 
{{GraueBox|TEXT=
$\text{Beispiel 1:}$&nbsp; Wir betrachten nun mit der Gleichverteilung einen wichtigen Sonderfall.
+
$\text{Example 1:}$&nbsp; We now consider an important special case, the&nbsp; &raquo;'''uniform distribution'''&laquo;.
[[File:P_ID2849__Inf_T_4_1_S1.png|right|frame|Zwei Analogsignale als Beispiele für wertkontinuierliche Zufallsgrößen]]
 
*Die Grafik zeigt den Verlauf zweier gleichverteilter Größen, die alle Werte zwischen&nbsp; $1$&nbsp; und&nbsp; $5$&nbsp; $($Mittelwert $m_1 = 3)$&nbsp; mit gleicher Wahrscheinlichkeit annehmen kann.
 
*Links ist das Ergebnis eines Zufallsprozesses dargestellt, rechts ein deterministisches Signal („Sägezahn”) mit gleicher Amplitudenverteilung.
 
  
[[File:P_ID2870__Inf_A_4_1a.png|right|frame|WDF und VTF einer gleichverteilten Zufallsgröße]]
+
[[File:EN_Inf_T_4_1_S1.png|right|frame|Two analog signals as examples of continuous random variables]]
<br>Die&nbsp; ''Wahrscheinlichkeitsdichtefunktion''&nbsp; der Gleichverteilung hat den in der zweiten Grafik oben skizzierten Verlauf:
+
 
 +
*The graph shows the time course of two uniformly distributed quantities, each of which assumes all values between&nbsp; $1$&nbsp; and&nbsp; $5$&nbsp; $($mean value $m_1 = 3)$&nbsp; with equal probability.
 +
 
 +
*On the left is the result of a random process, on the right a deterministic signal&nbsp; (&raquo;sawtooth&laquo;)&nbsp; with the same amplitude distribution.
 +
 
 +
[[File:P_ID2870__Inf_A_4_1a.png|right|frame|PDF and CDF of a uniformly distributed random variable]]
 +
 
 +
<br>The&nbsp; "probability density function"&nbsp; $\rm (PDF)$&nbsp; of the uniform distribution has the course sketched in the second graph above:
 
   
 
   
:$$f_X(x) = \left\{ \begin{array}{c} \hspace{0.25cm}(x_{\rm max} - x_{\rm min})^{-1} \\  1/2 \cdot (x_{\rm max} - x_{\rm min})^{-1} \\ \hspace{0.25cm} 0 \\  \end{array} \right.  \begin{array}{*{20}c}  {\rm{f\ddot{u}r} }  \\  {\rm{f\ddot{u}r} }  \\  {\rm{f\ddot{u}r} }  \\ \end{array}
+
:$$f_X(x) = \left\{ \begin{array}{c} \hspace{0.25cm}(x_{\rm max} - x_{\rm min})^{-1} \\  1/2 \cdot (x_{\rm max} - x_{\rm min})^{-1} \\ \hspace{0.25cm} 0 \\  \end{array} \right.  \begin{array}{*{20}c}  {\rm{for} }  \\  {\rm{for} }  \\  {\rm{for} }  \\ \end{array}
\begin{array}{*{20}l}  {x_{\rm min} < x < x_{\rm max},}  \\  x ={x_{\rm min} \hspace{0.1cm}{\rm und}\hspace{0.1cm}x = x_{\rm max},}  \\  x > x_{\rm max}. \\ \end{array}$$
+
\begin{array}{*{20}l}  {x_{\rm min} < x < x_{\rm max},}  \\  x ={x_{\rm min} \hspace{0.15cm}{\rm and}\hspace{0.15cm}x = x_{\rm max},}  \\  x < x_{\rm min} \hspace{0.15cm}{\rm or}\hspace{0.15cm}x > x_{\rm max}. \\ \end{array}$$
  
Es ergeben sich hier für den Mittelwert&nbsp; $m_1 ={\rm E}\big[X\big]$&nbsp; und die Varianz&nbsp; $σ^2={\rm E}\big[(X – m_1)^2\big]$&nbsp; folgende Gleichungen:
+
The following equations are obtained here for the mean&nbsp; $m_1 ={\rm E}\big[X\big]$&nbsp; and the variance&nbsp; $σ^2={\rm E}\big[(X - m_1)^2\big]$:
 
   
 
   
 
:$$m_1 = \frac{x_{\rm max} + x_{\rm min} }{2}\hspace{0.05cm}, $$  
 
:$$m_1 = \frac{x_{\rm max} + x_{\rm min} }{2}\hspace{0.05cm}, $$  
 
:$$\sigma^2 = \frac{(x_{\rm max} - x_{\rm min})^2}{12}\hspace{0.05cm}.$$
 
:$$\sigma^2 = \frac{(x_{\rm max} - x_{\rm min})^2}{12}\hspace{0.05cm}.$$
  
Unten dargestellt ist die&nbsp; ''Verteilungsfunktion''&nbsp; (VTF):
+
Shown below is the &nbsp; &raquo;'''cumulative distribution function'''&laquo;&nbsp; $\rm (CDF)$:
 
   
 
   
 
:$$F_X(x) = \int_{-\infty}^{x} \hspace{-0.1cm}f_X(\xi) \hspace{0.1cm}{\rm d}\xi 
\hspace{0.2cm} = \hspace{0.2cm}
 {\rm Pr}(X \le x)\hspace{0.05cm}.$$
  
*Diese ist für&nbsp; $x ≤ x_{\rm min}$ identisch Null, steigt danach linear an und erreicht bei&nbsp; $x = x_{\rm max}$&nbsp; den VTF–Endwert&nbsp; $1$.
+
*This is identically zero for&nbsp; $x ≤ x_{\rm min}$,&nbsp; increases linearly thereafter and reaches the CDF final value&nbsp; $1$&nbsp; at&nbsp; $x = x_{\rm max}$.
*Die Wahrscheinlichkeit, dass die Zufallgröße&nbsp; $X$&nbsp; einen Wert zwischen&nbsp; $3$&nbsp; und&nbsp; $4$&nbsp; annimmt, kann sowohl aus der WDF als auch aus der VTF ermittelt werden:
+
 
 +
*The probability that the random variable&nbsp; $X$&nbsp; takes on a value between&nbsp; $3$&nbsp; and&nbsp; $4$&nbsp; can be determined from both the PDF and the CDF:
 
:$${\rm Pr}(3 \le X \le 4) = \int_{3}^{4} \hspace{-0.1cm}f_X(\xi) \hspace{0.1cm}{\rm d}\xi  = 0.25\hspace{0.05cm}\hspace{0.05cm},$$
 
:$${\rm Pr}(3 \le X \le 4) = \int_{3}^{4} \hspace{-0.1cm}f_X(\xi) \hspace{0.1cm}{\rm d}\xi  = 0.25\hspace{0.05cm}\hspace{0.05cm},$$
 
:$${\rm Pr}(3 \le X \le 4) = F_X(4) - F_X(3) = 0.25\hspace{0.05cm}.$$
 
:$${\rm Pr}(3 \le X \le 4) = F_X(4) - F_X(3) = 0.25\hspace{0.05cm}.$$
  
Weiterhin ist zu beachten:
+
Furthermore, note:
*Das Ergebnis&nbsp; $X = 0$&nbsp; ist bei dieser Zufallsgröße ausgeschlossen &nbsp; ⇒  &nbsp; ${\rm  Pr}(X = 0) = 0$.
+
*The result&nbsp; $X = 0$&nbsp; is excluded for this random variable &nbsp; ⇒  &nbsp; ${\rm  Pr}(X = 0) = 0$.
*Das Ergebnis&nbsp; $X = 4$&nbsp; ist dagegen durchaus möglich.&nbsp; Trotzdem gilt auch hier&nbsp; ${\rm  Pr}(X = 4) = 0$.}}
 
  
==Entropie wertkontinuierlicher Zufallsgrößen nach Quantisierung  ==
+
*The result&nbsp; $X = 4$&nbsp;, on the other hand, is quite possible.&nbsp; Nevertheless,&nbsp; ${\rm  Pr}(X = 4) = 0 $&nbsp; also applies here.}}
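The quantities of Example 1 can be reproduced numerically. The following is a minimal sketch for&nbsp; $x_{\rm min} = 1$&nbsp; and&nbsp; $x_{\rm max} = 5$;&nbsp; the function names and the midpoint-rule integrator `integrate` are illustrative assumptions, not part of the original text.

```python
def uniform_pdf(x, x_min=1.0, x_max=5.0):
    """PDF of the uniform distribution, including the half values at the edges."""
    if x_min < x < x_max:
        return 1.0 / (x_max - x_min)
    if x == x_min or x == x_max:
        return 0.5 / (x_max - x_min)
    return 0.0

def uniform_cdf(x, x_min=1.0, x_max=5.0):
    """CDF: zero up to x_min, then linear, final value 1 from x_max on."""
    if x <= x_min:
        return 0.0
    if x >= x_max:
        return 1.0
    return (x - x_min) / (x_max - x_min)

def integrate(g, a, b, steps=100_000):
    """Simple midpoint rule (assumed helper for this sketch)."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

m1 = integrate(lambda x: x * uniform_pdf(x), 1.0, 5.0)               # mean
var = integrate(lambda x: (x - m1) ** 2 * uniform_pdf(x), 1.0, 5.0)  # variance
p_pdf = integrate(uniform_pdf, 3.0, 4.0)                             # Pr(3 <= X <= 4) from the PDF
p_cdf = uniform_cdf(4.0) - uniform_cdf(3.0)                          # Pr(3 <= X <= 4) from the CDF

print(m1, var, p_pdf, p_cdf)
```

Up to numerical error, the results should agree with&nbsp; $m_1 = 3$,&nbsp; $σ^2 = 4^2/12$&nbsp; and&nbsp; ${\rm Pr}(3 \le X \le 4) = 0.25$.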
 +
 
 +
==Entropy of continuous random variables after quantization ==
 
<br>
 
<br>
Wir betrachten nun eine wertkontinuierliche Zufallsgröße&nbsp; $X$&nbsp; im Bereich von&nbsp; $0 \le x \le 1$.
+
We now consider a continuous random variable&nbsp; $X$&nbsp; in the range&nbsp; $0 \le x \le 1$.
*Wir quantisieren die kontinuierliche Zufallsgröße&nbsp; $X$, um die bisherige Entropieberechnung weiter anwenden zu können.&nbsp; Die so entstehende diskrete (quantisierte) Größe nennen wir&nbsp; $Z$.
+
*We quantize this random variable&nbsp; $X$,&nbsp; in order to be able to further apply the previous entropy calculation.&nbsp; We call the resulting discrete (quantized) quantity&nbsp; $Z$.
*Die Quantisierungsstufenzahl sei&nbsp; $M$, so dass jedes Quantisierungsintervall&nbsp; $μ$&nbsp; bei der vorliegenden WDF die Breite&nbsp; ${\it Δ} = 1/M$&nbsp; aufweist.&nbsp; Die Intervallmitten bezeichnen wir mit&nbsp; $x_μ$.
+
 
*Die Wahrscheinlichkeit&nbsp; $p_μ = {\rm Pr}(Z = z_μ)$&nbsp; bezüglich&nbsp; $Z$&nbsp; ist gleich der Wahrscheinlichkeit, dass die kontinuierliche Zufallsgröße&nbsp; $X$&nbsp; einen Wert zwischen&nbsp; $x_μ - {\it Δ}/2$&nbsp; und&nbsp; $x_μ + {\it Δ}/2$&nbsp; besitzt.
+
*Let the number of quantization steps be&nbsp; $M$,&nbsp; so that each quantization interval&nbsp; $μ$&nbsp; has the width&nbsp; ${\it Δ} = 1/M$&nbsp; for the present PDF.&nbsp; We denote the interval centres by&nbsp; $x_μ$.
*Zunächst setzen wir&nbsp; $M = 2$&nbsp; und verdoppeln anschließend diesen Wert in jeder Iteration.&nbsp; Dadurch wird die Quantisierung zunehmend feiner.&nbsp; Im&nbsp; $n$–ten Versuch gilt dann&nbsp; $M = 2^n$&nbsp; und&nbsp; ${\it Δ} =2^{–n}$.
+
 
 +
*The probability&nbsp; $p_μ = {\rm Pr}(Z = z_μ)$&nbsp; with respect to&nbsp; $Z$&nbsp; is equal to the probability that the random variable&nbsp; $X$&nbsp; has a value between&nbsp; $x_μ - {\it Δ}/2$&nbsp; and&nbsp; $x_μ + {\it Δ}/2$.
 +
 
 +
*First we set&nbsp; $M = 2$&nbsp; and then double this value in each iteration.&nbsp; This makes the quantization increasingly finer.&nbsp; In the&nbsp; $n$-th trial,&nbsp; $M = 2^n$&nbsp; and&nbsp; ${\it Δ} =2^{-n}$&nbsp; then hold.
  
  
 
{{GraueBox|TEXT=
 
{{GraueBox|TEXT=
$\text{Beispiel 2:}$&nbsp; Die Grafik zeigt die Ergebnisse der ersten drei Versuche für eine unsymmetrisch&ndash;dreieckförmige WDF&nbsp; $($zwischen&nbsp; $0$&nbsp; und&nbsp; $1)$:
+
$\text{Example 2:}$&nbsp; The graph shows the results of the first three trials for an asymmetrical triangular PDF&nbsp; $($between&nbsp; $0$&nbsp; and&nbsp; $1)$:
[[File:P_ID2851__Inf_T_4_1_S2.png|right|frame|Entropiebestimmung der Dreieck–WDF nach Quantisierung]]
+
[[File:EN_Inf_T_4_1_S2.png|right|frame|Entropy determination of the triangular PDF after quantization]]
 
* $n = 1 \ ⇒  \ M = 2  \ ⇒  \ {\it Δ} = 1/2\text{:}$ &nbsp; &nbsp; $H(Z) = 0.811\ \rm  bit,$  
 
* $n = 1 \ ⇒  \ M = 2  \ ⇒  \ {\it Δ} = 1/2\text{:}$ &nbsp; &nbsp; $H(Z) = 0.811\ \rm  bit,$  
 
* $n = 2 \ ⇒  \ M = 4  \ ⇒  \ {\it Δ} = 1/4\text{:}$ &nbsp; &nbsp;  $H(Z) = 1.749\ \rm  bit,$
 
* $n = 2 \ ⇒  \ M = 4  \ ⇒  \ {\it Δ} = 1/4\text{:}$ &nbsp; &nbsp;  $H(Z) = 1.749\ \rm  bit,$
Line 138: Line 151:
  
  
 
+
Additionally, the following quantities can be taken from the graph, for example for&nbsp; ${\it Δ} = 1/8$:
 
+
*The interval centres are at   
Zudem können der Grafik noch folgende Größen entnommen werden, zum Beispiel für&nbsp; &nbsp;  ${\it Δ} = 1/8$:
 
*Die Intervallmitten liegen bei &nbsp;
 
 
:$$x_1 = 1/16,\ x_2 = 3/16,\text{ ...} \ ,\ x_8 = 15/16 $$  
 
:$$x_1 = 1/16,\ x_2 = 3/16,\text{ ...} \ ,\ x_8 = 15/16 $$  
 
:$$ ⇒ \ x_μ = {\it Δ} · (μ - 1/2).$$
 
:$$ ⇒ \ x_μ = {\it Δ} · (μ - 1/2).$$
  
*Die Intervallflächen ergeben sich zu &nbsp;  
+
*With the triangular PDF&nbsp; $f_X(x) = 2x$,&nbsp; the interval areas are &nbsp; 
:$$p_μ = {\it Δ} · f_X(x_μ)  ⇒  p_8 = 1/8 · (7/4+2)/2 = 15/64.$$
*Damit erhält man für die Wahrscheinlichkeitsfunktion &nbsp; $P_Z(Z) = (1/64, \ 3/64, \ 5/64, \ 7/64, \ 9/64, \ 11/64, \ 13/64, \ 15/64)$.}}
+
*Thus, we obtain for the&nbsp; $\rm PMF$&nbsp; of the quantized random variable&nbsp;$Z$:
 +
:$$P_Z(Z) = (1/64, \ 3/64, \ 5/64, \ 7/64, \ 9/64, \ 11/64, \ 13/64, \ 15/64).$$}}
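The PMF and the entropy values of Example 2 can be recomputed with a few lines. The following is a minimal sketch that assumes the triangular PDF&nbsp; $f_X(x) = 2x$&nbsp; on&nbsp; $[0, 1]$,&nbsp; which is consistent with the PMF values listed above; the function names are hypothetical.

```python
from fractions import Fraction
import math

def triangular_pmf(n):
    """PMF of Z for the assumed PDF f_X(x) = 2x, quantized into M = 2**n intervals."""
    M = 2 ** n
    delta = Fraction(1, M)
    pmf = []
    for mu in range(1, M + 1):
        x_mu = delta * (mu - Fraction(1, 2))   # interval centre
        pmf.append(delta * 2 * x_mu)           # p_mu = delta * f_X(x_mu), exact for a linear PDF
    return pmf

def entropy_bits(pmf):
    """H(Z) = sum of p_mu * log2(1 / p_mu) in bit."""
    return sum(float(p) * math.log2(1.0 / float(p)) for p in pmf)

for n in (1, 2, 3):
    pmf = triangular_pmf(n)
    print(n, [str(p) for p in pmf], entropy_bits(pmf))
```

For&nbsp; $n = 3$,&nbsp; the list of fractions should reproduce the PMF&nbsp; $P_Z(Z)$&nbsp; given above (in reduced form, since all denominators are&nbsp; $64$).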
  
  
 
{{BlaueBox|TEXT=
 
{{BlaueBox|TEXT=
$\text{Fazit:}$&nbsp;  
+
$\text{Conclusion:}$&nbsp;  
Die Ergebnisse dieses Experiments interpretieren wir wie folgt:
+
We interpret the results of this experiment as follows:
*Die Entropie $H(Z)$ nimmt mit steigendem $M$ immer mehr zu.
+
#The entropy&nbsp; $H(Z)$&nbsp; becomes larger and larger as&nbsp; $M$&nbsp; increases.
*Der Grenzwert von $H(Z)$ für $M \to ∞ \ ⇒  \ {\it Δ} → 0$ ist unendlich.
+
#The limit of&nbsp; $H(Z)$&nbsp; for&nbsp; $M \to ∞ \ ⇒  \ {\it Δ} → 0$&nbsp; is infinite.
*Damit ist auch die Entropie $H(X)$ der wertkontinuierlichen Zufallsgröße $X$ unendlich groß.
+
#Thus, the entropy&nbsp; $H(X)$&nbsp; of the continuous random variable&nbsp; $X$&nbsp; is also infinite.
*Daraus folgt: &nbsp; '''Die bisherige Entropie–Definition versagt hier'''.}}  
+
#It follows: &nbsp; '''The previous definition of entropy fails for continuous random variables'''.}}  
  
  
Zur Verifizierung unseres empirischen Ergebnisses gehen wir von folgender Gleichung aus:
+
To verify our empirical result, we start from the following equation:
 
   
 
   
 
:$$H(Z) = \hspace{0.2cm} \sum_{\mu = 1}^{M} \hspace{0.2cm} p_{\mu} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{p_{\mu}}=  \hspace{0.2cm} \sum_{\mu = 1}^{M} \hspace{0.2cm} {\it \Delta} \cdot f_X(x_{\mu} ) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{{\it \Delta} \cdot f_X(x_{\mu} )}\hspace{0.05cm}.$$
 
:$$H(Z) = \hspace{0.2cm} \sum_{\mu = 1}^{M} \hspace{0.2cm} p_{\mu} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{p_{\mu}}=  \hspace{0.2cm} \sum_{\mu = 1}^{M} \hspace{0.2cm} {\it \Delta} \cdot f_X(x_{\mu} ) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{{\it \Delta} \cdot f_X(x_{\mu} )}\hspace{0.05cm}.$$
  
*Wir spalten nun $H(Z) = S_1 + S_2$ in zwei Summen auf:
+
*We now split&nbsp; $H(Z) = S_1 + S_2$&nbsp; into two sums:
 
   
 
   
 
:$$\begin{align*}S_1 & =  {\rm log}_2 \hspace{0.1cm} \frac{1}{\it \Delta}  \cdot  \hspace{0.2cm} \sum_{\mu = 1}^{M} \hspace{0.02cm} {\it \Delta} \cdot f_X(x_{\mu} ) \approx - {\rm log}_2 \hspace{0.1cm}{\it \Delta} \hspace{0.05cm},\\ 
S_2 & =  \hspace{0.05cm} \sum_{\mu = 1}^{M} \hspace{0.2cm} {\it \Delta} \cdot f_X(x_{\mu} ) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{ f_X(x_{\mu} ) } \approx 
\hspace{0.2cm}  \int_{0}^{1} \hspace{0.05cm}  f_X(x) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{ f_X(x) } \hspace{0.1cm}{\rm d}x \hspace{0.05cm}.\end{align*}$$
  
Die Näherung $S_1 ≈ -\log_2 {\it Δ}$ gilt exakt nur im Grenzfall ${\it Δ} → 0$. Die angegebene Näherung für $S_2$ gilt ebenfalls nur für kleine ${\it Δ} → {\rm d}x$, so dass man die Summe durch das Integral ersetzen kann.
+
*The approximation&nbsp; $S_1 ≈ -\log_2 {\it Δ}$&nbsp; holds exactly only in the limiting case&nbsp; ${\it Δ} → 0$.&nbsp;
 +
 
 +
*The given approximation for&nbsp; $S_2$&nbsp; is also only valid for small&nbsp; ${\it Δ} → {\rm d}x$,&nbsp; so that the sum can be replaced by the integral.
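The split of&nbsp; $H(Z)$&nbsp; into the two sums can be illustrated numerically. The following is a minimal sketch, again assuming the triangular PDF&nbsp; $f_X(x) = 2x$&nbsp; on&nbsp; $[0, 1]$;&nbsp; the names `f_X` and `split_entropy` are hypothetical.

```python
import math

def f_X(x):
    """Assumed triangular PDF on [0, 1]."""
    return 2.0 * x

def split_entropy(n):
    """Return (S1, S2) for M = 2**n quantization intervals of width delta = 1/M."""
    M = 2 ** n
    delta = 1.0 / M
    centres = [delta * (mu - 0.5) for mu in range(1, M + 1)]
    # S1 = log2(1/delta) * sum of delta * f_X(x_mu)
    S1 = math.log2(1.0 / delta) * sum(delta * f_X(x) for x in centres)
    # S2 = sum of delta * f_X(x_mu) * log2(1 / f_X(x_mu))
    S2 = sum(delta * f_X(x) * math.log2(1.0 / f_X(x)) for x in centres)
    return S1, S2

for n in (1, 3, 6, 10):
    S1, S2 = split_entropy(n)
    print(n, S1, S2, S1 + S2)
```

In this sketch,&nbsp; $S_1$&nbsp; grows like&nbsp; $-\log_2 {\it Δ} = n$,&nbsp; while&nbsp; $S_2$&nbsp; settles towards a fixed value as&nbsp; ${\it Δ}$&nbsp; decreases.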
 +
 
  
 
{{BlaueBox|TEXT=
 
{{BlaueBox|TEXT=
$\text{Verallgemeinerung:}$&nbsp;
+
$\text{Generalization:}$&nbsp;
Nähert man die wertkontinuierliche Zufallsgröße $X$ mit der WDF $f_X(x)$ durch eine wertdiskrete Zufallsgröße $Z$ an, indem man eine (feine) Quantisierung mit der Intervallbreite ${\it Δ}$ durchführt, so erhält man für die Entropie der Zufallsgröße $Z$:
+
If one approximates the continuous random variable&nbsp; $X$&nbsp; with the PDF&nbsp; $f_X(x)$&nbsp; by a discrete random variable&nbsp; $Z$&nbsp; by performing a (fine) quantization with the interval width&nbsp; ${\it Δ}$,&nbsp; one obtains for the entropy of the random variable&nbsp; $Z$:
 
:$$H(Z) \approx  - {\rm log}_2 \hspace{0.1cm}{\it \Delta} \hspace{0.2cm}+
 
:$$H(Z) \approx  - {\rm log}_2 \hspace{0.1cm}{\it \Delta} \hspace{0.2cm}+
 
\hspace{-0.35cm}  \int\limits_{\text{supp}(f_X)} \hspace{-0.35cm}  f_X(x) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{ f_X(x) } \hspace{0.1cm}{\rm d}x =  - {\rm log}_2 \hspace{0.1cm}{\it \Delta} \hspace{0.2cm} + h(X) \hspace{0.5cm}\big [{\rm in \hspace{0.15cm}bit}\big ] \hspace{0.05cm}.$$
 
\hspace{-0.35cm}  \int\limits_{\text{supp}(f_X)} \hspace{-0.35cm}  f_X(x) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{ f_X(x) } \hspace{0.1cm}{\rm d}x =  - {\rm log}_2 \hspace{0.1cm}{\it \Delta} \hspace{0.2cm} + h(X) \hspace{0.5cm}\big [{\rm in \hspace{0.15cm}bit}\big ] \hspace{0.05cm}.$$
  
Das Integral beschreibt die [[Informationstheorie/Differentielle_Entropie#Definition_und_Eigenschaften_der_differentiellen_Entropie|differentielle Entropie]] $h(X)$ der wertkontinuierlichen Zufallsgröße $X$. Für den Sonderfall ${\it Δ} = 1/M = 2^{-n}$ kann die obige Gleichung auch wie folgt geschrieben werden:
+
*The integral describes the&nbsp; [[Information_Theory/Differentielle_Entropie#Definition_and_properties_of_differential_entropy|"differential  entropy"]]&nbsp; $h(X)$&nbsp; of the continuous random variable&nbsp; $X$.&nbsp;
 +
 
 +
For the special case &nbsp; ${\it Δ} = 1/M = 2^{-n}$,&nbsp; the above equation can also be written as follows:
 
   
 
   
 
:$$H(Z) =  n + h(X) \hspace{0.5cm}\big [{\rm in \hspace{0.15cm}bit}\big ] \hspace{0.05cm}.$$
 
:$$H(Z) =  n + h(X) \hspace{0.5cm}\big [{\rm in \hspace{0.15cm}bit}\big ] \hspace{0.05cm}.$$
  
*Im Grenzfall ${\it Δ} → 0 \ ⇒ \ M → ∞ n → ∞$ ist auch die Entropie der wertkontinuierlichen Zufallsgröße unendlich groß: &nbsp; $H(X) → ∞$.
+
*In the limiting case&nbsp; ${\it Δ} → 0 \ ⇒ \ M → ∞ \ ⇒ \ n → ∞$,&nbsp; the entropy of the continuous random variable is also infinite: &nbsp; $H(X) → ∞$.
*Auch bei kleinerem $n$ stellt diese Gleichung lediglich eine Näherung für $H(Z)$ dar, wobei die differentielle Entropie $h(X)$ der wertkontinuierlichen Größe als Korrekturfaktor dient.}}
+
*For every finite&nbsp; $n$,&nbsp; the equation&nbsp; $H(Z) = n + h(X)$&nbsp; is only an approximation for&nbsp; $H(Z)$,&nbsp; in which the differential entropy&nbsp; $h(X)$&nbsp; of the continuous quantity serves as an additive correction term.
 +
}}
  
  
 
{{GraueBox|TEXT=
 
{{GraueBox|TEXT=
$\text{Beispiel 3:}$&nbsp; Wir betrachten wie im $\text{Beispiel 2}$ eine Dreieck–WDF (zwischen $0$ und $1$). Deren differentielle Entropie ergibt sich, wie in der [[Aufgaben:4.2_Dreieckförmige_WDF| Aufgabe 4.2]] berechnet,  zu $h(X) = \hspace{0.05cm}-0.279 \ \rm bit$ .
+
$\text{Example 3:}$&nbsp; As in&nbsp; $\text{Example 2}$,&nbsp; we consider an asymmetrical triangular PDF &nbsp; $($between&nbsp; $0$&nbsp; and&nbsp; $1)$.&nbsp; Its differential entropy, as calculated in&nbsp; [[Aufgaben:Exercise_4.2:_Triangular_PDF|"Exercise 4.2"]],&nbsp; is
* In der Tabelle ist die Entropie $H(Z)$ der mit $n$ Bit quantisierten Größe $Z$ angegeben.  
+
[[File:EN_Inf_T_4_1_S2c.png|right|frame|Entropy of the asymmetrical triangular PDF after quantization ]]
*Man erkennt bereits für $n = 3$ eine gute Übereinstimmung zwischen der Näherung (untere Zeile) und der exakten Berechnung (Zeile 2).
+
:$$h(X) = \hspace{0.05cm}-0.279 \ \rm bit.$$
[[File:P_ID2852__Inf_T_4_1_S2c.png|center|frame|Entropie der Dreieck–WDF nach Quantisierung ]]}}
+
 
 +
* The table shows the entropy&nbsp; $H(Z)$&nbsp; of the quantity&nbsp; $Z$&nbsp; quantized with&nbsp; $n$&nbsp; bits.
 +
 
 +
*Already for&nbsp; $n = 3$&nbsp; one can see a good agreement between the approximation&nbsp; (lower row)&nbsp; and the exact calculation&nbsp; (row 2).
 +
 
 +
*For&nbsp; $n = 10$,&nbsp; the approximation will agree even better with the exact calculation&nbsp; (which is extremely time-consuming).
 +
}}
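The comparison between the exact entropy and the approximation&nbsp; $n + h(X)$&nbsp; can be tabulated programmatically. The following is a minimal sketch under the assumption&nbsp; $f_X(x) = 2x$;&nbsp; the closed form used for `H_X_DIFF` is our own evaluation of the defining integral for this PDF and matches the value&nbsp; $-0.279 \ \rm bit$&nbsp; quoted above.

```python
import math

# h(X) = -integral of 2x * log2(2x) over [0, 1] = 1/(2 ln 2) - 1 (in bit), assumed PDF f_X(x) = 2x
H_X_DIFF = 1.0 / (2.0 * math.log(2.0)) - 1.0

def exact_entropy(n):
    """Exact H(Z) in bit for M = 2**n intervals; p_mu = (2*mu - 1) / M**2."""
    M = 2 ** n
    probs = ((2 * mu - 1) / M ** 2 for mu in range(1, M + 1))
    return sum(p * math.log2(1.0 / p) for p in probs)

for n in range(1, 11):
    print(f"n = {n:2d}   exact H(Z) = {exact_entropy(n):.3f} bit   n + h(X) = {n + H_X_DIFF:.3f} bit")
```

The gap between the two columns should shrink as&nbsp; $n$&nbsp; grows.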
 
 
  
 
   
 
   
==Definition und Eigenschaften der differentiellen Entropie ==  
+
==Definition and properties of differential entropy ==  
 
<br>
 
<br>
 
{{BlaueBox|TEXT=
 
{{BlaueBox|TEXT=
$\text{Verallgemeinerung:}$&nbsp;
+
$\text{Generalization:}$&nbsp;
Die '''differentielle Entropie''' $h(X)$ einer wertkontinuierlichen Zufallsgröße $X$ lautet mit der Wahrscheinlichkeitsdichtefunktion $f_X(x)$:
+
The&nbsp; &raquo;'''differential entropy'''&laquo;&nbsp; $h(X)$&nbsp; of a continuous random variable&nbsp; $X$&nbsp; with probability density function&nbsp; $f_X(x)$&nbsp; is:
 
   
 
   
 
:$$h(X) =  
 
:$$h(X) =  
 
\hspace{0.1cm} - \hspace{-0.45cm} \int\limits_{\text{supp}(f_X)} \hspace{-0.35cm}  f_X(x) \cdot {\rm log} \hspace{0.1cm} \big[ f_X(x) \big] \hspace{0.1cm}{\rm d}x  
 
\hspace{0.1cm} - \hspace{-0.45cm} \int\limits_{\text{supp}(f_X)} \hspace{-0.35cm}  f_X(x) \cdot {\rm log} \hspace{0.1cm} \big[ f_X(x) \big] \hspace{0.1cm}{\rm d}x  
\hspace{0.6cm}{\rm mit}\hspace{0.6cm} {\rm supp}(f_X) = \{ x\text{:} \ f_X(x) > 0 \}
+
\hspace{0.6cm}{\rm with}\hspace{0.6cm} {\rm supp}(f_X) = \{ x\text{:} \ f_X(x) > 0 \}
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
  
Hinzugefügt werden muss jeweils eine Pseudo–Einheit:
+
A pseudo-unit must be added in each case:
*„nat” bei Verwendung von „ln” &nbsp; ⇒ &nbsp;  natürlicher Logarithmus,
+
*"nat" when using&nbsp; "ln" &nbsp; ⇒ &nbsp;  natural logarithm,
*„bit” bei Verwendung von „log<sub>2</sub> &nbsp; ⇒ &nbsp;  Logarithmus dualis.}}
+
 
 +
*"bit" when using&nbsp; "log<sub>2</sub>" &nbsp; ⇒ &nbsp;  binary logarithm.}}
  
  
While the (conventional) entropy of a discrete random variable&nbsp; $X$&nbsp; always satisfies&nbsp; $H(X) ≥ 0$,&nbsp; the differential entropy&nbsp; $h(X)$&nbsp; of a continuous random variable can also be negative.&nbsp; This already shows that&nbsp; $h(X)$,&nbsp; in contrast to&nbsp; $H(X)$,&nbsp; cannot be interpreted as "uncertainty".
  
[[File:P_ID2854__Inf_T_4_1_S3a_neu.png|right|frame|PDF of a uniformly distributed random variable]]
 
{{GraueBox|TEXT=
$\text{Example 4:}$&nbsp; 
The upper graph shows the&nbsp; $\rm PDF$&nbsp; of a random variable&nbsp; $X$&nbsp; that is uniformly distributed between&nbsp; $x_{\rm min}$&nbsp; and&nbsp; $x_{\rm max}$. 
*For its differential entropy one obtains in&nbsp; "nat":

:$$\begin{align*}h(X)  & =    -  \hspace{-0.18cm}\int\limits_{x_{\rm min} }^{x_{\rm max} } \hspace{-0.28cm}  \frac{1}{x_{\rm max}\hspace{-0.05cm} - \hspace{-0.05cm}x_{\rm min} } \cdot {\rm ln} \hspace{0.1cm}\big [ \frac{1}{x_{\rm max}\hspace{-0.05cm} - \hspace{-0.05cm}x_{\rm min} }\big ] \hspace{0.1cm}{\rm d}x \\ & =  
{\rm ln} \hspace{0.1cm} \big[ {x_{\rm max}\hspace{-0.05cm} - \hspace{-0.05cm}x_{\rm min} }\big ]  \cdot \big [ \frac{x}{x_{\rm max}\hspace{-0.05cm} - \hspace{-0.05cm}x_{\rm min} } \big ]_{x_{\rm min} }^{x_{\rm max} }={\rm ln} \hspace{0.1cm} \big[ {x_{\rm max}\hspace{-0.05cm} - \hspace{-0.05cm}x_{\rm min} } \big]\hspace{0.05cm}.\end{align*} $$

*The equation for the differential entropy in "bit" is: &nbsp;  
:$$h(X) = \log_2 \big[x_{\rm max} - x_{ \rm min} \big].$$ 

[[File:P_ID2855__Inf_T_4_1_S3b_neu.png|left|frame|$h(X)$&nbsp; for different rectangular density functions &nbsp; &rArr; &nbsp; uniformly distributed random variables]]
<br><br><br><br>The graph on the left shows the numerical evaluation of the above result by means of some examples.
}}
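The result of Example 4 can be checked numerically. The following is a minimal sketch, not part of the original chapter: the function names and the midpoint-rule integration are illustrative assumptions. It compares the closed form $\log_2 (x_{\rm max} - x_{\rm min})$ with a direct evaluation of the defining integral.

```python
import math


def uniform_entropy_bits(x_min, x_max):
    """Closed form from Example 4: h(X) = log2(x_max - x_min)."""
    if x_max <= x_min:
        raise ValueError("x_max must be larger than x_min")
    return math.log2(x_max - x_min)


def numeric_entropy_bits(pdf, lo, hi, steps=100_000):
    """Midpoint-rule approximation of -∫ f(x)·log2 f(x) dx over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:  # integrate over the support of the PDF only
            total -= fx * math.log2(fx) * dx
    return total


if __name__ == "__main__":
    # Interval limits chosen for illustration only.
    for x_min, x_max in [(0.0, 1.0), (0.0, 2.0), (-1.0, 1.0), (0.0, 0.5)]:
        width = x_max - x_min
        closed = uniform_entropy_bits(x_min, x_max)
        numeric = numeric_entropy_bits(lambda x: 1.0 / width, x_min, x_max)
        print(f"[{x_min:+.1f}, {x_max:+.1f}]: closed form {closed:+.3f} bit, "
              f"numeric {numeric:+.3f} bit")
```

An interval narrower than $1$ gives a negative value (for example, width $0.5$ gives $-1$ bit), which illustrates why $h(X)$ cannot be read as an "uncertainty".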
  
 
   
 
   
 
{{BlaueBox|TEXT=
$\text{Interpretation:}$&nbsp;
From the six sketches in the last example, important properties of the differential entropy&nbsp; $h(X)$&nbsp; can be read off:
*The differential entropy is not changed by a PDF shift &nbsp; $($by&nbsp; $k)$:
:$$h(X + k) = h(X) \hspace{0.2cm}\Rightarrow \hspace{0.2cm} \text{For example:} \ \ h_3(X) = h_4(X) = h_5(X)  \hspace{0.05cm}.$$

* $h(X)$&nbsp; changes when the PDF is compressed or spread by the factor&nbsp; $k ≠ 0$&nbsp; as follows:
:$$h( k\hspace{-0.05cm} \cdot \hspace{-0.05cm}X) = h(X) + {\rm log}_2 \hspace{0.05cm} \vert k \vert \hspace{0.2cm}\Rightarrow \hspace{0.2cm}
 \text{For example:} \ \ h_6(X) = h_5(AX) = h_5(X) + {\rm log}_2 \hspace{0.05cm} (A) =
{\rm log}_2 \hspace{0.05cm} (2A)  
\hspace{0.05cm}.$$}}
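The two properties in the box above (shift invariance and the $\log_2 \vert k \vert$ offset under scaling) can be sketched numerically. The triangular test PDF, the helper names and the numerical integration below are assumptions made for illustration; they are not taken from the chapter.

```python
import math


def entropy_bits(pdf, lo, hi, steps=200_000):
    """Midpoint-rule approximation of -∫ f(x)·log2 f(x) dx over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:
            total -= fx * math.log2(fx) * dx
    return total


def triangle(x):
    """Symmetric triangular PDF on [-1, +1] (illustrative test density)."""
    return max(0.0, 1.0 - abs(x))


def shifted(pdf, k):
    """PDF of the random variable X + k."""
    return lambda x: pdf(x - k)


def scaled(pdf, k):
    """PDF of the random variable k·X for k != 0."""
    return lambda x: pdf(x / k) / abs(k)


if __name__ == "__main__":
    h_base = entropy_bits(triangle, -1.0, 1.0)
    h_shift = entropy_bits(shifted(triangle, 3.0), 2.0, 4.0)
    h_scale = entropy_bits(scaled(triangle, 4.0), -4.0, 4.0)
    print(f"h(X)     = {h_base:.4f} bit")
    print(f"h(X + 3) = {h_shift:.4f} bit")
    print(f"h(4·X)   = {h_scale:.4f} bit, offset = {h_scale - h_base:.4f} bit")
```

With $k = 4$ the offset should come out as $\log_2 4 = 2$ bit, while the shifted density should give the same value as the original one.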
  
  
Many of the equations derived for the discrete case in the chapter&nbsp; [[Information_Theory/Verschiedene_Entropien_zweidimensionaler_Zufallsgrößen|"Different entropies of two-dimensional random variables"]]&nbsp; also apply to continuous random variables.

The following compilation shows that often only the (capital) &nbsp;$H$&nbsp; has to be replaced by a (lowercase) &nbsp;$h$,&nbsp; and the probability mass function&nbsp; $\rm (PMF)$&nbsp; by the corresponding probability density function&nbsp; $\rm (PDF)$.
  
* &raquo;'''Conditional Differential Entropy'''&laquo;:

:$$H(X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y) = {\rm E} \hspace{-0.1cm}\left [ {\rm log} \hspace{0.1cm}\frac{1}{P_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y)}\right ]=\hspace{-0.04cm} \sum_{(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm}(\hspace{-0.03cm}P_{XY}\hspace{-0.08cm})} 
 \hspace{-0.6cm} P_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{1}{P_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (x \hspace{-0.05cm}\mid \hspace{-0.05cm} y)}
 \hspace{0.05cm},$$
:$$h(X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y) = {\rm E} \hspace{-0.1cm}\left [ {\rm log} \hspace{0.1cm}\frac{1}{f_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y)}\right ]=\hspace{0.2cm} \int \hspace{-0.2cm} \int\limits_{(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm}(\hspace{-0.03cm}f_{XY}\hspace{-0.08cm})} 
 \hspace{-0.6cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{1}{f_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (x \hspace{-0.05cm}\mid \hspace{-0.05cm} y)}
 \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y\hspace{0.05cm}.$$
  
* &raquo;'''Joint Differential Entropy'''&laquo;:

:$$H(XY) = {\rm E} \left [ {\rm log} \hspace{0.1cm} \frac{1}{P_{XY}(X, Y)}\right ] =\hspace{-0.04cm} \sum_{(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm}(\hspace{-0.03cm}P_{XY}\hspace{-0.08cm})} 
 \hspace{-0.6cm} P_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{1}{P_{XY}(x, y)}
 \hspace{0.05cm},$$
:$$h(XY) = {\rm E} \left [ {\rm log} \hspace{0.1cm} \frac{1}{f_{XY}(X, Y)}\right ] =\hspace{0.2cm} \int \hspace{-0.2cm} \int\limits_{(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm}(\hspace{-0.03cm}f_{XY}\hspace{-0.08cm})} 
 \hspace{-0.6cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{1}{f_{XY}(x, y)}
 \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y\hspace{0.05cm}.$$
  
* &raquo;'''Chain rule'''&laquo;&nbsp; of differential entropy:

:$$H(X_1\hspace{0.05cm}X_2\hspace{0.05cm}\text{...} \hspace{0.1cm}X_n) =\sum_{i = 1}^{n}
H(X_i \hspace{0.05cm}\vert \hspace{0.05cm} X_1\hspace{0.05cm}X_2\hspace{0.05cm}\text{...} \hspace{0.1cm}X_{i-1})
\hspace{0.05cm},$$
:$$h(X_1\hspace{0.05cm}X_2\hspace{0.05cm}\text{...} \hspace{0.1cm}X_n) =\sum_{i = 1}^{n}
h(X_i \hspace{0.05cm}\vert \hspace{0.05cm} X_1\hspace{0.05cm}X_2\hspace{0.05cm}\text{...} \hspace{0.1cm}X_{i-1})
\hspace{0.05cm}.$$
  
* &raquo;'''Kullback–Leibler distance'''&laquo;&nbsp; between the random variables&nbsp; $X$&nbsp; and&nbsp; $Y$:

:$$D(P_X \hspace{0.05cm} ||  \hspace{0.05cm}P_Y) = {\rm E} \left [ {\rm log} \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm}(\hspace{-0.03cm}P_{X})}
 \hspace{-0.3cm} P_X(x) \cdot {\rm log} \hspace{0.1cm} \frac{P_X(x)}{P_Y(x)}
 \hspace{0.05cm},$$
:$$D(f_X \hspace{0.05cm} ||  \hspace{0.05cm}f_Y) = {\rm E} \left [ {\rm log} \hspace{0.1cm} \frac{f_X(X)}{f_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \int\limits_{x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm}(\hspace{-0.03cm}f_{X})}
 \hspace{-0.3cm} f_X(x) \cdot {\rm log} \hspace{0.1cm} \frac{f_X(x)}{f_Y(x)} \hspace{0.15cm}{\rm d}x
 \hspace{0.05cm}.$$
  
==Differential entropy of some peak-constrained random variables ==
<br>
[[File:EN_Inf_T_4_1_S4a.png|right|frame|Differential entropy of peak-constrained random variables]]
The table shows the differential entropy for three exemplary probability density functions&nbsp; $f_X(x)$.&nbsp; These are all peak-constrained,&nbsp; i.e. &nbsp; $|X| ≤ A$&nbsp; applies in each case. 

*With a&nbsp; "peak constraint",&nbsp; the differential entropy can always be represented as follows:
:$$h(X) =  {\rm log}\,\, ({\it \Gamma}_{\rm A} \cdot A).$$ 

*The argument&nbsp; ${\it \Gamma}_{\rm A} \cdot A$&nbsp; does not depend on which logarithm is used.&nbsp; Add the pseudo-unit&nbsp; "nat"&nbsp; when using&nbsp; $\ln$&nbsp; and the pseudo-unit&nbsp; "bit"&nbsp; when using&nbsp; $\log_2$.

*${\it \Gamma}_{\rm A}$&nbsp; depends solely on the PDF form and applies only to&nbsp; "peak limitation" &nbsp; &rArr; &nbsp; German:&nbsp; "Amplitudenbegrenzung"  &nbsp; &rArr; &nbsp; Index&nbsp; $\rm A$.

*A uniform distribution in the range&nbsp; $|X| ≤ 1$&nbsp; yields&nbsp; $h(X) = 1$&nbsp; bit,&nbsp; a second one in the range&nbsp; $|Y| ≤ 4$&nbsp; yields&nbsp; $h(Y) = 3$&nbsp; bit.
 
<br clear=all>
{{BlaueBox|TEXT=
$\text{Theorem:}$&nbsp; 
Under the &nbsp; &raquo;'''peak constraint'''&laquo;&nbsp; ⇒ &nbsp; i.e. PDF&nbsp; $f_X(x) = 0$ &nbsp;for&nbsp; $ \vert x \vert > A$  &nbsp; – &nbsp;  the&nbsp; &raquo;'''uniform distribution'''&laquo;&nbsp; leads to the maximum differential entropy:
:$$h_{\rm max}(X) = {\rm log} \hspace{0.1cm} (2A)\hspace{0.05cm}.$$
Here,&nbsp; the appropriate parameter&nbsp; ${\it \Gamma}_{\rm A} = 2$&nbsp; is maximal.
You will find the&nbsp; [[Information_Theory/Differential_Entropy#Proof:_Maximum_differential_entropy_with_peak_constraint|$\text{proof}$]]&nbsp; at the end of this chapter.}}
  
  
The theorem also means that for any other peak-constrained PDF&nbsp; (other than the uniform distribution)&nbsp; the characteristic parameter is&nbsp; ${\it \Gamma}_{\rm A} < 2$.
*For the symmetric triangular distribution, the above table gives&nbsp; ${\it \Gamma}_{\rm A} = \sqrt{\rm e} ≈ 1.649$.
*In contrast, for the one-sided triangle&nbsp; $($between&nbsp; $0$&nbsp; and&nbsp; $A)$,&nbsp; ${\it \Gamma}_{\rm A}$&nbsp; is only half as large.
*For every other triangle&nbsp; $($width&nbsp; $A$,&nbsp; arbitrary peak between&nbsp; $0$&nbsp; and&nbsp; $A)$,&nbsp; ${\it \Gamma}_{\rm A} ≈ 0.824$&nbsp; also applies.


The second&nbsp; $h(X)$&nbsp; specification in each case and the characteristic value&nbsp; ${\it \Gamma}_{\rm L}$,&nbsp; on the other hand, are suited to comparing random variables under a power constraint, which is discussed in the next section.&nbsp; Under this constraint, for example, the symmetric triangular distribution&nbsp; $({\it \Gamma}_{\rm L} ≈ 16.31)$&nbsp; is better than the uniform distribution&nbsp; $({\it \Gamma}_{\rm L} = 12)$.
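The characteristic values&nbsp; ${\it \Gamma}_{\rm A}$&nbsp; quoted above can be reproduced with a short numerical sketch. It solves&nbsp; $h(X) = \ln ({\it \Gamma}_{\rm A} \cdot A)$&nbsp; for&nbsp; ${\it \Gamma}_{\rm A}$.&nbsp; The value of&nbsp; $A$,&nbsp; the function names and the rising shape chosen for the one-sided triangle are assumptions for illustration.

```python
import math

A = 2.0  # peak value; arbitrary, since Γ_A does not depend on A


def entropy_nat(pdf, lo, hi, steps=200_000):
    """Midpoint-rule approximation of -∫ f(x)·ln f(x) dx over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:
            total -= fx * math.log(fx) * dx
    return total


def gamma_a(pdf, lo, hi):
    """Γ_A = exp(h) / A with h in nat, from h(X) = ln(Γ_A · A)."""
    return math.exp(entropy_nat(pdf, lo, hi)) / A


def uniform(x):
    return 1.0 / (2.0 * A) if abs(x) <= A else 0.0


def symmetric_triangle(x):
    return max(0.0, (1.0 - abs(x) / A) / A)


def one_sided_triangle(x):
    # Rising ramp on [0, A]; a falling ramp gives the same entropy.
    return 2.0 * x / A**2 if 0.0 <= x <= A else 0.0


if __name__ == "__main__":
    print("uniform:            ", round(gamma_a(uniform, -A, A), 3))
    print("symmetric triangle: ", round(gamma_a(symmetric_triangle, -A, A), 3))
    print("one-sided triangle: ", round(gamma_a(one_sided_triangle, 0.0, A), 3))
```

The three estimates should lie close to&nbsp; $2$,&nbsp; $\sqrt{\rm e}$&nbsp; and&nbsp; $\sqrt{\rm e}/2$,&nbsp; respectively.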
 
 
 
 
  
==Differential entropy of some power-constrained random variables ==  
<br>
The following table gives the differential entropies&nbsp; $h(X)$&nbsp; for three exemplary density functions&nbsp; $f_X(x)$&nbsp; without peak limitation.&nbsp; Through appropriate parameter selection, all three have the same variance&nbsp; $σ^2 = {\rm E}\big[|X -m_1|^2 \big]$&nbsp; and thus the same standard deviation&nbsp; $σ$.&nbsp; Considered are:

[[File:EN_Inf_T_4_1_S5a_v5.png|right|frame|Differential entropy of power-constrained random variables]]
*the&nbsp; [[Theory_of_Stochastic_Signals/Gaußverteilte_Zufallsgrößen|"Gaussian distribution"]],

*the&nbsp; [[Theory_of_Stochastic_Signals/Exponentially_Distributed_Random_Variables#Two-sided_exponential_distribution_-_Laplace_distribution|"Laplace distribution"]]&nbsp;  ⇒  &nbsp; a two-sided exponential distribution,

*the  (one-sided) &nbsp;  [[Theory_of_Stochastic_Signals/Exponentialverteilte_Zufallsgrößen#One-sided_exponential_distribution|"exponential distribution"]].

The differential entropy can always be represented here as
:$$h(X) = 1/2 \cdot {\rm log} \hspace{0.1cm} ({\it \Gamma}_{\rm L} \cdot \sigma^2).$$
${\it \Gamma}_{\rm L}$&nbsp; depends solely on the PDF form and applies only to&nbsp; "power limitation" &nbsp; &rArr; &nbsp; German:&nbsp; "Leistungsbegrenzung"  &nbsp; &rArr; &nbsp; Index&nbsp; $\rm L$.

The result differs only by the pseudo-unit
*"nat" when using&nbsp; $\ln$&nbsp; or
*"bit" when using&nbsp; $\log_2$.
 
 
<br clear=all>
{{BlaueBox|TEXT=
$\text{Theorem:}$&nbsp; 
Under the&nbsp; &raquo;'''power constraint'''&laquo;,&nbsp; the&nbsp; &raquo;'''Gaussian PDF'''&laquo;
:$$f_X(x) = \frac{1}{\sqrt{2\pi  \sigma^2} } \cdot {\rm e}^{
- \hspace{0.05cm}{(x - m_1)^2}/(2 \sigma^2)}$$
leads to the maximum differential entropy,&nbsp; independent of the mean&nbsp; $m_1$:
:$$h_{\rm max}(X) = 1/2 \cdot {\rm log} \hspace{0.1cm} (2\pi{\rm e} \cdot \sigma^2)\hspace{0.3cm}\Rightarrow\hspace{0.3cm}{\it \Gamma}_{\rm L} = 2\pi{\rm e} ≈ 17.08\hspace{0.05cm}.$$
You will find the&nbsp; [[Information_Theory/Differential_Entropy#Proof:_Maximum_differential_entropy_with_power_constraint|"proof"]]&nbsp; at the end of this chapter.}}
  
  
This statement also means that for any PDF other than the Gaussian distribution, the characteristic value will be&nbsp; ${\it \Gamma}_{\rm L} < 2\pi{\rm e} ≈ 17.08$.&nbsp; For example, the characteristic value is
*${\it \Gamma}_{\rm L} = 6{\rm e} ≈ 16.31$&nbsp; for the triangular distribution,
*${\it \Gamma}_{\rm L} = 2{\rm e}^2 ≈ 14.78$&nbsp; for the Laplace distribution, and
*${\it \Gamma}_{\rm L} = 12$&nbsp; for the uniform distribution.
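These characteristic values can be cross-checked numerically by solving&nbsp; $h(X) = 1/2 \cdot \ln ({\it \Gamma}_{\rm L} \cdot \sigma^2)$&nbsp; for&nbsp; ${\it \Gamma}_{\rm L}$.&nbsp; The sketch below is only an illustration: the function names, the choice&nbsp; $\sigma = 1$,&nbsp; the truncation of the infinite supports to&nbsp; $[-20, +20]$&nbsp; and the midpoint-rule integration are all assumptions.

```python
import math


def gamma_l(pdf, lo, hi, steps=200_000):
    """Estimate Γ_L = exp(2·h) / σ² (h in nat) by midpoint-rule integration."""
    dx = (hi - lo) / steps
    m1 = m2 = h = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        fx = pdf(x)
        if fx > 0.0:
            m1 += x * fx * dx            # first moment
            m2 += x * x * fx * dx        # second moment
            h -= fx * math.log(fx) * dx  # differential entropy in nat
    variance = m2 - m1 * m1
    return math.exp(2.0 * h) / variance


# All densities below are parameterized for variance 1.
def gaussian(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def laplace(x):
    lam = math.sqrt(2.0)  # variance of the Laplace PDF is 2 / lam**2
    return 0.5 * lam * math.exp(-lam * abs(x))


def triangle(x):
    a = math.sqrt(6.0)  # symmetric triangle on [-a, a] has variance a**2 / 6
    return max(0.0, (1.0 - abs(x) / a) / a)


def uniform(x):
    a = math.sqrt(3.0)  # uniform PDF on [-a, a] has variance a**2 / 3
    return 1.0 / (2.0 * a) if abs(x) <= a else 0.0


if __name__ == "__main__":
    print("Gaussian:", round(gamma_l(gaussian, -20.0, 20.0), 2))
    print("triangle:", round(gamma_l(triangle, -math.sqrt(6.0), math.sqrt(6.0)), 2))
    print("Laplace: ", round(gamma_l(laplace, -20.0, 20.0), 2))
    print("uniform: ", round(gamma_l(uniform, -math.sqrt(3.0), math.sqrt(3.0)), 2))
```

Because&nbsp; ${\it \Gamma}_{\rm L}$&nbsp; depends only on the PDF form, repeating the experiment with a different variance should leave the four values unchanged.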
  
==Proof: Maximum differential entropy with peak constraint== 
<br>
Under the peak constraint &nbsp; ⇒  &nbsp; $|X| ≤ A$&nbsp; the differential entropy is:
:$$h(X) = \hspace{0.1cm}  \hspace{0.05cm} \int_{-A}^{+A} \hspace{0.05cm}  f_X(x) \cdot {\rm log} \hspace{0.1cm} \frac{1}{ f_X(x) } \hspace{0.1cm}{\rm d}x \hspace{0.05cm}.$$

Of all possible probability density functions&nbsp; $f_X(x)$&nbsp; that satisfy the condition
:$$\int_{-A}^{+A} \hspace{0.05cm}  f_X(x)  \hspace{0.1cm}{\rm d}x = 1,$$
we are now looking for the function&nbsp; $g_X(x)$&nbsp; that leads to the maximum differential entropy&nbsp; $h(X)$. 
  
For the derivation we use the&nbsp; [https://en.wikipedia.org/wiki/Lagrange_multiplier $&raquo;\text{Lagrange multiplier method}$&laquo;]:
*We define the Lagrangian&nbsp; $L$&nbsp; in such a way that it contains both&nbsp; $h(X)$&nbsp; and the constraint&nbsp; $|X| ≤ A$:
:$$L= \hspace{0.1cm}  \hspace{0.05cm} \int_{-A}^{+A} \hspace{0.05cm}  f_X(x) \cdot {\rm log} \hspace{0.1cm} \frac{1}{ f_X(x) } \hspace{0.1cm}{\rm d}x \hspace{0.5cm}+ \hspace{0.5cm}
\lambda \cdot
\int_{-A}^{+A} \hspace{0.05cm}  f_X(x)  \hspace{0.1cm}{\rm d}x  
\hspace{0.05cm}.$$
*We generally set&nbsp; $f_X(x) = g_X(x) + ε · ε_X(x)$,&nbsp; where&nbsp; $ε_X(x)$&nbsp; is an arbitrary function,&nbsp; with the restriction that the PDF area must equal&nbsp; $1$.&nbsp; Thus we obtain:
:$$\begin{align*}L = \hspace{0.1cm}  \hspace{0.05cm} \int_{-A}^{+A} \hspace{0.05cm}\big [ g_X(x) + \varepsilon \cdot \varepsilon_X(x)\big ] \cdot {\rm log} \hspace{0.1cm} \frac{1}{ g_X(x) + \varepsilon \cdot \varepsilon_X(x) } \hspace{0.1cm}{\rm d}x + \lambda \cdot
\int_{-A}^{+A} \hspace{0.05cm} \big [ g_X(x) + \varepsilon \cdot \varepsilon_X(x) \big ]  \hspace{0.1cm}{\rm d}x  
\hspace{0.05cm}.\end{align*}$$
*The best possible function is obtained when there is a stationary solution for&nbsp; $ε = 0$:
:$$\left [\frac{{\rm d}L}{{\rm d}\varepsilon} \right ]_{\varepsilon \hspace{0.05cm}= \hspace{0.05cm}0}=\hspace{0.1cm}  \hspace{0.05cm} \int_{-A}^{+A} \hspace{0.05cm}  \varepsilon_X(x)  \cdot \big [ {\rm log} \hspace{0.1cm} \frac{1}{ g_X(x) } -1 \big ]\hspace{0.1cm}{\rm d}x \hspace{0.3cm} + \hspace{0.3cm}\lambda \cdot
\int_{-A}^{+A} \hspace{0.05cm}  \varepsilon_X(x)  \hspace{0.1cm}{\rm d}x \stackrel{!}{=} 0 
\hspace{0.05cm}.$$
*This conditional equation can be satisfied independently of&nbsp; $ε_X$&nbsp; only if the following holds:
:$${\rm log} \hspace{0.1cm} \frac{1}{ g_X(x) } -1 + \lambda  = 0 \hspace{0.4cm}
\forall x \in \big[-A, +A \big]\hspace{0.3cm} \Rightarrow\hspace{0.3cm}
 g_X(x)  = {\rm const.}\hspace{0.4cm}
\forall x \in \big [-A, +A \big]\hspace{0.05cm}.$$
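
As a short worked step&nbsp; (a sketch using the natural logarithm; the constant name&nbsp; $C$&nbsp; is introduced here only for illustration),&nbsp; the stationarity condition can be solved for the constant, which is then fixed by the normalization and inserted into the entropy integral:
:$${\rm ln} \hspace{0.1cm} \frac{1}{ g_X(x) } = 1 - \lambda \hspace{0.3cm} \Rightarrow\hspace{0.3cm} g_X(x) = {\rm e}^{\lambda - 1} = C\hspace{0.05cm}, \hspace{0.8cm} \int_{-A}^{+A} \hspace{0.05cm} C \hspace{0.1cm}{\rm d}x = 2A \cdot C \stackrel{!}{=} 1 \hspace{0.3cm} \Rightarrow\hspace{0.3cm} C = \frac{1}{2A}\hspace{0.05cm},$$
:$$h(X) = \int_{-A}^{+A} \hspace{0.05cm} \frac{1}{2A} \cdot {\rm ln} \hspace{0.1cm} (2A) \hspace{0.1cm}{\rm d}x = {\rm ln} \hspace{0.1cm} (2A)\hspace{0.05cm}.$$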
  
 
{{BlaueBox|TEXT=
$\text{Summary for peak constraints:}$&nbsp; 

The maximum differential entropy is obtained under the constraint&nbsp; $ \vert X \vert ≤ A$&nbsp; for the&nbsp; &raquo;'''uniform PDF'''&laquo;:
:$$h_{\rm max}(X) = {\rm log} \hspace{0.1cm} ({\it \Gamma}_{\rm A} \cdot A) = {\rm log} \hspace{0.1cm} (2A) \hspace{0.5cm} \Rightarrow\hspace{0.5cm} {\it \Gamma}_{\rm A} = 2
\hspace{0.05cm}.$$

Any other random variable with the PDF property &nbsp;$f_X(\vert x \vert  > A) = 0$ &nbsp; leads to a smaller differential entropy, characterized by the parameter &nbsp;${\it \Gamma}_{\rm A} < 2$.}}
  
==Proof: Maximum differential entropy with power constraint==
<br>
First, a note on terminology:
*Strictly speaking,&nbsp; it is not the power &nbsp; ⇒ &nbsp; the&nbsp; [[Theory_of_Stochastic_Signals/Expected_Values_and_Moments#Moment_calculation_as_ensemble_average|"second moment"]]&nbsp; $m_2$&nbsp; that is limited,&nbsp; but the&nbsp; [[Theory_of_Stochastic_Signals/Expected_Values_and_Moments#Some_common_central_moments|"second central moment"]]&nbsp;  ⇒ &nbsp; variance&nbsp; $μ_2 = σ^2$.
*We are therefore looking for the maximum differential entropy under the constraint&nbsp; ${\rm E}\big[|X - m_1|^2 \big] ≤ σ^2$. 
*Here we may replace the&nbsp; "less than or equal"&nbsp; sign by the&nbsp; "equal"&nbsp; sign. 
  
  
If we only allow zero-mean random variables, we circumvent the problem, since the second moment and the variance then coincide.&nbsp; With two&nbsp; [https://en.wikipedia.org/wiki/Lagrange_multiplier "Lagrange multipliers"]&nbsp; the Lagrangian reads:

:$$L= \hspace{0.1cm}  \hspace{0.05cm} \int_{-\infty}^{+\infty} \hspace{-0.1cm}  f_X(x) \cdot {\rm log} \hspace{0.1cm} \frac{1}{ f_X(x) } \hspace{0.1cm}{\rm d}x \hspace{0.1cm}+ \hspace{0.1cm}
\lambda_1 \cdot \int_{-\infty}^{+\infty}\hspace{-0.1cm}  f_X(x)  \hspace{0.1cm}{\rm d}x \hspace{0.1cm}+ \hspace{0.1cm} \lambda_2 \cdot
\int_{-\infty}^{+\infty}\hspace{-0.1cm}  x^2 \cdot f_X(x)  \hspace{0.1cm}{\rm d}x \hspace{0.05cm}.$$
  
Following a procedure similar to the&nbsp; [[Information_Theory/Differentielle_Entropie#Proof:_Maximum_differential_entropy_with_peak_constraint|"proof for the peak constraint"]],&nbsp; it turns out that the "best possible" function must be  &nbsp; $g_X(x) \sim {\rm e}^{-λ_2\hspace{0.05cm} · \hspace{0.05cm} x^2}$&nbsp; &nbsp;  ⇒  &nbsp;  [[Theory_of_Stochastic_Signals/Gaussian_Distributed_Random_Variables|"Gaussian distribution"]]:

:$$g_X(x) ={1}/{\sqrt{2\pi  \sigma^2}} \cdot {\rm e}^{ 
- \hspace{0.05cm}{x^2}/{(2 \sigma^2)} }\hspace{0.05cm}.$$
  
For the explicit proof, however, we use the&nbsp; [[Information_Theory/Some_Preliminary_Remarks_on_Two-Dimensional_Random_Variables#Informational_divergence_-_Kullback-Leibler_distance|"Kullback–Leibler distance"]]&nbsp; between a suitable general PDF&nbsp; $f_X(x)$&nbsp; and the Gaussian PDF&nbsp; $g_X(x)$:

:$$D(f_X \hspace{0.05cm} ||  \hspace{0.05cm}g_X) = \int_{-\infty}^{+\infty} \hspace{0.02cm}
 f_X(x) \cdot {\rm ln} \hspace{0.1cm} \frac{f_X(x)}{g_X(x)} \hspace{0.1cm}{\rm d}x = -h(X) - I_2\hspace{0.3cm} \Rightarrow\hspace{0.3cm}I_2 = \int_{-\infty}^{+\infty} \hspace{0.02cm}
 f_X(x) \cdot {\rm ln} \hspace{0.1cm} {g_X(x)} \hspace{0.1cm}{\rm d}x \hspace{0.05cm}.$$
  
For simplicity, the natural logarithm &nbsp; &rArr; &nbsp; $\ln$&nbsp; is used here.&nbsp; Thus we obtain for the second integral:

:$$I_2 = - \frac{1}{2} \cdot {\rm ln} \hspace{0.1cm} (2\pi\sigma^2)  \cdot \hspace{-0.1cm}\int_{-\infty}^{+\infty} \hspace{-0.4cm}  f_X(x) \hspace{0.1cm}{\rm d}x
\hspace{0.3cm}- \hspace{0.3cm} \frac{{\rm ln} \hspace{0.1cm} ({\rm e})}{2\sigma^2} \cdot \hspace{-0.1cm}\int_{-\infty}^{+\infty} \hspace{-0.4cm}  x^2 \cdot f_X(x) \hspace{0.1cm}{\rm d}x
\hspace{0.05cm}.$$
  
By definition, the first integral is equal to&nbsp; $1$&nbsp; and the second integral gives&nbsp; $σ^2$:

:$$I_2 = - {1}/{2} \cdot {\rm ln} \hspace{0.1cm} (2\pi\sigma^2)  - {1}/{2} \cdot [{\rm ln} \hspace{0.1cm} ({\rm e})] = - {1}/{2} \cdot {\rm ln} \hspace{0.1cm} (2\pi{\rm e} \cdot \sigma^2)$$
:$$\Rightarrow\hspace{0.3cm} D(f_X \hspace{0.05cm} ||  \hspace{0.05cm}g_X) = -h(X) - I_2 =
-h(X) + {1}/{2} \cdot {\rm ln} \hspace{0.1cm} (2\pi{\rm e} \cdot \sigma^2)\hspace{0.05cm}.$$
  
Since the Kullback–Leibler distance is always&nbsp; $\ge 0$&nbsp; for continuous random variables as well,&nbsp; one obtains after generalization&nbsp; ("ln" &nbsp; ⇒ &nbsp;  "log"):

:$$h(X) \le {1}/{2} \cdot {\rm log} \hspace{0.1cm} (2\pi{\rm e} \cdot \sigma^2)\hspace{0.05cm}.$$

The equal sign only applies if the random variable&nbsp; $X$&nbsp; is Gaussian distributed.
  
 
{{BlaueBox|TEXT=
 
{{BlaueBox|TEXT=
$\text{Resümee bei Leistungsbegrenzung:}$&nbsp;  
+
$\text{Summary for power constraints:}$&nbsp;  
  
Die maximale differentielle Entropie ergibt sich unter der Nebenbedingung ${\rm E}\big[ \vert X – m_1 \vert ^2 \big] ≤ σ^2$  unabhängig vom Mittelwert $m_1$ für die '''Gaußverteilung''' (englisch: ''Gaussian PDF''):
+
The maximum differential entropy is obtained under the condition&nbsp; ${\rm E}\big[ \vert X – m_1 \vert ^2 \big] ≤ σ^2$&nbsp; independent of&nbsp; $m_1$&nbsp; for the&nbsp; &raquo;'''Gaussian PDF'''&laquo;:
 
:$$h_{\rm max}(X) = {1}/{2} \cdot {\rm log} \hspace{0.1cm} ({\it \Gamma}_{\hspace{-0.01cm} \rm L} \cdot \sigma^2) =  
 
:$$h_{\rm max}(X) = {1}/{2} \cdot {\rm log} \hspace{0.1cm} ({\it \Gamma}_{\hspace{-0.01cm} \rm L} \cdot \sigma^2) =  
 
  {1}/{2} \cdot {\rm log} \hspace{0.1cm} (2\pi{\rm e} \cdot \sigma^2) \hspace{0.5cm} \Rightarrow\hspace{0.5cm} {\it \Gamma}_{\rm L} = 2\pi{\rm e}
 
  {1}/{2} \cdot {\rm log} \hspace{0.1cm} (2\pi{\rm e} \cdot \sigma^2) \hspace{0.5cm} \Rightarrow\hspace{0.5cm} {\it \Gamma}_{\rm L} = 2\pi{\rm e}
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
Jede andere wertkontinuierliche Zufallsgröße $X$ mit Varianz ${\rm E}\big[ \vert X – m_1 \vert ^2 \big] ≤ σ^2$ führt zu einer kleineren differentiellen Entropie, gekennzeichnet durch die Kenngröße ${\it \Gamma}_{\rm L}  < 2πe$. }}
+
Any other continuous random variable&nbsp; $X$&nbsp; with variance&nbsp; ${\rm E}\big[ \vert X - m_1 \vert ^2 \big] ≤ σ^2$&nbsp; leads to a smaller differential entropy,&nbsp; characterized by the parameter&nbsp; ${\it \Gamma}_{\rm L}  < 2\pi{\rm e}$. }}
  
  
==Aufgaben zum Kapitel==
+
==Exercises for the chapter==
 
<br>
 
<br>
[[Aufgaben:4.1 WDF, VTF und Wahrscheinlichkeit|Aufgabe 4.1: WDF, VTF und Wahrscheinlichkeit]]
+
[[Aufgaben:Exercise_4.1:_PDF,_CDF_and_Probability|Exercise 4.1: PDF, CDF and Probability]]
  
[[Aufgaben:4.1Z Momentenberechnung|Aufgabe 4.1Z: Momentenberechnung]]
+
[[Aufgaben:Exercise_4.1Z:_Calculation_of_Moments|Exercise 4.1Z: Calculation of Moments]]
  
[[Aufgaben:4.2 Dreieckförmige WDF|Aufgabe 4.2: Dreieckförmige WDF]]
+
[[Aufgaben:Exercise_4.2:_Triangular_PDF|Exercise 4.2: Triangular PDF]]
  
[[Aufgaben:4.2Z Gemischte Zufallsgrößen|Aufgabe 4.2Z: Gemischte Zufallsgrößen]]
+
[[Aufgaben:Exercise_4.2Z:_Mixed_Random_Variables|Exercise 4.2Z: Mixed Random Variables]]
  
[[Aufgaben:Aufgabe_4.3:_WDF–Vergleich_bezüglich_differentieller_Entropie|Aufgabe 4.3: WDF–Vergleich bezüglich  differentieller Entropie]]
+
[[Aufgaben:Exercise_4.3:_PDF_Comparison_with_Regard_to_Differential_Entropy|Exercise 4.3: PDF Comparison with Regard to Differential Entropy]]
  
[[Aufgaben:4.3Z Exponential– und Laplaceverteilung|Aufgabe 4.3Z: Exponential– und Laplaceverteilung]]
+
[[Aufgaben:Exercise_4.3Z:_Exponential_and_Laplace_Distribution|Exercise 4.3Z: Exponential and Laplace Distribution]]
  
[[Aufgaben:4.4 Herkömmliche Entropie und differenzielle Entropie|Aufgabe 4.4: Herkömmliche Entropie und differenzielle Entropie]]
+
[[Aufgaben:Exercise_4.4:_Conventional_Entropy_and_Differential_Entropy|Exercise 4.4: Conventional Entropy and Differential Entropy]]
  
  

Latest revision as of 15:29, 28 February 2023

# OVERVIEW OF THE FOURTH MAIN CHAPTER #


In the last chapter of this book,  the information-theoretical quantities defined so far for the discrete case are adapted in such a way that they can also be applied to continuous random quantities.

  • For example,  the entropy  $H(X)$  for the discrete random variable  $X$  becomes the  »differential entropy«  $h(X)$  in the continuous case.
  • While  $H(X)$  indicates the  »uncertainty«  with regard to the discrete random variable  $X$,  $h(X)$  cannot be interpreted in the same way in the continuous case.


Many of the relationships derived in the third chapter  »Information between two discrete random variables«  for conventional entropy also apply to differential entropy.   Thus,  the differential joint entropy  $h(XY)$  can also be given for continuous random variables  $X$  and  $Y$,  as can the two conditional differential entropies  $h(Y|X)$  and  $h(X|Y)$.


In detail, this main chapter deals with

  1. the special features of  »continuous random variables«,
  2. the  »definition and calculation of the differential entropy«  as well as its properties,
  3. the  »mutual information«  between two continuous random variables,
  4. the  »capacity of the AWGN channel«  and several such parallel Gaussian channels,
  5. the  »channel coding theorem«,  one of the highlights of Shannon's information theory,
  6. the  »AWGN channel capacity«  for discrete input  $($BPSK,  QPSK$)$.



Properties of continuous random variables


Up to now,  "discrete random variables"  of the form  $X = \{x_1,\ x_2, \hspace{0.05cm}\text{...}\hspace{0.05cm} , x_μ, \text{...} ,\ x_M\}$  have always been considered, which from an information-theoretical point of view are completely characterized by their  "probability mass function"  $\rm (PMF)$:

$$P_X(X) = \big [ \hspace{0.1cm} p_1, p_2, \hspace{0.05cm}\text{...} \hspace{0.15cm}, p_{\mu},\hspace{0.05cm} \text{...}\hspace{0.15cm}, p_M \hspace{0.1cm}\big ] \hspace{0.3cm}{\rm with} \hspace{0.3cm} p_{\mu}= P_X(x_{\mu})= {\rm Pr}( X = x_{\mu}) \hspace{0.05cm}.$$

A  "continuous random variable",  on the other hand, can assume any value – at least in finite intervals:

  • Due to the uncountable supply of values, the description by a probability mass function is not possible in this case, or at least it does not make sense:
  • This would result in the symbol set size  $M \to ∞$  as well as probabilities  $p_1 \to 0$,  $p_2 \to 0$,  etc.


In line with the definitions in the book  "Theory of Stochastic Signals",  continuous random variables are described by the following quantities:

PDF and CDF of a continuous random variable
  • the  »probability density function«  $\rm (PDF)$:
$$f_X(x_0)= \lim_{{\rm \Delta} x\to \rm 0}\frac{p_{{\rm \Delta} x}}{{\rm \Delta} x} = \lim_{{\rm \Delta} x\to \rm 0}\frac{{\rm Pr} \{ x_0- {\rm \Delta} x/\rm 2 \le \it X \le x_{\rm 0} +{\rm \Delta} x/\rm 2\}}{{\rm \Delta} x};$$
In words:   the PDF value at  $x_0$  gives the probability  $p_{Δx}$  that  $X$  lies in an (infinitely small) interval of width  $Δx$  around  $x_0$,  divided by  $Δx$   (note the entries in the adjacent graph);
  • the  »mean«  (first moment):
$$m_1 = {\rm E}\big[ X \big]= \int_{-\infty}^{+\infty} \hspace{-0.1cm} x \cdot f_X(x) \hspace{0.1cm}{\rm d}x \hspace{0.05cm};$$
  • the  »variance«  (second central moment):
$$\sigma^2 = {\rm E}\big[(X- m_1 )^2 \big]= \int_{-\infty}^{+\infty} \hspace{-0.1cm} (x- m_1 )^2 \cdot f_X(x) \hspace{0.1cm}{\rm d}x \hspace{0.05cm};$$
  • the  »cumulative distribution function«  $\rm (CDF)$:
$$F_X(x) = \int_{-\infty}^{x} \hspace{-0.1cm}f_X(\xi) \hspace{0.1cm}{\rm d}\xi \hspace{0.2cm} = \hspace{0.2cm} {\rm Pr}(X \le x)\hspace{0.05cm}.$$

Note that both the PDF area and the CDF final value are always equal to  $1$.

$\text{Nomenclature notes on PDF and CDF:}$

In this chapter we use the notation  $f_X(x)$,  which is common in the literature,  for a  »probability density function«  $\rm (PDF)$,  where:

  • $X$  denotes the (discrete or continuous) random variable,
  • $x$  is a possible realization of  $X$   ⇒   $x ∈ X$.


Accordingly, we denote the  »cumulative distribution function«  $\rm (CDF)$  of the random variable  $X$  by  $F_X(x)$  according to the following definition:

$$F_X(x) = \int_{-\infty}^{x} \hspace{-0.1cm}f_X(\xi) \hspace{0.1cm}{\rm d}\xi \hspace{0.2cm} = \hspace{0.2cm} {\rm Pr}(X \le x)\hspace{0.05cm}.$$

In other  $\rm LNTwww$ books, we often use the following simplified notation so as not to use up two characters for one variable:

  • For the PDF  $f_x(x)$   ⇒   no distinction between random variable and realization.
  • For the CDF  $F_x(r) = {\rm Pr}(x ≤ r)$   ⇒   here one needs a second variable in any case.


We apologize for this formal inaccuracy.


$\text{Example 1:}$  We now consider an important special case,  the  »uniform distribution«.

Two analog signals as examples of continuous random variables
  • The graph shows the course of two uniformly distributed variables, which can assume all values between  $1$  and  $5$  $($mean value $m_1 = 3)$  with equal probability.
  • On the left is the result of a random process, on the right a deterministic signal with the same amplitude distribution.
PDF and CDF of a uniformly distributed random variable


The  "probability density function"  $\rm (PDF)$  of the uniform distribution has the course sketched in the second graph above:

$$f_X(x) = \left\{ \begin{array}{c} \hspace{0.25cm}(x_{\rm max} - x_{\rm min})^{-1} \\ 1/2 \cdot (x_{\rm max} - x_{\rm min})^{-1} \\ \hspace{0.25cm} 0 \\ \end{array} \right. \begin{array}{*{20}c} {\rm{for} } \\ {\rm{for} } \\ {\rm{for} } \\ \end{array} \begin{array}{*{20}l} {x_{\rm min} < x < x_{\rm max},} \\ x ={x_{\rm min} \hspace{0.15cm}{\rm or}\hspace{0.15cm}x = x_{\rm max},} \\ x < x_{\rm min} \hspace{0.15cm}{\rm or}\hspace{0.15cm} x > x_{\rm max}. \\ \end{array}$$

The following equations are obtained here for the mean  $m_1 ={\rm E}\big[X\big]$  and the variance  $σ^2={\rm E}\big[(X - m_1)^2\big]$:

$$m_1 = \frac{x_{\rm max} + x_{\rm min} }{2}\hspace{0.05cm}, $$
$$\sigma^2 = \frac{(x_{\rm max} - x_{\rm min})^2}{12}\hspace{0.05cm}.$$

Shown below is the   »cumulative distribution function«  $\rm (CDF)$:

$$F_X(x) = \int_{-\infty}^{x} \hspace{-0.1cm}f_X(\xi) \hspace{0.1cm}{\rm d}\xi \hspace{0.2cm} = \hspace{0.2cm} {\rm Pr}(X \le x)\hspace{0.05cm}.$$
  • This is identically zero for  $x ≤ x_{\rm min}$, increases linearly thereafter and reaches the CDF final value of   $1$ at  $x = x_{\rm max}$ .
  • The probability that the random variable  $X$  takes on a value between  $3$  and  $4$  can be determined from both the PDF and the CDF:
$${\rm Pr}(3 \le X \le 4) = \int_{3}^{4} \hspace{-0.1cm}f_X(\xi) \hspace{0.1cm}{\rm d}\xi = 0.25\hspace{0.05cm}\hspace{0.05cm},$$
$${\rm Pr}(3 \le X \le 4) = F_X(4) - F_X(3) = 0.25\hspace{0.05cm}.$$

Furthermore, note:

  • The result  $X = 0$  is excluded for this random variable   ⇒   ${\rm Pr}(X = 0) = 0$.
  • The result  $X = 4$ , on the other hand, is quite possible.  Nevertheless,  ${\rm Pr}(X = 4) = 0 $  also applies here.
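The numbers of this example can be reproduced with a few lines of Python. The following is only a minimal sketch using the closed-form expressions above; the function names are illustrative assumptions and do not come from the source.

```python
# Sketch: closed-form moments and CDF of a uniform distribution on [x_min, x_max].
# All helper names are hypothetical and chosen for illustration only.


def uniform_mean(x_min: float, x_max: float) -> float:
    """Mean m_1 = (x_max + x_min) / 2."""
    return (x_max + x_min) / 2.0


def uniform_variance(x_min: float, x_max: float) -> float:
    """Variance sigma^2 = (x_max - x_min)^2 / 12."""
    return (x_max - x_min) ** 2 / 12.0


def uniform_cdf(x: float, x_min: float, x_max: float) -> float:
    """CDF F_X(x): zero up to x_min, linear in between, one from x_max on."""
    if x <= x_min:
        return 0.0
    if x >= x_max:
        return 1.0
    return (x - x_min) / (x_max - x_min)


# Values of Example 1: uniform distribution between 1 and 5.
x_min, x_max = 1.0, 5.0
m1 = uniform_mean(x_min, x_max)
var = uniform_variance(x_min, x_max)
prob_3_to_4 = uniform_cdf(4.0, x_min, x_max) - uniform_cdf(3.0, x_min, x_max)
print(m1, var, prob_3_to_4)
```

The probability is obtained here from the CDF difference  $F_X(4) - F_X(3)$,  which corresponds to the second of the two equations above.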

Entropy of continuous random variables after quantization


We now consider a continuous random variable  $X$  in the range  $0 \le x \le 1$.

  • We quantize this random variable  $X$,  in order to be able to further apply the previous entropy calculation.  We call the resulting discrete (quantized) quantity  $Z$.
  • Let the number of quantization steps be  $M$,  so that each quantization interval  $μ$  has the width  ${\it Δ} = 1/M$  in the present PDF.  We denote the interval centres by  $x_μ$.
  • The probability  $p_μ = {\rm Pr}(Z = z_μ)$  with respect to  $Z$  is equal to the probability that the random variable  $X$  has a value between  $x_μ - {\it Δ}/2$  and  $x_μ + {\it Δ}/2$.
  • First we set  $M = 2$  and then double this value in each iteration.  This makes the quantization increasingly finer.  In the  $n$th trial,  $M = 2^n$  and  ${\it Δ} =2^{-n}$  then apply.


$\text{Example 2:}$  The graph shows the results of the first three trials for an asymmetrical triangular PDF  $($between  $0$  and  $1)$:

Entropy determination of the triangular PDF after quantization
  • $n = 1 \ ⇒ \ M = 2 \ ⇒ \ {\it Δ} = 1/2\text{:}$     $H(Z) = 0.811\ \rm bit,$
  • $n = 2 \ ⇒ \ M = 4 \ ⇒ \ {\it Δ} = 1/4\text{:}$     $H(Z) = 1.749\ \rm bit,$
  • $n = 3 \ ⇒ \ M = 8 \ ⇒ \ {\it Δ} = 1/8\text{:}$     $H(Z) = 2.729\ \rm bit.$


Additionally, the following quantities can be taken from the graph, for example for  ${\it Δ} = 1/8$:

  • The interval centres are at
$$x_1 = 1/16,\ x_2 = 3/16,\text{ ...} \ ,\ x_8 = 15/16 $$
$$ ⇒ \ x_μ = {\it Δ} · (μ - 1/2).$$
  • The interval areas result in  
$$p_μ = {\it Δ} · f_X(x_μ) ⇒ p_8 = 1/8 · (7/4+2)/2 = 15/64.$$
  • Thus, we obtain for the  $\rm PMF$  of the quantized random variable $Z$:
$$P_Z(Z) = (1/64, \ 3/64, \ 5/64, \ 7/64, \ 9/64, \ 11/64, \ 13/64, \ 15/64).$$
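The entropy values of this example can be checked numerically. The sketch below assumes the PDF  $f_X(x) = 2x$  on  $[0, 1]$,  which is not stated explicitly here but is the triangular PDF consistent with the probabilities  $1/64, \ 3/64, \text{ ...} , \ 15/64$;  the function names are hypothetical.

```python
import math


def quantized_pmf(n: int) -> list:
    """PMF of Z for the assumed PDF f_X(x) = 2x on [0, 1], with M = 2**n intervals."""
    M = 2 ** n
    delta = 1.0 / M
    # Interval centres x_mu = delta * (mu - 1/2), probabilities p_mu = delta * f_X(x_mu).
    return [delta * 2.0 * delta * (mu - 0.5) for mu in range(1, M + 1)]


def entropy_bits(pmf) -> float:
    """Entropy H(Z) = sum of p * log2(1/p) over all non-zero probabilities."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0.0)


entropies = {n: entropy_bits(quantized_pmf(n)) for n in (1, 2, 3)}
for n, H in entropies.items():
    print(f"n = {n}: M = {2 ** n}, H(Z) = {H:.4f} bit")
```

For a linear PDF the product  ${\it Δ} · f_X(x_μ)$  equals the interval area exactly, so the computed values should agree with the three entropies listed above up to rounding in the last digit.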


$\text{Conclusion:}$  We interpret the results of this experiment as follows:

  1. The entropy  $H(Z)$  becomes larger and larger as  $M$  increases.
  2. The limit of  $H(Z)$  for  $M \to ∞ \ ⇒ \ {\it Δ} → 0$  is infinite.
  3. Thus, the entropy  $H(X)$  of the continuous random variable  $X$  is also infinite.
  4. It follows:   The previous definition of entropy fails for continuous random variables.


To verify our empirical result, we assume the following equation:

$$H(Z) = \hspace{0.2cm} \sum_{\mu = 1}^{M} \hspace{0.2cm} p_{\mu} \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{p_{\mu}}= \hspace{0.2cm} \sum_{\mu = 1}^{M} \hspace{0.2cm} {\it \Delta} \cdot f_X(x_{\mu} ) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{{\it \Delta} \cdot f_X(x_{\mu} )}\hspace{0.05cm}.$$
  • We now split  $H(Z) = S_1 + S_2$  into two summands:
$$\begin{align*}S_1 & = {\rm log}_2 \hspace{0.1cm} \frac{1}{\it \Delta} \cdot \hspace{0.2cm} \sum_{\mu = 1}^{M} \hspace{0.02cm} {\it \Delta} \cdot f_X(x_{\mu} ) \approx - {\rm log}_2 \hspace{0.1cm}{\it \Delta} \hspace{0.05cm},\\ S_2 & = \hspace{0.05cm} \sum_{\mu = 1}^{M} \hspace{0.2cm} f_X(x_{\mu} ) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{ f_X(x_{\mu} ) } \cdot {\it \Delta} \hspace{0.2cm}\approx \hspace{0.2cm} \int_{0}^{1} \hspace{0.05cm} f_X(x) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{ f_X(x) } \hspace{0.1cm}{\rm d}x \hspace{0.05cm}.\end{align*}$$
  • The approximation  $S_1 ≈ -\log_2 {\it Δ}$  applies exactly only in the borderline case  ${\it Δ} → 0$. 
  • The given approximation for  $S_2$  is also only valid for small  ${\it Δ} → {\rm d}x$,  so that one should replace the sum by the integral.
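The split into the two summands can be made explicit in one intermediate step. The following line is a sketch that only uses the rule  ${\rm log}_2 \hspace{0.05cm}(a \cdot b) = {\rm log}_2 \hspace{0.05cm} a + {\rm log}_2 \hspace{0.05cm} b$:

$$H(Z) = \sum_{\mu = 1}^{M} {\it \Delta} \cdot f_X(x_{\mu} ) \cdot \left [ {\rm log}_2 \hspace{0.1cm} \frac{1}{\it \Delta} + {\rm log}_2 \hspace{0.1cm} \frac{1}{f_X(x_{\mu} )} \right ] = \underbrace{ {\rm log}_2 \hspace{0.1cm} \frac{1}{\it \Delta} \cdot \sum_{\mu = 1}^{M} {\it \Delta} \cdot f_X(x_{\mu} )}_{S_1} + \underbrace{\sum_{\mu = 1}^{M} f_X(x_{\mu} ) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{ f_X(x_{\mu} ) } \cdot {\it \Delta} }_{S_2}\hspace{0.05cm}.$$

The sum in  $S_1$  approximates the PDF area, which is equal to  $1$.  This is why  $S_1$  tends to  $- {\rm log}_2 \hspace{0.05cm} {\it Δ}$.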


$\text{Generalization:}$  If one approximates the continuous random variable  $X$  with the PDF  $f_X(x)$  by a discrete random variable  $Z$  by performing a (fine) quantization with the interval width  ${\it Δ}$,  one obtains for the entropy of the random variable  $Z$:

$$H(Z) \approx - {\rm log}_2 \hspace{0.1cm}{\it \Delta} \hspace{0.2cm}+ \hspace{-0.35cm} \int\limits_{\text{supp}(f_X)} \hspace{-0.35cm} f_X(x) \cdot {\rm log}_2 \hspace{0.1cm} \frac{1}{ f_X(x) } \hspace{0.1cm}{\rm d}x = - {\rm log}_2 \hspace{0.1cm}{\it \Delta} \hspace{0.2cm} + h(X) \hspace{0.5cm}\big [{\rm in \hspace{0.15cm}bit}\big ] \hspace{0.05cm}.$$

For the special case   ${\it Δ} = 1/M = 2^{-n}$,  the above equation can also be written as follows:

$$H(Z) \approx n + h(X) \hspace{0.5cm}\big [{\rm in \hspace{0.15cm}bit}\big ] \hspace{0.05cm}.$$
  • In the borderline case  ${\it Δ} → 0 \ ⇒ \ M → ∞ \ ⇒ \ n → ∞$,  the entropy of the continuous random variable is also infinite:   $H(X) → ∞$.
  • For each  $n$  the equation  $H(Z) = n$  is only an approximation,  where the differential entropy  $h(X)$  of the continuous quantity serves as an additive correction term.


$\text{Example 3:}$  As in  $\text{Example 2}$,  we consider an asymmetrical triangular PDF   $($between  $0$  and  $1)$.  Its differential entropy, as calculated in  "Exercise 4.2",  is

Entropy of the asymmetrical triangular PDF after quantization
$$h(X) = \hspace{0.05cm}-0.279 \ \rm bit.$$
  • The table shows the entropy  $H(Z)$  of the quantity  $Z$  quantized with  $n$  bits.
  • Already for  $n = 3$  one can see a good agreement between the approximation  (lower row)  and the exact calculation  (row 2).
  • For  $n = 10$,  the approximation will agree even better with the exact calculation  (which is extremely time-consuming).
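The comparison in the table can be reproduced approximately with the following sketch. It again assumes the PDF  $f_X(x) = 2x$  on  $[0, 1]$  (an assumption inferred from Example 2) and uses the closed form  $h(X) = 1/(2 \ln 2) - 1$  for this PDF, which rounds to the value  $-0.279$  bit given above. All names are hypothetical.

```python
import math

# Differential entropy of the assumed PDF f_X(x) = 2x on [0, 1], in bit (about -0.279).
H_DIFF_BITS = 1.0 / (2.0 * math.log(2.0)) - 1.0


def quantized_entropy_bits(n: int) -> float:
    """Exact entropy H(Z) after quantization with M = 2**n intervals."""
    M = 2 ** n
    # p_mu = delta * f_X(x_mu) = (2*mu - 1) / M**2 for f_X(x) = 2x.
    probs = ((2 * mu - 1) / M ** 2 for mu in range(1, M + 1))
    return sum(p * math.log2(1.0 / p) for p in probs)


gaps = {}
for n in range(1, 11):
    exact = quantized_entropy_bits(n)
    approx = n + H_DIFF_BITS          # approximation H(Z) ~ n + h(X)
    gaps[n] = exact - approx
    print(f"n = {n:2d}: exact = {exact:.4f} bit, approx = {approx:.4f} bit")
```

The dictionary `gaps` holds the difference between exact value and approximation for each  $n$;  under the stated assumption it should shrink as  $n$  grows.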


Definition and properties of differential entropy


$\text{Generalization:}$  The  »differential entropy«  $h(X)$  of a continuous random variable  $X$  with probability density function  $f_X(x)$  is:

$$h(X) = \hspace{0.1cm} - \hspace{-0.45cm} \int\limits_{\text{supp}(f_X)} \hspace{-0.35cm} f_X(x) \cdot {\rm log} \hspace{0.1cm} \big[ f_X(x) \big] \hspace{0.1cm}{\rm d}x \hspace{0.6cm}{\rm with}\hspace{0.6cm} {\rm supp}(f_X) = \{ x\text{:} \ f_X(x) > 0 \} \hspace{0.05cm}.$$

A pseudo-unit must be added in each case:

  • "nat" when using  "ln"   ⇒   natural logarithm,
  • "bit" when using  "log2"   ⇒   binary logarithm.


While the (conventional) entropy of a discrete random variable  $X$  is always  $H(X) ≥ 0$ , the differential entropy  $h(X)$  of a continuous random variable can also be negative.  From this it is already evident that  $h(X)$  in contrast to  $H(X)$  cannot be interpreted as "uncertainty".

PDF of a uniformly distributed random variable

$\text{Example 4:}$  The upper graph shows the  $\rm PDF$  of a random variable  $X$,  which is uniformly distributed between  $x_{\rm min}$  and  $x_{\rm max}$.

  • For its differential entropy one obtains in  "nat":
$$\begin{align*}h(X) & = - \hspace{-0.18cm}\int\limits_{x_{\rm min} }^{x_{\rm max} } \hspace{-0.28cm} \frac{1}{x_{\rm max}\hspace{-0.05cm} - \hspace{-0.05cm}x_{\rm min} } \cdot {\rm ln} \hspace{0.1cm}\big [ \frac{1}{x_{\rm max}\hspace{-0.05cm} - \hspace{-0.05cm}x_{\rm min} }\big ] \hspace{0.1cm}{\rm d}x \\ & = {\rm ln} \hspace{0.1cm} \big[ {x_{\rm max}\hspace{-0.05cm} - \hspace{-0.05cm}x_{\rm min} }\big ] \cdot \big [ \frac{x}{x_{\rm max}\hspace{-0.05cm} - \hspace{-0.05cm}x_{\rm min} } \big ]_{x_{\rm min} }^{x_{\rm max} }={\rm ln} \hspace{0.1cm} \big[ {x_{\rm max}\hspace{-0.05cm} - \hspace{-0.05cm}x_{\rm min} } \big]\hspace{0.05cm}.\end{align*} $$
  • The equation for the differential entropy in "bit" is:  
$$h(X) = \log_2 \big[x_{\rm max} - x_{ \rm min} \big].$$
$h(X)$  for different rectangular density functions   ⇒   uniformly distributed random variables





The graph on the left shows the numerical evaluation of the above result by means of some examples.


$\text{Interpretation:}$  From the six sketches in the last example, important properties of the differential entropy  $h(X)$  can be read:

  • The differential entropy is not changed by a PDF shift   $($by  $k)$ :
$$h(X + k) = h(X) \hspace{0.2cm}\Rightarrow \hspace{0.2cm} \text{For example:} \ \ h_3(X) = h_4(X) = h_5(X) \hspace{0.05cm}.$$
  • $h(X)$  changes by compression/spreading of the PDF by the factor  $k ≠ 0$  as follows:
$$h( k\hspace{-0.05cm} \cdot \hspace{-0.05cm}X) = h(X) + {\rm log}_2 \hspace{0.05cm} \vert k \vert \hspace{0.2cm}\Rightarrow \hspace{0.2cm} \text{For example:} \ \ h_6(X) = h_5(AX) = h_5(X) + {\rm log}_2 \hspace{0.05cm} (A) = {\rm log}_2 \hspace{0.05cm} (2A) \hspace{0.05cm}.$$
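Both properties can be checked numerically for a PDF that is not rectangular. The sketch below uses a simple midpoint rule and, as a test case, the triangular PDF  $f_X(x) = 2x$  on  $[0, 1]$;  the PDF choice, the shift  $3$,  the factor  $k = 4$  and all function names are assumptions made for illustration.

```python
import math


def differential_entropy_bits(pdf, lo, hi, steps=100_000):
    """Midpoint-rule estimate of h(X) = -integral of f_X(x) * log2 f_X(x) over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:                      # integrate over supp(f_X) only
            total -= fx * math.log2(fx) * dx
    return total


def triangular(x):
    """Assumed test PDF f_X(x) = 2x on [0, 1]."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


k, shift = 4.0, 3.0
h_base = differential_entropy_bits(triangular, 0.0, 1.0)
# PDF of X + shift: f_X(x - shift) on [shift, shift + 1].
h_shifted = differential_entropy_bits(lambda x: triangular(x - shift), shift, shift + 1.0)
# PDF of k * X: f_X(y / k) / k on [0, k].
h_scaled = differential_entropy_bits(lambda y: triangular(y / k) / k, 0.0, k)

print(h_base, h_shifted, h_scaled - h_base, math.log2(abs(k)))
```

The shifted PDF should give the same differential entropy as the original one, while the spread PDF should differ from it by about  ${\rm log}_2 \hspace{0.05cm} \vert k \vert$.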


Many of the equations derived in the chapter  "Different entropies of two-dimensional random variables"  for the discrete case also apply to continuous random variables.

From the following compilation one can see that often only the (large)  $H$  has to be replaced by a (small)  $h$  as well as the probability mass function  $\rm (PMF)$  by the corresponding probability density function  $\rm (PDF)$ .

  • »Conditional Differential Entropy«:
$$H(X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y) = {\rm E} \hspace{-0.1cm}\left [ {\rm log} \hspace{0.1cm}\frac{1}{P_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y)}\right ]=\hspace{-0.04cm} \sum_{(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm}(\hspace{-0.03cm}P_{XY}\hspace{-0.08cm})} \hspace{-0.8cm} P_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{1}{P_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (x \hspace{-0.05cm}\mid \hspace{-0.05cm} y)} \hspace{0.05cm}$$
$$\Rightarrow \hspace{0.3cm}h(X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y) = {\rm E} \hspace{-0.1cm}\left [ {\rm log} \hspace{0.1cm}\frac{1}{f_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (X \hspace{-0.05cm}\mid \hspace{-0.05cm} Y)}\right ]=\hspace{0.2cm} \int \hspace{-0.9cm} \int\limits_{\hspace{-0.04cm}(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.03cm}(\hspace{-0.03cm}f_{XY}\hspace{-0.08cm})} \hspace{-0.6cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{1}{f_{\hspace{0.03cm}X \mid \hspace{0.03cm} Y} (x \hspace{-0.05cm}\mid \hspace{-0.05cm} y)} \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y\hspace{0.05cm}.$$
  • »Joint Differential Entropy«:
$$H(XY) = {\rm E} \left [ {\rm log} \hspace{0.1cm} \frac{1}{P_{XY}(X, Y)}\right ] =\hspace{-0.04cm} \sum_{(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm}(\hspace{-0.03cm}P_{XY}\hspace{-0.08cm})} \hspace{-0.8cm} P_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{1}{ P_{XY}(x, y)} \hspace{0.05cm}$$
$$\Rightarrow \hspace{0.3cm}h(XY) = {\rm E} \left [ {\rm log} \hspace{0.1cm} \frac{1}{f_{XY}(X, Y)}\right ] =\hspace{0.2cm} \int \hspace{-0.9cm} \int\limits_{\hspace{-0.04cm}(x, y) \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm}(\hspace{-0.03cm}f_{XY}\hspace{-0.08cm})} \hspace{-0.6cm} f_{XY}(x, y) \cdot {\rm log} \hspace{0.1cm} \frac{1}{ f_{XY}(x, y) } \hspace{0.15cm}{\rm d}x\hspace{0.15cm}{\rm d}y\hspace{0.05cm}.$$
  • »Chain rule«  of differential entropy:
$$H(X_1\hspace{0.05cm}X_2\hspace{0.05cm}\text{...} \hspace{0.1cm}X_n) =\sum_{i = 1}^{n} H(X_i | X_1\hspace{0.05cm}X_2\hspace{0.05cm}\text{...} \hspace{0.1cm}X_{i-1}) \le \sum_{i = 1}^{n} H(X_i) \hspace{0.05cm}$$
$$\Rightarrow \hspace{0.3cm} h(X_1\hspace{0.05cm}X_2\hspace{0.05cm}\text{...} \hspace{0.1cm}X_n) =\sum_{i = 1}^{n} h(X_i | X_1\hspace{0.05cm}X_2\hspace{0.05cm}\text{...} \hspace{0.1cm}X_{i-1}) \le \sum_{i = 1}^{n} h(X_i) \hspace{0.05cm}.$$
  • »Kullback–Leibler distance«  between the random variables  $X$  and  $Y$:
$$D(P_X \hspace{0.05cm} || \hspace{0.05cm}P_Y) = {\rm E} \left [ {\rm log} \hspace{0.1cm} \frac{P_X(X)}{P_Y(X)}\right ] \hspace{0.2cm}=\hspace{0.2cm} \sum_{x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp} \hspace{0.03cm}(\hspace{-0.03cm}P_{X})\hspace{-0.8cm}} P_X(x) \cdot {\rm log} \hspace{0.1cm} \frac{P_X(x)}{P_Y(x)} \ge 0$$
$$\Rightarrow \hspace{0.3cm}D(f_X \hspace{0.05cm} || \hspace{0.05cm}f_Y) = {\rm E} \left [ {\rm log} \hspace{0.1cm} \frac{f_X(X)}{f_Y(X)}\right ] \hspace{0.2cm}= \hspace{-0.4cm}\int\limits_{x \hspace{0.1cm}\in \hspace{0.1cm}{\rm supp}\hspace{0.03cm}(\hspace{-0.03cm}f_{X}\hspace{-0.08cm})} \hspace{-0.4cm} f_X(x) \cdot {\rm log} \hspace{0.1cm} \frac{f_X(x)}{f_Y(x)} \hspace{0.15cm}{\rm d}x \ge 0 \hspace{0.05cm}.$$

Differential entropy of some peak-constrained random variables


Differential entropy of peak-constrained random variables

The table shows the results regarding the differential entropy for three exemplary probability density functions  $f_X(x)$.  These are all peak-constrained, i.e.   $|X| ≤ A$ applies in each case.

  • With  "peak constraint" , the differential entropy can always be represented as follows:
$$h(X) = {\rm log}\,\, ({\it \Gamma}_{\rm A} \cdot A).$$
  • Add the pseudo-unit  "nat"  when using  $\ln$  and the pseudo-unit  "bit"  when using  $\log_2$.
  • ${\it \Gamma}_{\rm A}$  depends solely on the PDF form and applies only to  "peak limitation"   ⇒   German:  "Amplitudenbegrenzung"   ⇒   Index  $\rm A$.
  • A uniform distribution in the range  $|X| ≤ 1$  yields  $h(X) = 1$  bit,  a second one in the range  $|Y| ≤ 4$  yields  $h(Y) = 3$  bit.


$\text{Theorem:}$  Under the   »peak constraint«,  i.e. PDF  $f_X(x) = 0$  for  $ \vert x \vert > A$,  the  »uniform distribution«  leads to the maximum differential entropy:

$$h_{\rm max}(X) = {\rm log} \hspace{0.1cm} (2A)\hspace{0.05cm}.$$

Here,  the appropriate parameter  ${\it \Gamma}_{\rm A} = 2$  is maximal. You will find the  $\text{proof}$  at the end of this chapter.


The theorem simultaneously means that for any other peak-constrained PDF  (except the uniform distribution)  the characteristic parameter  ${\it \Gamma}_{\rm A} < 2$.

  • For the symmetric triangular distribution, the above table gives  ${\it \Gamma}_{\rm A} = \sqrt{\rm e} ≈ 1.649$.
  • In contrast, for the one-sided triangle  $($between  $0$  and  $A)$    ${\it \Gamma}_{\rm A}$  is only half as large.
  • For every other triangle  $($width  $A$,  arbitrary peak between  $0$  and  $A)$    ${\it \Gamma}_{\rm A} ≈ 0.824$  also applies.
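The characteristic values  ${\it \Gamma}_{\rm A}$  quoted above can be estimated numerically by solving  $h(X) = {\rm log}_2 \hspace{0.05cm} ({\it \Gamma}_{\rm A} \cdot A)$  for  ${\it \Gamma}_{\rm A} = 2^{h(X)}/A$.  The following is a sketch under assumptions: the concrete PDF formulas, the value  $A = 2$  and the helper names are chosen for illustration and are not taken from the table.

```python
import math


def entropy_bits(pdf, lo, hi, steps=100_000):
    """Midpoint-rule estimate of the differential entropy in bit."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:
            total -= fx * math.log2(fx) * dx
    return total


A = 2.0  # arbitrary peak value; Gamma_A should not depend on it

# (pdf, lower limit, upper limit) for three peak-constrained PDFs.
pdfs = {
    "uniform": (lambda x: 1.0 / (2.0 * A), -A, A),
    "symmetric triangle": (lambda x: (1.0 - abs(x) / A) / A, -A, A),
    "one-sided triangle": (lambda x: 2.0 * x / A ** 2, 0.0, A),
}

# h(X) = log2(Gamma_A * A)  =>  Gamma_A = 2**h(X) / A
gamma_A = {name: 2.0 ** entropy_bits(pdf, lo, hi) / A
           for name, (pdf, lo, hi) in pdfs.items()}

for name, value in gamma_A.items():
    print(f"{name}: Gamma_A = {value:.3f}")
```

Repeating the run with a different  $A$  is a simple way to see that  ${\it \Gamma}_{\rm A}$  depends only on the PDF form.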


The respective second  $h(X)$ specification and the characteristic  ${\it \Gamma}_{\rm L}$  on the other hand, are suitable for the comparison of random variables with power constraints, which will be discussed in the next section.  Under this constraint, e.g. the symmetric triangular distribution  $({\it \Gamma}_{\rm L} ≈ 16.31)$  is better than the uniform distribution  $({\it \Gamma}_{\rm L} = 12)$.


Differential entropy of some power-constrained random variables


The differential entropies  $h(X)$  for three exemplary density functions  $f_X(x)$  without boundary, which all have the same variance  $σ^2 = {\rm E}\big[|X - m_1|^2 \big]$  and thus the same standard deviation  $σ$  through appropriate parameter selection, can be taken from the following table.  Considered are:

Differential entropy of power-constrained random variables


The differential entropy can always be represented here as

$$h(X) = 1/2 \cdot {\rm log} \hspace{0.1cm} ({\it \Gamma}_{\rm L} \cdot \sigma^2).$$

${\it \Gamma}_{\rm L}$  depends solely on the PDF form and applies only to  "power limitation"   ⇒   German:  "Leistungsbegrenzung"   ⇒   Index  $\rm L$.

The result differs only by the pseudo-unit

  • "nat" when using  $\ln$  or
  • "bit" when using  $\log_2$.


$\text{Theorem:}$  Under the  »power constraint«,  the  »Gaussian PDF«

$$f_X(x) = \frac{1}{\sqrt{2\pi \sigma^2} } \cdot {\rm e}^{ - \hspace{0.05cm}{(x - m_1)^2}/(2 \sigma^2)},$$

leads to the maximum differential entropy,  independent of the mean  $m_1$:

$$h(X) = 1/2 \cdot {\rm log} \hspace{0.1cm} (2\pi{\rm e} \cdot \sigma^2)\hspace{0.3cm}\Rightarrow\hspace{0.3cm}{\it \Gamma}_{\rm L} = 2\pi{\rm e} ≈ 17.08\hspace{0.05cm}.$$

You will find the  "proof"  at the end of this chapter.


This statement means at the same time that for any PDF other than the Gaussian distribution, the characteristic value will be  ${\it \Gamma}_{\rm L} < 2\pi{\rm e} ≈ 17.08$.  For example, the characteristic value amounts to

  • ${\it \Gamma}_{\rm L} = 6{\rm e} ≈ 16.31$  for the triangular distribution,
  • ${\it \Gamma}_{\rm L} = 2{\rm e}^2 ≈ 14.78$  for the Laplace distribution, and
  • ${\it \Gamma}_{\rm L} = 12$  for the uniform distribution.
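These characteristic values can be estimated numerically from  $h(X) = 1/2 \cdot \ln \hspace{0.05cm} ({\it \Gamma}_{\rm L} \cdot \sigma^2)$,  i.e.  ${\it \Gamma}_{\rm L} = {\rm e}^{2 h(X)}/\sigma^2$  with  $h(X)$  in "nat".  The sketch below is an illustration under assumptions: the PDF parameters, the truncation of the infinite integration ranges and the helper names are not taken from the source.

```python
import math


def gamma_L(pdf, lo, hi, steps=100_000):
    """Estimate Gamma_L = exp(2 h) / sigma^2 by midpoint integration (h in nat)."""
    dx = (hi - lo) / steps
    xs = [lo + (i + 0.5) * dx for i in range(steps)]
    fs = [pdf(x) for x in xs]
    mean = sum(x * f for x, f in zip(xs, fs)) * dx
    var = sum((x - mean) ** 2 * f for x, f in zip(xs, fs)) * dx
    h_nat = -sum(f * math.log(f) for f in fs if f > 0.0) * dx
    return math.exp(2.0 * h_nat) / var


# Infinite supports are truncated where the PDF is negligibly small (assumption).
results = {
    "Gaussian": gamma_L(lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi), -12.0, 12.0),
    "triangular": gamma_L(lambda x: max(0.0, 1.0 - abs(x)), -1.0, 1.0),
    "Laplace": gamma_L(lambda x: 0.5 * math.exp(-abs(x)), -40.0, 40.0),
    "uniform": gamma_L(lambda x: 0.5, -1.0, 1.0),
}

for name, value in results.items():
    print(f"{name}: Gamma_L = {value:.2f}")
```

Because  ${\it \Gamma}_{\rm L}$  is normalized to the variance, the PDFs do not need to have the same  $σ$  for this comparison.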

Proof: Maximum differential entropy with peak constraint


Under the peak constraint   ⇒   $|X| ≤ A$  the differential entropy is:

$$h(X) = \hspace{0.1cm} \hspace{0.05cm} \int_{-A}^{+A} \hspace{0.05cm} f_X(x) \cdot {\rm log} \hspace{0.1cm} \frac{1}{ f_X(x) } \hspace{0.1cm}{\rm d}x \hspace{0.05cm}.$$

Of all possible probability density functions  $f_X(x)$ that satisfy the condition

$$\int_{-A}^{+A} \hspace{0.05cm} f_X(x) \hspace{0.1cm}{\rm d}x = 1$$

we are now looking for the function  $g_X(x)$  that leads to the maximum differential entropy  $h(X)$.

For the derivation we use the  »Lagrange multiplier method«:

  • We define the Lagrangian function  $L$  in such a way that it contains both  $h(X)$  and the constraint that the PDF area within  $|x| ≤ A$  equals  $1$:
$$L= \hspace{0.1cm} \hspace{0.05cm} \int_{-A}^{+A} \hspace{0.05cm} f_X(x) \cdot {\rm log} \hspace{0.1cm} \frac{1}{ f_X(x) } \hspace{0.1cm}{\rm d}x \hspace{0.5cm}+ \hspace{0.5cm} \lambda \cdot \int_{-A}^{+A} \hspace{0.05cm} f_X(x) \hspace{0.1cm}{\rm d}x \hspace{0.05cm}.$$
  • We generally set  $f_X(x) = g_X(x) + ε · ε_X(x)$, where  $ε_X(x)$  is an arbitrary function,  with the restriction that the PDF area must equal  $1$.  Thus we obtain:
$$\begin{align*}L = \hspace{0.1cm} \hspace{0.05cm} \int_{-A}^{+A} \hspace{0.05cm}\big [ g_X(x) + \varepsilon \cdot \varepsilon_X(x)\big ] \cdot {\rm log} \hspace{0.1cm} \frac{1}{ g_X(x) + \varepsilon \cdot \varepsilon_X(x) } \hspace{0.1cm}{\rm d}x + \lambda \cdot \int_{-A}^{+A} \hspace{0.05cm} \big [ g_X(x) + \varepsilon \cdot \varepsilon_X(x) \big ] \hspace{0.1cm}{\rm d}x \hspace{0.05cm}.\end{align*}$$
  • The best possible function is obtained when there is a stationary solution for  $ε = 0$ :
$$\left [\frac{{\rm d}L}{{\rm d}\varepsilon} \right ]_{\varepsilon \hspace{0.05cm}= \hspace{0.05cm}0}=\hspace{0.1cm} \hspace{0.05cm} \int_{-A}^{+A} \hspace{0.05cm} \varepsilon_X(x) \cdot \big [ {\rm log} \hspace{0.1cm} \frac{1}{ g_X(x) } -1 \big ]\hspace{0.1cm}{\rm d}x \hspace{0.3cm} + \hspace{0.3cm}\lambda \cdot \int_{-A}^{+A} \hspace{0.05cm} \varepsilon_X(x) \hspace{0.1cm}{\rm d}x \stackrel{!}{=} 0 \hspace{0.05cm}.$$
  • This conditional equation can be satisfied independently of  $ε_X$  only if the following holds:
$${\rm log} \hspace{0.1cm} \frac{1}{ g_X(x) } -1 + \lambda = 0 \hspace{0.4cm} \forall x \in \big[-A, +A \big]\hspace{0.3cm} \Rightarrow\hspace{0.3cm} g_X(x) = {\rm const.}\hspace{0.4cm} \forall x \in \big [-A, +A \big]\hspace{0.05cm}.$$
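The value of the constant follows from the normalization condition. A short worked step, given as a sketch and assuming the natural logarithm:

$$g_X(x) = {\rm e}^{\lambda - 1} \hspace{0.3cm}\Rightarrow\hspace{0.3cm} \int_{-A}^{+A} \hspace{0.05cm} {\rm e}^{\lambda - 1} \hspace{0.1cm}{\rm d}x = 2A \cdot {\rm e}^{\lambda - 1} \stackrel{!}{=} 1 \hspace{0.3cm}\Rightarrow\hspace{0.3cm} g_X(x) = \frac{1}{2A} \hspace{0.4cm} \forall x \in \big [-A, +A \big]\hspace{0.05cm}.$$

Inserting this uniform PDF into the entropy integral gives

$$h(X) = \int_{-A}^{+A} \hspace{0.05cm} \frac{1}{2A} \cdot {\rm ln} \hspace{0.1cm} (2A) \hspace{0.1cm}{\rm d}x = {\rm ln} \hspace{0.1cm} (2A) \hspace{0.05cm}.$$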

$\text{Summary for peak constraints:}$ 

The maximum differential entropy is obtained under the constraint  $ \vert X \vert ≤ A$  for the  »uniform PDF«:

$$h_{\rm max}(X) = {\rm log} \hspace{0.1cm} ({\it \Gamma}_{\rm A} \cdot A) = {\rm log} \hspace{0.1cm} (2A) \hspace{0.5cm} \Rightarrow\hspace{0.5cm} {\it \Gamma}_{\rm A} = 2 \hspace{0.05cm}.$$

Any other random variable with the PDF property  $f_X(\vert x \vert > A) = 0$   leads to a smaller differential entropy, characterized by the parameter  ${\it \Gamma}_{\rm A} < 2$.

Proof: Maximum differential entropy with power constraint


Let us first clarify the term  »power constraint«:

  • We are now looking for the maximum differential entropy under the constraint  ${\rm E}\big[|X - m_1|^2 \big] ≤ σ^2$.
  • Here we may replace the  "smaller/equal sign"  by the  "equal sign".


If we only allow zero-mean random variables  $(m_1 = 0)$,  the variance equals the second moment  ${\rm E}\big[X^2\big]$,  which simplifies the problem.  The Lagrangian function then reads:

$$L= \hspace{0.1cm} \hspace{0.05cm} \int_{-\infty}^{+\infty} \hspace{-0.1cm} f_X(x) \cdot {\rm log} \hspace{0.1cm} \frac{1}{ f_X(x) } \hspace{0.1cm}{\rm d}x \hspace{0.1cm}+ \hspace{0.1cm} \lambda_1 \cdot \int_{-\infty}^{+\infty} \hspace{-0.1cm} f_X(x) \hspace{0.1cm}{\rm d}x \hspace{0.1cm}+ \hspace{0.1cm} \lambda_2 \cdot \int_{-\infty}^{+\infty}\hspace{-0.1cm} x^2 \cdot f_X(x) \hspace{0.1cm}{\rm d}x \hspace{0.05cm}.$$

Following a similar procedure as in the  "proof of the peak constraint",  it turns out that the "best possible" function must be   $g_X(x) \sim {\rm e}^{\hspace{0.05cm}λ_2\hspace{0.05cm} · \hspace{0.05cm} x^2}$  with  $λ_2 < 0$    ⇒   "Gaussian distribution":

$$g_X(x) ={1}/{\sqrt{2\pi \sigma^2}} \cdot {\rm e}^{ - \hspace{0.05cm}{x^2}/{(2 \sigma^2)} }\hspace{0.05cm}.$$
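The Gaussian form can be traced back to the Lagrangian in a few lines. The following sketch assumes the natural logarithm and the sign convention of the Lagrangian given above, in which both multipliers enter with a plus sign. Setting the variation to zero in the same way as for the peak constraint yields

$${\rm ln} \hspace{0.1cm} \frac{1}{ g_X(x) } -1 + \lambda_1 + \lambda_2 \cdot x^2 = 0 \hspace{0.3cm}\Rightarrow\hspace{0.3cm} g_X(x) = {\rm e}^{\lambda_1 - 1} \cdot {\rm e}^{\lambda_2 \hspace{0.05cm}\cdot \hspace{0.05cm} x^2}\hspace{0.05cm}.$$

Such a function has a finite area only for  $\lambda_2 < 0$;  the sign of the exponent therefore depends only on the sign convention chosen for  $\lambda_2$.  The two constraints  (PDF area  $1$,  second moment  $σ^2$)  then fix both constants:

$$\lambda_2 = - \frac{1}{2\sigma^2}\hspace{0.05cm}, \hspace{0.5cm} {\rm e}^{\lambda_1 - 1} = \frac{1}{\sqrt{2\pi \sigma^2} }\hspace{0.05cm}.$$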

However, we use here for the explicit proof the  "Kullback–Leibler distance"  between a suitable general PDF  $f_X(x)$  and the Gaussian PDF  $g_X(x)$:

$$D(f_X \hspace{0.05cm} || \hspace{0.05cm}g_X) = \int_{-\infty}^{+\infty} \hspace{0.02cm} f_X(x) \cdot {\rm ln} \hspace{0.1cm} \frac{f_X(x)}{g_X(x)} \hspace{0.1cm}{\rm d}x = -h(X) - I_2\hspace{0.3cm} \Rightarrow\hspace{0.3cm}I_2 = \int_{-\infty}^{+\infty} \hspace{0.02cm} f_X(x) \cdot {\rm ln} \hspace{0.1cm} {g_X(x)} \hspace{0.1cm}{\rm d}x \hspace{0.05cm}.$$

For simplicity, the natural logarithm   ⇒   $\ln$  is used here.  Thus we obtain for the second integral:

$$I_2 = - \frac{1}{2} \cdot {\rm ln} \hspace{0.1cm} (2\pi\sigma^2) \cdot \hspace{-0.1cm}\int_{-\infty}^{+\infty} \hspace{-0.4cm} f_X(x) \hspace{0.1cm}{\rm d}x \hspace{0.3cm}- \hspace{0.3cm} \frac{1}{2\sigma^2} \cdot \hspace{-0.1cm}\int_{-\infty}^{+\infty} \hspace{0.02cm} x^2 \cdot f_X(x) \hspace{0.1cm}{\rm d}x \hspace{0.05cm}.$$

By definition, the first integral is equal to  $1$  and the second integral gives  $σ^2$:

$$I_2 = - {1}/{2} \cdot {\rm ln} \hspace{0.1cm} (2\pi\sigma^2) - {1}/{2} \cdot [{\rm ln} \hspace{0.1cm} ({\rm e})] = - {1}/{2} \cdot {\rm ln} \hspace{0.1cm} (2\pi{\rm e} \cdot \sigma^2)$$
$$\Rightarrow\hspace{0.3cm} D(f_X \hspace{0.05cm} || \hspace{0.05cm}g_X) = -h(X) - I_2 = -h(X) + {1}/{2} \cdot {\rm ln} \hspace{0.1cm} (2\pi{\rm e} \cdot \sigma^2)\hspace{0.05cm}.$$

Since the Kullback-Leibler distance is always  $\ge 0$  for continuous random variables as well,  one obtains after generalization ("ln"   ⇒   "log"):

$$h(X) \le {1}/{2} \cdot {\rm log} \hspace{0.1cm} (2\pi{\rm e} \cdot \sigma^2)\hspace{0.05cm}.$$

Equality holds only if the random variable  $X$  is Gaussian distributed.
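The relation  $D(f_X \hspace{0.05cm} || \hspace{0.05cm}g_X) = -h(X) + 1/2 \cdot \ln \hspace{0.05cm} (2\pi{\rm e} \cdot \sigma^2) \ge 0$  can be checked numerically for a concrete non-Gaussian PDF. The sketch below assumes a uniform PDF on  $[-A, +A]$  as  $f_X(x)$  and a zero-mean Gaussian PDF with the same variance as  $g_X(x)$;  the value of  $A$  and the function name are hypothetical.

```python
import math


def kl_uniform_vs_gaussian(A, steps=100_000):
    """Midpoint-rule estimate of D(f_X || g_X) in nat.

    f_X: uniform PDF on [-A, +A]; g_X: zero-mean Gaussian PDF with the same variance.
    Returns the estimate together with that variance.
    """
    sigma2 = (2.0 * A) ** 2 / 12.0        # variance of the uniform PDF
    f = 1.0 / (2.0 * A)
    dx = 2.0 * A / steps
    total = 0.0
    for i in range(steps):
        x = -A + (i + 0.5) * dx
        g = math.exp(-x * x / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
        total += f * math.log(f / g) * dx
    return total, sigma2


A = 1.5
kl, sigma2 = kl_uniform_vs_gaussian(A)
h_uniform = math.log(2.0 * A)                                # h(X) of the uniform PDF in nat
bound = 0.5 * math.log(2.0 * math.pi * math.e * sigma2)      # Gaussian maximum in nat
print(kl, bound - h_uniform)
```

Both printed numbers should coincide and be positive, which illustrates that the uniform PDF stays below the Gaussian bound for the same variance.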

$\text{Summary for power constraints:}$ 

The maximum differential entropy is obtained under the condition  ${\rm E}\big[ \vert X - m_1 \vert ^2 \big] ≤ σ^2$  independent of  $m_1$  for the  »Gaussian PDF«:

$$h_{\rm max}(X) = {1}/{2} \cdot {\rm log} \hspace{0.1cm} ({\it \Gamma}_{\hspace{-0.01cm} \rm L} \cdot \sigma^2) = {1}/{2} \cdot {\rm log} \hspace{0.1cm} (2\pi{\rm e} \cdot \sigma^2) \hspace{0.5cm} \Rightarrow\hspace{0.5cm} {\it \Gamma}_{\rm L} = 2\pi{\rm e} \hspace{0.05cm}.$$

Any other continuous random variable  $X$  with variance  ${\rm E}\big[ \vert X - m_1 \vert ^2 \big] ≤ σ^2$  leads to a smaller differential entropy,  characterized by the parameter  ${\it \Gamma}_{\rm L} < 2\pi{\rm e}$.


Exercises for the chapter


Exercise 4.1: PDF, CDF and Probability

Exercise 4.1Z: Calculation of Moments

Exercise 4.2: Triangular PDF

Exercise 4.2Z: Mixed Random Variables

Exercise 4.3: PDF Comparison with Regard to Differential Entropy

Exercise 4.3Z: Exponential and Laplace Distribution

Exercise 4.4: Conventional Entropy and Differential Entropy