Difference between revisions of "Aufgaben:Exercise 1.1: Entropy of the Weather"

From LNTwww
m (Text replacement - "Category:Aufgaben zu Informationstheorie" to "Category:Information Theory: Exercises")
 
(15 intermediate revisions by 3 users not shown)
Line 1: Line 1:
  
{{quiz-Header|Buchseite=Informationstheorie/Gedächtnislose Nachrichtenquellen
+
{{quiz-Header|Buchseite=Information_Theory/Discrete_Memoryless_Sources
 
}}
 
}}
  
[[File:Inf_A_1_1_vers2.png|right|frame|Fünf verschiedene Binärquellen]]
+
[[File:EN_Inf_A_1_1_v2.png|right|frame|Five different binary sources]]
Eine Wetterstation fragt täglich verschiedene Regionen ab und bekommt als Antwort jeweils eine Meldung   $x$  zurück, nämlich
+
A weather station queries different regions every day and receives a message   $x$  back as a response in each case, namely
  
* $x =  \rm B$:   Das Wetter ist eher schlecht.
+
* $x =  \rm B$:   The weather is rather bad.
* $x =  \rm G$:   Das Wetter ist eher gut.
+
* $x =  \rm G$:   The weather is rather good.
  
  
Die Daten wurden über viele Jahre für verschiedene Gebiete in Dateien abgelegt, so dass die Entropien der  $\rm B/G$–Folgen ermittelt werden können:
+
The data were stored in files over many years for different regions, so that the entropies of the  $\rm B/G$–sequences can be determined:
 
:$$H =  p_{\rm B} \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{p_{\rm B}} + p_{\rm G} \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{p_{\rm G}}$$
 
:$$H =  p_{\rm B} \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{p_{\rm B}} + p_{\rm G} \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{p_{\rm G}}$$
  
mit dem <i>Logarithmus dualis</i>
+
with the&nbsp; base-2 logarithm
:$${\rm log}_2\hspace{0.1cm}p=\frac{{\rm lg}\hspace{0.1cm}p}{{\rm lg}\hspace{0.1cm}2}\hspace{0.3cm} \left ( =  {\rm ld}\hspace{0.1cm}p \right ) \hspace{0.05cm}.$$
+
:$${\rm log}_2\hspace{0.1cm}p=\frac{{\rm lg}\hspace{0.1cm}p}{{\rm lg}\hspace{0.1cm}2}.$$
&bdquo;lg&rdquo;&nbsp; kennzeichnet hierbei den Logarithmus zur Basis&nbsp; $10$.&nbsp; Zu erwähnen ist ferner, dass jeweils noch die Pseudoeinheit&nbsp; $\text{bit/Anfrage}$ &nbsp;anzufügen ist.
+
Here,&nbsp; "lg"&nbsp; denotes the logarithm to the base&nbsp; $10$.&nbsp; It should also be mentioned that the pseudo-unit&nbsp; $\text{bit/enquiry}$ &nbsp;must be added in each case.
  
Die Grafik zeigt diese binären Folgen jeweils für&nbsp; $60$&nbsp; Tage und folgende Regionen:
+
The graph shows these binary sequences for&nbsp; $60$&nbsp; days and the following regions:
  
* Region &bdquo;Durchwachsen&rdquo;: &nbsp;&nbsp; $p_{\rm B} = p_{\rm G} =0.5$,
+
* Region&nbsp; "Mixed": &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; $p_{\rm B} = p_{\rm G} =0.5$,
* Region &bdquo;Regenloch&rdquo;: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $p_{\rm B} = 0.8, \; p_{\rm G} =0.2$,  
+
* Region&nbsp; "Rainy": &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; $p_{\rm B} = 0.8, \; p_{\rm G} =0.2$,  
* Region &bdquo;Angenehm&rdquo;: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $p_{\rm B} = 0.2, \; p_{\rm G} =0.8$,  
+
* Region&nbsp; "Enjoyable": &nbsp;&nbsp;&nbsp; $p_{\rm B} = 0.2, \; p_{\rm G} =0.8$,  
* Region &bdquo;Paradies&rdquo;: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $p_{\rm B} = 1/30, \; p_{\rm G} =29/30$.
+
* Region&nbsp; "Paradise": &nbsp; &nbsp; &nbsp;&nbsp; $p_{\rm B} = 1/30, \; p_{\rm G} =29/30$.
  
  
Schließlich ist auch noch die Datei &bdquo;Unbekannt&rdquo; angegeben, deren statistische Eigenschaften zu schätzen sind.
+
Finally, the file&nbsp; "Unknown"&nbsp; is also given, whose statistical properties are to be estimated.
  
  
Line 34: Line 34:
  
  
''Hinweise:''  
+
''Hints:''  
*Die Aufgabe gehört zum  Kapitel&nbsp; [[Information_Theory/Gedächtnislose_Nachrichtenquellen|Gedächtnislose Nachrichtenquellen]].
+
*This task belongs to the chapter&nbsp; [[Information_Theory/Gedächtnislose_Nachrichtenquellen|Discrete Memoryless Sources]].
 
   
 
   
*Für die vier ersten Dateien wird vorausgesetzt, dass die Ereignisse&nbsp; $\rm B$&nbsp; und&nbsp; $\rm G$&nbsp; statistisch unabhängig seien, eine für die Wetterpraxis eher unrealistische Annahme.
+
*For the first four files it is assumed that the events&nbsp; $\rm B$&nbsp; and&nbsp; $\rm G$&nbsp; are statistically independent, a rather unrealistic assumption for weather practice.
  
*Die Aufgabe wurde zu einer Zeit konzipiert, als&nbsp; [https://de.wikipedia.org/wiki/Greta_Thunberg Greta]&nbsp; gerade in die Schule kam.&nbsp; Wir überlassen es Ihnen,  &bdquo;Paradies&rdquo; in &bdquo;Hölle&rdquo; umzubenennen.  
+
*The task was designed at a time when&nbsp; [https://en.wikipedia.org/wiki/Greta_Thunberg Greta]&nbsp; was just starting school.&nbsp; We leave it to you to rename&nbsp; "Paradise"&nbsp; to&nbsp; "Hell".  
  
  
  
  
===Fragebogen===
+
===Questions===
  
 
<quiz display=simple>
 
<quiz display=simple>
{Welche Entropie&nbsp; $H_{\rm D}$&nbsp; weist die Datei&nbsp; &bdquo;Durchwachsen"&nbsp; auf?
+
{What is the entropy&nbsp; $H_{\rm M}$&nbsp; of the file&nbsp; "Mixed"?
 
|type="{}"}
 
|type="{}"}
$H_{\rm D}\ = \ $  { 1 3% } $\ \rm bit/Anfrage$
+
$H_{\rm M}\ = \ $  { 1 3% } $\ \rm bit/enquiry$
  
  
{Welche Entropie&nbsp; $H_{\rm R}$&nbsp; weist die Datei&nbsp; &bdquo;Regenloch&rdquo;&nbsp; auf?
+
{What is the entropy&nbsp; $H_{\rm R}$&nbsp; of the file&nbsp; "Rainy"?
 
|type="{}"}
 
|type="{}"}
$H_{\rm R}\ =  \ $ { 0.722 3% }  $\ \rm bit/Anfrage$
+
$H_{\rm R}\ =  \ $ { 0.722 3% }  $\ \rm bit/enquiry$
  
  
{Welche Entropie&nbsp; $H_{\rm A}$&nbsp; weist die Datei&nbsp; &bdquo;Angenehm&rdquo;&nbsp; auf?
+
{What is the entropy&nbsp; $H_{\rm E}$&nbsp; of the file&nbsp; "Enjoyable"?
 
|type="{}"}
 
|type="{}"}
$H_{\rm A}\ =  \ $ { 0.722 3% } $\ \rm bit/Anfrage$
+
$H_{\rm E}\ =  \ $ { 0.722 3% } $\ \rm bit/enquiry$
  
  
{Wie groß sind die Informationsgehalte der Ereignisse&nbsp; $\rm B$&nbsp; und&nbsp; $\rm G$&nbsp; bezogen auf die Datei&nbsp; &bdquo;Paradies&rdquo;?
+
{How large are the information contents of events&nbsp; $\rm B$&nbsp; and&nbsp; $\rm G$&nbsp; in relation to the file&nbsp; "Paradise"?
 
|type="{}"}
 
|type="{}"}
$I_{\rm B}\ =  \ $ { 4.907 3% } $\ \rm bit/Anfrage$
+
$I_{\rm B}\ =  \ $ { 4.907 3% } $\ \rm bit/enquiry$
$I_{\rm G}\ =  \ $ { 0.049 3% } $\ \rm bit/Anfrage$
+
$I_{\rm G}\ =  \ $ { 0.049 3% } $\ \rm bit/enquiry$
  
  
{Wie groß ist die Entropie&nbsp; (das heißt:&nbsp; der mittlere Informationsgehalt)&nbsp; $H_{\rm P}$&nbsp; der Datei&nbsp; &bdquo;Paradies&rdquo;?&nbsp; Interpretieren Sie das Ergebnis?
+
{What is the entropy&nbsp; (that is:&nbsp; the average information content)&nbsp; $H_{\rm P}$&nbsp; of the file&nbsp; "Paradise"?&nbsp; Interpret the result.
 
|type="{}"}
 
|type="{}"}
$H_{\rm P}\ =  \ $ { 0.211 3% } $\ \rm bit/Anfrage$
+
$H_{\rm P}\ =  \ $ { 0.211 3% } $\ \rm bit/enquiry$
  
  
{Welche Aussagen könnten für die Datei&nbsp; &bdquo;Unbekannt&rdquo;&nbsp; gelten?
+
{Which statements could be true for the file&nbsp; "Unknown"?
 
|type="[]"}
 
|type="[]"}
+ Die Ereignisse&nbsp; $\rm B$&nbsp; und&nbsp; $\rm G$&nbsp; sind etwa gleichwahrscheinlich.
+
+ Events&nbsp; $\rm B$&nbsp; and&nbsp; $\rm G$&nbsp; are approximately equally probable.
- Die Folgenelemente sind statistisch voneinander unabhängig.
+
- The sequence elements are statistically independent of each other.
+ Die Entropie dieser Datei ist&nbsp; $H_\text{U} \approx 0.7 \; \rm bit/Anfrage$.
+
+ The entropy of this file is&nbsp; $H_\text{U} \approx 0.7 \; \rm bit/enquiry$.
- Die Entropie dieser Datei ist&nbsp; $H_\text{U} = 1.5 \; \rm bit/Anfrage$.
+
- The entropy of this file is&nbsp; $H_\text{U} = 1.5 \; \rm bit/enquiry$.
  
  
Line 84: Line 84:
 
</quiz>
 
</quiz>
  
===Musterlösung===
+
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
'''(1)'''&nbsp; Bei der Datei&nbsp; &bdquo;Durchwachsen&rdquo;&nbsp; sind die beiden Wahrscheinlichkeiten gleich: &nbsp;  $p_{\rm B} = p_{\rm G} =0.5$.&nbsp; Damit ergibt sich für die Entropie:
+
'''(1)'''&nbsp; For the file&nbsp; "Mixed",&nbsp; the two probabilities are equal: &nbsp;  $p_{\rm B} = p_{\rm G} =0.5$.&nbsp; The entropy is therefore:
:$$H_{\rm D} =  0.5 \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{0.5} + 0.5 \cdot  
+
:$$H_{\rm M} =  0.5 \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{0.5} + 0.5 \cdot  
{\rm log}_2\hspace{0.1cm}\frac{1}{0.5} \hspace{0.15cm}\underline {= 1\,{\rm bit/Anfrage}}\hspace{0.05cm}.$$
+
{\rm log}_2\hspace{0.1cm}\frac{1}{0.5} \hspace{0.15cm}\underline {= 1\,{\rm bit/enquiry}}\hspace{0.05cm}.$$
  
  
'''(2)'''&nbsp; Mit&nbsp; $p_{\rm B} = 0.8$&nbsp; und&nbsp; $p_{\rm G} =0.2$&nbsp; erhält man einen kleineren Entropiewert:
+
'''(2)'''&nbsp; With&nbsp; $p_{\rm B} = 0.8$&nbsp; and&nbsp; $p_{\rm G} =0.2$,&nbsp; a smaller entropy value is obtained:
 
:$$H_{\rm R} \hspace{-0.05cm}= \hspace{-0.05cm}0.8 \cdot {\rm log}_2\hspace{0.05cm}\frac{5}{4} \hspace{-0.05cm}+ \hspace{-0.05cm}0.2 \cdot {\rm log}_2\hspace{0.05cm}\frac{5}{1}\hspace{-0.05cm}=\hspace{-0.05cm}   
 
:$$H_{\rm R} \hspace{-0.05cm}= \hspace{-0.05cm}0.8 \cdot {\rm log}_2\hspace{0.05cm}\frac{5}{4} \hspace{-0.05cm}+ \hspace{-0.05cm}0.2 \cdot {\rm log}_2\hspace{0.05cm}\frac{5}{1}\hspace{-0.05cm}=\hspace{-0.05cm}   
 
0.8 \cdot{\rm log}_2\hspace{0.05cm}5\hspace{-0.05cm} - \hspace{-0.05cm}0.8 \cdot {\rm log}_2\hspace{0.05cm}4 \hspace{-0.05cm}+ \hspace{-0.05cm}0.2 \cdot {\rm log}_2 \hspace{0.05cm} 5 \hspace{-0.05cm}=\hspace{-0.05cm}
 
0.8 \cdot{\rm log}_2\hspace{0.05cm}5\hspace{-0.05cm} - \hspace{-0.05cm}0.8 \cdot {\rm log}_2\hspace{0.05cm}4 \hspace{-0.05cm}+ \hspace{-0.05cm}0.2 \cdot {\rm log}_2 \hspace{0.05cm} 5 \hspace{-0.05cm}=\hspace{-0.05cm}
 
{\rm log}_2\hspace{0.05cm}5\hspace{-0.05cm} -\hspace{-0.05cm} 0.8 \cdot  
 
{\rm log}_2\hspace{0.05cm}5\hspace{-0.05cm} -\hspace{-0.05cm} 0.8 \cdot  
 
{\rm log}_2\hspace{0.1cm}4\hspace{-0.05cm} = \hspace{-0.05cm} \frac{{\rm lg} \hspace{0.1cm}5}{{\rm lg}\hspace{0.1cm}2} \hspace{-0.05cm}-\hspace{-0.05cm} 1.6 \hspace{0.15cm}  
 
{\rm log}_2\hspace{0.1cm}4\hspace{-0.05cm} = \hspace{-0.05cm} \frac{{\rm lg} \hspace{0.1cm}5}{{\rm lg}\hspace{0.1cm}2} \hspace{-0.05cm}-\hspace{-0.05cm} 1.6 \hspace{0.15cm}  
\underline {= 0.722\,{\rm bit/Anfrage}}\hspace{0.05cm}.$$
+
\underline {= 0.722\,{\rm bit/enquiry}}\hspace{0.05cm}.$$
  
  
'''(3)'''&nbsp; In der Datei&nbsp; &bdquo;Angenehm&rdquo;&nbsp; sind die Wahrscheinlichkeiten gegenüber der Datei&nbsp; &bdquo;Regenloch&rdquo;&nbsp; genau vertauscht.&nbsp; Durch diese Vertauschung wird die Entropie jedoch nicht verändert:
+
'''(3)'''&nbsp; In the file&nbsp; "Enjoyable",&nbsp; the probabilities are exactly swapped compared to the file&nbsp; "Rainy".&nbsp; However, this swap does not change the entropy:
:$$H_{\rm A} = H_{\rm R} \hspace{0.15cm} \underline {= 0.722\,{\rm bit/Anfrage}}\hspace{0.05cm}.$$
+
:$$H_{\rm E} = H_{\rm R} \hspace{0.15cm} \underline {= 0.722\,{\rm bit/enquiry}}\hspace{0.05cm}.$$
  
  
'''(4)'''&nbsp; Mit&nbsp; $p_{\rm B} = 1/30$&nbsp; und&nbsp; $p_{\rm G} =29/30$&nbsp; ergeben sich folgende Informationsgehalte:
+
'''(4)'''&nbsp; With&nbsp; $p_{\rm B} = 1/30$&nbsp; and&nbsp; $p_{\rm G} =29/30$,&nbsp; the information contents are as follows:
 
:$$I_{\rm B} \hspace{0.1cm}  =  \hspace{0.1cm}  {\rm log}_2\hspace{0.1cm}30 =   
 
:$$I_{\rm B} \hspace{0.1cm}  =  \hspace{0.1cm}  {\rm log}_2\hspace{0.1cm}30 =   
 
  \frac{{\rm lg}\hspace{0.1cm}30}{{\rm lg}\hspace{0.1cm}2}  = \frac{1.477}{0.301} \hspace{0.15cm}  
 
  \frac{{\rm lg}\hspace{0.1cm}30}{{\rm lg}\hspace{0.1cm}2}  = \frac{1.477}{0.301} \hspace{0.15cm}  
\underline {= 4.907\,{\rm bit/Anfrage}}\hspace{0.05cm},$$
+
\underline {= 4.907\,{\rm bit/enquiry}}\hspace{0.05cm},$$
 
:$$I_{\rm G} \hspace{0.1cm}  =  \hspace{0.1cm}  {\rm log}_2\hspace{0.1cm}\frac{30}{29} =   
 
:$$I_{\rm G} \hspace{0.1cm}  =  \hspace{0.1cm}  {\rm log}_2\hspace{0.1cm}\frac{30}{29} =   
 
  \frac{{\rm lg}\hspace{0.1cm}1.034}{{\rm lg}\hspace{0.1cm}2}  = \frac{0.0147}{0.301} \hspace{0.15cm}  
 
  \frac{{\rm lg}\hspace{0.1cm}1.034}{{\rm lg}\hspace{0.1cm}2}  = \frac{0.0147}{0.301} \hspace{0.15cm}  
\underline {= 0.049\,{\rm bit/Anfrage}}\hspace{0.05cm}.$$
+
\underline {= 0.049\,{\rm bit/enquiry}}\hspace{0.05cm}.$$
  
  
'''(5)'''&nbsp; Die Entropie&nbsp; $H_{\rm P}$&nbsp; ist der mittlere Informationsgehalt der beiden Ereignisse&nbsp; $\rm B$&nbsp; und&nbsp; $\rm G$:
+
'''(5)'''&nbsp; The entropy&nbsp; $H_{\rm P}$&nbsp; is the average information content of the two events&nbsp; $\rm B$&nbsp; and&nbsp; $\rm G$:
 
:$$H_{\rm P} = \frac{1}{30} \cdot 4.907 + \frac{29}{30} \cdot 0.049 = 0.164 + 0.047   
 
:$$H_{\rm P} = \frac{1}{30} \cdot 4.907 + \frac{29}{30} \cdot 0.049 = 0.164 + 0.047   
 
  \hspace{0.15cm}  
 
  \hspace{0.15cm}  
\underline {= 0.211\,{\rm bit/Anfrage}}\hspace{0.05cm}.$$
+
\underline {= 0.211\,{\rm bit/enquiry}}\hspace{0.05cm}.$$
*Obwohl&nbsp; (genauer:&nbsp; weil)&nbsp; das Ereignis&nbsp; $\rm B$&nbsp; seltener auftritt als&nbsp; $\rm G$, ist sein Beitrag zur Entropie größer.
+
*Although&nbsp; (more precisely:&nbsp; because)&nbsp; event&nbsp; $\rm B$&nbsp; occurs less frequently than&nbsp; $\rm G$,&nbsp; its contribution to the entropy is much greater&nbsp; ($0.164$&nbsp; versus&nbsp; $0.047 \; \rm bit/enquiry$).
  
  
'''(6)'''&nbsp; Richtig sind die <u>Aussagen 1 und 3</u>:
+
'''(6)'''&nbsp; Statements <u>1 and 3</u> are correct:
*$\rm B$&nbsp; und&nbsp; $\rm G$&nbsp; sind bei der Datei &bdquo;Unbekannt&rdquo; tatsächlich gleichwahrscheinlich: &nbsp; Die&nbsp; $60$&nbsp; dargestellten Symbole teilen sich auf in&nbsp;  $30$&nbsp;mal&nbsp; $\rm B$&nbsp; und&nbsp; $30$&nbsp;mal&nbsp; $\rm G$.  
+
*$\rm B$&nbsp; and&nbsp; $\rm G$&nbsp; are indeed equally probable in the&nbsp; "Unknown"&nbsp; file: &nbsp; The&nbsp; $60$&nbsp; symbols shown consist of&nbsp;  $30$&nbsp;times&nbsp; $\rm B$&nbsp; and&nbsp; $30$&nbsp;times&nbsp; $\rm G$.  
*Es bestehen nun aber starke statistische Bindungen innerhalb der zeitlichen Folge.&nbsp; Nach längeren Schönwetterperioden folgen meist viele schlechte Tage am Stück.
+
*However, there are strong statistical dependencies within the temporal sequence.&nbsp; Long periods of good weather are usually followed by many bad days in a row.
*Aufgrund dieser statistischen Abhängigkeit innerhalb der&nbsp; $\rm B/G$&ndash;Folge ist&nbsp; $H_\text{U} = 0.722 \; \rm bit/Anfrage$&nbsp; kleiner als&nbsp; $H_\text{D} = 1 \; \rm bit/Anfrage$.  
+
*Because of this statistical dependence within the&nbsp; $\rm B/G$&nbsp; sequence,&nbsp; $H_\text{U} = 0.722 \; \rm bit/enquiry$&nbsp; is smaller than&nbsp; $H_\text{M} = 1 \; \rm bit/enquiry$.  
*$H_\text{D}$&nbsp; ist gleichzeitig das Maximum für&nbsp; $M = 2$ &nbsp;  &#8658; &nbsp; die letzte Aussage ist mit Sicherheit falsch.  
+
*$H_\text{M}$&nbsp; is also the maximum entropy of a binary source&nbsp; $(M = 2)$ &nbsp;  &#8658; &nbsp; the last statement is certainly wrong.
 
{{ML-Fuß}}
 
{{ML-Fuß}}
  
  
  
[[Category:Information Theory: Exercises|^1.1 Gedächtnislose Nachrichtenquellen^]]
+
[[Category:Information Theory: Exercises|^1.1 Memoryless Sources^]]

Latest revision as of 12:59, 10 August 2021

Five different binary sources

A weather station queries different regions every day and receives a message  $x$  back as a response in each case, namely

  • $x = \rm B$:   The weather is rather bad.
  • $x = \rm G$:   The weather is rather good.


The data were stored in files over many years for different regions, so that the entropies of the  $\rm B/G$–sequences can be determined:

$$H = p_{\rm B} \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{p_{\rm B}} + p_{\rm G} \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{p_{\rm G}}$$

with the  base-2 logarithm

$${\rm log}_2\hspace{0.1cm}p=\frac{{\rm lg}\hspace{0.1cm}p}{{\rm lg}\hspace{0.1cm}2}.$$

Here,  "lg"  denotes the logarithm to the base  $10$.  It should also be mentioned that the pseudo-unit  $\text{bit/enquiry}$  must be added in each case.
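The definition above translates directly into a few lines of Python. This is only a minimal sketch: the function names `log2_via_lg` and `binary_entropy` are chosen here for illustration and do not come from the exercise, and treating a probability of zero as contributing nothing to the sum is an assumption (the usual limit convention).

```python
import math

def log2_via_lg(x):
    """Base-2 logarithm via the base-10 logarithm: log2(x) = lg(x) / lg(2)."""
    return math.log10(x) / math.log10(2)

def binary_entropy(p_b):
    """Entropy in bit/enquiry of a memoryless B/G source with Pr(B) = p_b."""
    p_g = 1.0 - p_b
    # Assumption: a symbol with probability 0 contributes nothing to the sum.
    return sum(p * log2_via_lg(1.0 / p) for p in (p_b, p_g) if p > 0.0)

# Example call with a probability that is not used in the exercise.
print(round(binary_entropy(0.25), 3))
```

The change of base is written out explicitly to mirror the conversion formula; in practice `math.log2` could be used instead.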

The graph shows these binary sequences for  $60$  days and the following regions:

  • Region  "Mixed":              $p_{\rm B} = p_{\rm G} =0.5$,
  • Region  "Rainy":              $p_{\rm B} = 0.8, \; p_{\rm G} =0.2$,
  • Region  "Enjoyable":     $p_{\rm B} = 0.2, \; p_{\rm G} =0.8$,
  • Region  "Paradise":        $p_{\rm B} = 1/30, \; p_{\rm G} =29/30$.


Finally, the file  "Unknown"  is also given, whose statistical properties are to be estimated.





Hints:

  • For the first four files it is assumed that the events  $\rm B$  and  $\rm G$  are statistically independent, a rather unrealistic assumption for weather practice.
  • The task was designed at a time when  Greta  was just starting school.  We leave it to you to rename  "Paradise"  to  "Hell".



Questions

1

What is the entropy $H_{\rm M}$ of the file "Mixed"?

$H_{\rm M}\ = \ $ ____ $\ \rm bit/enquiry$

2

What is the entropy $H_{\rm R}$ of the file "Rainy"?

$H_{\rm R}\ = \ $ ____ $\ \rm bit/enquiry$

3

What is the entropy $H_{\rm E}$ of the file "Enjoyable"?

$H_{\rm E}\ = \ $ ____ $\ \rm bit/enquiry$

4

How large are the information contents of events $\rm B$ and $\rm G$ in relation to the file "Paradise"?

$I_{\rm B}\ = \ $ ____ $\ \rm bit/enquiry$
$I_{\rm G}\ = \ $ ____ $\ \rm bit/enquiry$

5

What is the entropy (that is: the average information content) $H_{\rm P}$ of the file "Paradise"? Interpret the result.

$H_{\rm P}\ = \ $ ____ $\ \rm bit/enquiry$

6

Which statements could be true for the file "Unknown"?

  • Events $\rm B$ and $\rm G$ are approximately equally probable.
  • The sequence elements are statistically independent of each other.
  • The entropy of this file is $H_\text{U} \approx 0.7 \; \rm bit/enquiry$.
  • The entropy of this file is $H_\text{U} = 1.5 \; \rm bit/enquiry$.


Solution

(1)  For the file "Mixed", the two probabilities are equal:   $p_{\rm B} = p_{\rm G} =0.5$.  The entropy is therefore:

$$H_{\rm M} = 0.5 \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{0.5} + 0.5 \cdot {\rm log}_2\hspace{0.1cm}\frac{1}{0.5} \hspace{0.15cm}\underline {= 1\,{\rm bit/enquiry}}\hspace{0.05cm}.$$


(2)  With  $p_{\rm B} = 0.8$  and  $p_{\rm G} =0.2$,  a smaller entropy value is obtained:

$$H_{\rm R} \hspace{-0.05cm}= \hspace{-0.05cm}0.8 \cdot {\rm log}_2\hspace{0.05cm}\frac{5}{4} \hspace{-0.05cm}+ \hspace{-0.05cm}0.2 \cdot {\rm log}_2\hspace{0.05cm}\frac{5}{1}\hspace{-0.05cm}=\hspace{-0.05cm} 0.8 \cdot{\rm log}_2\hspace{0.05cm}5\hspace{-0.05cm} - \hspace{-0.05cm}0.8 \cdot {\rm log}_2\hspace{0.05cm}4 \hspace{-0.05cm}+ \hspace{-0.05cm}0.2 \cdot {\rm log}_2 \hspace{0.05cm} 5 \hspace{-0.05cm}=\hspace{-0.05cm} {\rm log}_2\hspace{0.05cm}5\hspace{-0.05cm} -\hspace{-0.05cm} 0.8 \cdot {\rm log}_2\hspace{0.1cm}4\hspace{-0.05cm} = \hspace{-0.05cm} \frac{{\rm lg} \hspace{0.1cm}5}{{\rm lg}\hspace{0.1cm}2} \hspace{-0.05cm}-\hspace{-0.05cm} 1.6 \hspace{0.15cm} \underline {= 0.722\,{\rm bit/enquiry}}\hspace{0.05cm}.$$


(3)  In the file "Enjoyable", the probabilities are exactly swapped compared to the file "Rainy".  However, this swap does not change the entropy:

$$H_{\rm E} = H_{\rm R} \hspace{0.15cm} \underline {= 0.722\,{\rm bit/enquiry}}\hspace{0.05cm}.$$
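Results (2) and (3) can be checked numerically. The following sketch assumes a small helper `entropy` (a name chosen here, not taken from the exercise) that evaluates the entropy formula for a list of probabilities, and compares it with the closed form $\log_2 5 - 1.6$ from solution (2).

```python
import math

def entropy(probs):
    """Entropy in bit/enquiry of a memoryless source with the given probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0.0)

h_rainy = entropy([0.8, 0.2])       # "Rainy":     p_B = 0.8, p_G = 0.2
h_enjoyable = entropy([0.2, 0.8])   # "Enjoyable": probabilities swapped

# Closed form from solution (2): log2(5) - 0.8 * log2(4) = log2(5) - 1.6
h_closed_form = math.log2(5) - 1.6

print(f"{h_rainy:.3f} {h_enjoyable:.3f} {h_closed_form:.3f}")
```

All three values round to 0.722 bit/enquiry, because the entropy depends only on the set of probabilities and not on which symbol carries which probability.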


(4)  With  $p_{\rm B} = 1/30$  and  $p_{\rm G} =29/30$,  the information contents are as follows:

$$I_{\rm B} \hspace{0.1cm} = \hspace{0.1cm} {\rm log}_2\hspace{0.1cm}30 = \frac{{\rm lg}\hspace{0.1cm}30}{{\rm lg}\hspace{0.1cm}2} = \frac{1.477}{0.301} \hspace{0.15cm} \underline {= 4.907\,{\rm bit/enquiry}}\hspace{0.05cm},$$
$$I_{\rm G} \hspace{0.1cm} = \hspace{0.1cm} {\rm log}_2\hspace{0.1cm}\frac{30}{29} = \frac{{\rm lg}\hspace{0.1cm}1.034}{{\rm lg}\hspace{0.1cm}2} = \frac{0.0147}{0.301} \hspace{0.15cm} \underline {= 0.049\,{\rm bit/enquiry}}\hspace{0.05cm}.$$


(5)  The entropy  $H_{\rm P}$  is the average information content of the two events  $\rm B$  and  $\rm G$:

$$H_{\rm P} = \frac{1}{30} \cdot 4.907 + \frac{29}{30} \cdot 0.049 = 0.164 + 0.047 \hspace{0.15cm} \underline {= 0.211\,{\rm bit/enquiry}}\hspace{0.05cm}.$$
  • Although (more precisely: because) event $\rm B$ occurs less frequently than $\rm G$, its contribution to the entropy is much greater ($0.164$ versus $0.047 \; \rm bit/enquiry$).
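The numbers in (4) and (5) can be reproduced with a short script. This is a sketch only; the dictionary names `p`, `info` and `contrib` are illustrative choices and not part of the exercise.

```python
import math

p = {"B": 1 / 30, "G": 29 / 30}   # region "Paradise"

# Information content of each event: I_x = log2(1 / p_x)
info = {x: math.log2(1 / px) for x, px in p.items()}

# Contribution of each event to the entropy: p_x * I_x
contrib = {x: p[x] * info[x] for x in p}

# The entropy is the probability-weighted mean of the information contents.
h_p = sum(contrib.values())

for x in p:
    print(f"I_{x} = {info[x]:.3f}, contribution = {contrib[x]:.3f}")
print(f"H_P = {h_p:.3f}")
```

The rare event $\rm B$ carries about 4.907 bit but occurs only once in 30 enquiries, so its weighted contribution (about 0.164) still exceeds that of the frequent event $\rm G$ (about 0.047).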


(6)  Statements 1 and 3 are correct:

  • $\rm B$ and $\rm G$ are indeed equally probable in the "Unknown" file:   The $60$ symbols shown consist of $30$ times $\rm B$ and $30$ times $\rm G$.
  • However, there are strong statistical dependencies within the temporal sequence.  Long periods of good weather are usually followed by many bad days in a row.
  • Because of this statistical dependence within the $\rm B/G$ sequence, $H_\text{U} = 0.722 \; \rm bit/enquiry$ is smaller than $H_\text{M} = 1 \; \rm bit/enquiry$.
  • $H_\text{M}$ is also the maximum entropy of a binary source $(M = 2)$   ⇒   the last statement is certainly wrong.
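The effect described in (6) can be illustrated with a sketch in Python. Everything here is an assumption for illustration: the 60-symbol sequence is hypothetical (it is not the data from the figure), and the first-order conditional entropy computed from pair frequencies is only one simple estimator for a source with memory; it is not claimed to be the method by which the value 0.722 was obtained.

```python
import math
from collections import Counter

def empirical_entropy(seq):
    """Entropy estimate from symbol frequencies alone (ignores memory)."""
    n = len(seq)
    return sum(c / n * math.log2(n / c) for c in Counter(seq).values())

def conditional_entropy(seq):
    """First-order estimate H(X_n | X_{n-1}) from frequencies of adjacent pairs."""
    pairs = Counter(zip(seq, seq[1:]))   # counts of (previous, current)
    prev = Counter(seq[:-1])             # counts of the previous symbol
    n = len(seq) - 1
    # Sum over pairs of Pr(a, b) * log2(1 / Pr(b | a))
    return sum(c / n * math.log2(prev[a] / c) for (a, _b), c in pairs.items())

# Hypothetical sequence with long runs: 30 times "B" and 30 times "G".
seq = "B" * 10 + "G" * 10 + "B" * 5 + "G" * 15 + "B" * 15 + "G" * 5

print(f"without memory: {empirical_entropy(seq):.3f} bit/enquiry")
print(f"first-order:    {conditional_entropy(seq):.3f} bit/enquiry")
```

With equal symbol counts the frequency-based estimate is 1 bit/enquiry, while the estimate that accounts for the preceding symbol is clearly smaller. The exact value depends entirely on the assumed sequence.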