
Exercise 1.1: Entropy of the Weather

From LNTwww

Revision as of 14:46, 28 May 2021

Five different binary sources

A weather station queries different regions every day and receives a message  x  back as a response in each case, namely

  • x=B:   The weather is rather bad.
  • x=G:   The weather is rather good.


The data were stored in files over many years for different regions, so that the entropies of the  B/G–sequences can be determined:

$$H = p_{\rm B} \cdot \log_2\frac{1}{p_{\rm B}} + p_{\rm G} \cdot \log_2\frac{1}{p_{\rm G}}$$

with the  "Logarithm dualis"

$$\log_2 p = \frac{\lg p}{\lg 2}\hspace{0.5em}(= {\rm ld}\ p).$$

Here,  "lg"  denotes the logarithm to the base  10.  It should also be mentioned that the pseudo-unit  bit/enquiry  must be added in each case.
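The entropy formula above can be checked numerically. A minimal Python sketch (the function name `binary_entropy` is our choice, not part of the exercise):

```python
from math import log2

def binary_entropy(p_b: float) -> float:
    """Entropy of a binary source with P(B) = p_b, in bit/enquiry."""
    p_g = 1.0 - p_b
    # The convention 0 * log2(1/0) = 0 is handled by the "if p > 0" filter.
    return sum(p * log2(1.0 / p) for p in (p_b, p_g) if p > 0)

# The four regions with statistically independent symbols:
for name, p_b in [("Mixed", 0.5), ("Rainy", 0.8),
                  ("Enjoyable", 0.2), ("Paradise", 1 / 30)]:
    print(f"{name:9s} H = {binary_entropy(p_b):.3f} bit/enquiry")
```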

The graph shows these binary sequences for  60  days and the following regions:

  • Region  "Mixed":          p_B = p_G = 0.5,
  • Region  "Rainy":          p_B = 0.8,  p_G = 0.2,
  • Region  "Enjoyable":      p_B = 0.2,  p_G = 0.8,
  • Region  "Paradise":       p_B = 1/30,  p_G = 29/30.


Finally, the file  "Unknown"  is also given, whose statistical properties are to be estimated.





Hints:

  • For the first four files it is assumed that the events  B  and  G  are statistically independent, a rather unrealistic assumption for weather practice.
  • The task was designed at a time when  Greta  was just starting school.  We leave it to you to rename  "Paradise"  to  "Hell".



Questions

1

What is the entropy  HM  of the file  "Mixed"?

HM = 

 bit/enquiry

2

What is the entropy  HR  of the file  "Rainy"?

HR = 

 bit/enquiry

3

What is the entropy  HE  of the file  "Enjoyable"?

HE = 

 bit/enquiry

4

How large are the information contents of events  B  and  G  in relation to the file  "Paradise"?

IB = 

 bit/enquiry
IG = 

 bit/enquiry

5

What is the entropy  (that is:  the average information content)  HP  of the file  "Paradise"?  Interpret the result.

HP = 

 bit/enquiry

6

Which statements could be true for the file  "Unknown"?

Events  B  and  G  are approximately equally probable.
The sequence elements are statistically independent of each other.
The entropy of this file is  HU ≈ 0.7 bit/enquiry.
The entropy of this file is  HU = 1.5 bit/enquiry.


Solution

(1)  For the file  "Mixed"  the two probabilities are the same:   pB=pG=0.5.  This gives us for the entropy:

$$H_{\rm M} = 0.5 \cdot \log_2\frac{1}{0.5} + 0.5 \cdot \log_2\frac{1}{0.5} = \underline{1\ {\rm bit/enquiry}}.$$


(2)  With  pB=0.8  and  pG=0.2,  a smaller entropy value is obtained:

$$H_{\rm R} = 0.8 \cdot \log_2\frac{5}{4} + 0.2 \cdot \log_2\frac{5}{1} = 0.8 \cdot \log_2 5 - 0.8 \cdot \log_2 4 + 0.2 \cdot \log_2 5 = \log_2 5 - 0.8 \cdot \log_2 4 = \frac{\lg 5}{\lg 2} - 1.6 = \underline{0.722\ {\rm bit/enquiry}}.$$
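Each algebraic step of this chain can be verified numerically (a quick sanity check, not part of the original solution):

```python
from math import log2, log10

h_r = 0.8 * log2(5 / 4) + 0.2 * log2(5 / 1)            # definition
assert abs(h_r - (log2(5) - 0.8 * log2(4))) < 1e-12    # after collecting terms
assert abs(h_r - (log10(5) / log10(2) - 1.6)) < 1e-12  # lg 5 / lg 2 - 1.6
print(f"H_R = {h_r:.3f} bit/enquiry")  # H_R = 0.722 bit/enquiry
```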


(3)  In the file  "Enjoyable"  the probabilities are exactly swapped compared to the file  "Rainy".  However, this swap does not change the entropy:

HE=HR=0.722bit/enquiry_.


(4)  With  pB=1/30  and  pG=29/30,  the information contents are as follows:

$$I_{\rm B} = \log_2 30 = \frac{\lg 30}{\lg 2} = \frac{1.477}{0.301} = \underline{4.907\ {\rm bit/enquiry}},$$
$$I_{\rm G} = \log_2\frac{30}{29} = \frac{\lg 1.034}{\lg 2} = \underline{0.049\ {\rm bit/enquiry}}.$$
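Both information contents follow directly from I = log₂(1/p); a quick numeric check (variable names are ours):

```python
from math import log2

p_b, p_g = 1 / 30, 29 / 30
i_b = log2(1 / p_b)   # = log2(30)
i_g = log2(1 / p_g)   # = log2(30/29)
print(f"I_B = {i_b:.3f} bit/enquiry")  # I_B = 4.907 bit/enquiry
print(f"I_G = {i_g:.3f} bit/enquiry")  # I_G = 0.049 bit/enquiry
```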


(5)  The entropy  HP  is the average information content of the two events  B  and  G:

$$H_{\rm P} = \frac{1}{30} \cdot 4.907 + \frac{29}{30} \cdot 0.049 = 0.164 + 0.047 = \underline{0.211\ {\rm bit/enquiry}}.$$
  • Although  (more precisely:  because)  event  B  occurs less frequently than  G, its contribution to entropy is greater.
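Weighting each information content with its probability reproduces the entropy and makes the two contributions visible (a sketch; `h_p` is our name):

```python
from math import log2

p_b = 1 / 30
p_g = 29 / 30
h_p = p_b * log2(1 / p_b) + p_g * log2(1 / p_g)
print(f"contribution of B: {p_b * log2(1 / p_b):.3f}")  # 0.164
print(f"contribution of G: {p_g * log2(1 / p_g):.3f}")  # 0.047
print(f"H_P = {h_p:.3f} bit/enquiry")                   # 0.211
```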


(6)  Statements 1 and 3 are correct:

  • B  and  G  are indeed equally probable in the file  "Unknown":   the  60  symbols shown divide into  30 times  B  and  30 times  G.
  • However, there are now strong statistical ties within the temporal sequence.  Long periods of good weather are usually followed by many bad days in a row.
  • Because of this statistical dependence within the  B/G  sequence,  HU = 0.722 bit/enquiry  is smaller than  HM = 1 bit/enquiry.
  • HM  is at the same time the maximum for a symbol set size of  M = 2   ⇒   the last statement is certainly wrong.
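One way to make the effect of statistical dependence concrete: model the sequence as a symmetric two-state Markov chain in which the weather flips with probability q each day (q = 0.2 is our illustrative assumption, not given in the exercise). The symbols then remain equally probable, yet the entropy rate drops to the binary entropy of the flip decision:

```python
from math import log2

def h_bin(q: float) -> float:
    """Binary entropy function in bit."""
    return q * log2(1 / q) + (1 - q) * log2(1 / (1 - q))

q = 0.2  # assumed daily flip probability (illustrative)
# Stationary probabilities are p_B = p_G = 0.5, but the entropy *rate*
# of the chain is h_bin(q), not 1 bit/enquiry:
print(f"entropy rate = {h_bin(q):.3f} bit/enquiry")  # 0.722 < 1
```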