Exercise 1.1: Entropy of the Weather
Revision as of 14:46, 28 May 2021
A weather station queries different regions every day and receives a message x back as a response in each case, namely
- $x = {\rm B}$: The weather is rather bad.
- $x = {\rm G}$: The weather is rather good.
The data were stored in files over many years for different regions, so that the entropies of the B/G–sequences can be determined:
- $H = p_{\rm B} \cdot \log_2 \frac{1}{p_{\rm B}} + p_{\rm G} \cdot \log_2 \frac{1}{p_{\rm G}}$
with the "Logarithm dualis"
- $\log_2 p = \frac{\lg p}{\lg 2} \ (= {\rm ld}\ p).$
Here, "lg" denotes the logarithm to the base 10. It should also be mentioned that the pseudo-unit bit/enquiry must be added in each case.
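This definition translates directly into code. The following sketch (ours, not part of the exercise; the helper name `binary_entropy` is our own choice) also uses the base-10 route via the "logarithm dualis":

```python
import math

def binary_entropy(p_B: float) -> float:
    """Entropy of a binary source in bit/enquiry:
    H = p_B * log2(1/p_B) + p_G * log2(1/p_G) with p_G = 1 - p_B."""
    p_G = 1.0 - p_B
    if p_B in (0.0, 1.0):
        return 0.0  # a certain event carries no information
    # log2 p = lg p / lg 2 ("logarithm dualis"), lg = logarithm to base 10
    return p_B * (math.log10(1 / p_B) / math.log10(2)) \
         + p_G * (math.log10(1 / p_G) / math.log10(2))

print(round(binary_entropy(0.5), 3))  # → 1.0
```

Using `math.log10` here only mirrors the lg-based formula from the text; `math.log2` would give the same result directly.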
The graph shows these binary sequences for 60 days and the following regions:
- Region "Mixed": $p_{\rm B} = p_{\rm G} = 0.5$,
- Region "Rainy": $p_{\rm B} = 0.8$, $p_{\rm G} = 0.2$,
- Region "Enjoyable": $p_{\rm B} = 0.2$, $p_{\rm G} = 0.8$,
- Region "Paradise": $p_{\rm B} = 1/30$, $p_{\rm G} = 29/30$.
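The entropies of these four regions can be checked numerically with a short sketch (our own, with our own names; not part of the exercise):

```python
import math

def H(p_B):
    # H = p_B*log2(1/p_B) + p_G*log2(1/p_G) in bit/enquiry; 0*log2(1/0) := 0
    p_G = 1 - p_B
    return sum(p * math.log2(1 / p) for p in (p_B, p_G) if p > 0)

regions = {"Mixed": 0.5, "Rainy": 0.8, "Enjoyable": 0.2, "Paradise": 1/30}
for name, p_B in regions.items():
    print(f"{name}: {H(p_B):.3f} bit/enquiry")
# → Mixed: 1.000, Rainy: 0.722, Enjoyable: 0.722, Paradise: 0.211
```

Note that "Rainy" and "Enjoyable" yield the same value, since swapping $p_{\rm B}$ and $p_{\rm G}$ leaves $H$ unchanged.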
Finally, the file "Unknown" is also given, whose statistical properties are to be estimated.
Hints:
- This task belongs to the chapter Discrete Memoryless Sources.
- For the first four files it is assumed that the events B and G are statistically independent, a rather unrealistic assumption for weather practice.
- The task was designed at a time when Greta was just starting school. We leave it to you to rename "Paradise" to "Hell".
Questions
Solution
(1) With $p_{\rm B} = p_{\rm G} = 0.5$, the entropy is maximum:
- $H_{\rm M} = 0.5 \cdot \log_2 \frac{1}{0.5} + 0.5 \cdot \log_2 \frac{1}{0.5} = 1\ {\rm bit/enquiry}.$
(2) With $p_{\rm B} = 0.8$ and $p_{\rm G} = 0.2$, a smaller entropy value is obtained:
- $H_{\rm R} = 0.8 \cdot \log_2 \frac{5}{4} + 0.2 \cdot \log_2 \frac{5}{1} = 0.8 \cdot \log_2 5 - 0.8 \cdot \log_2 4 + 0.2 \cdot \log_2 5 = \log_2 5 - 0.8 \cdot \log_2 4 = \frac{\lg 5}{\lg 2} - 1.6 = 0.722\ {\rm bit/enquiry}.$
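The closed-form result $\lg 5 / \lg 2 - 1.6$ can be checked with a one-line numerical sketch (ours):

```python
import math

# H_R = log2(5) - 0.8*log2(4) = lg 5 / lg 2 - 1.6
H_R = math.log10(5) / math.log10(2) - 1.6
print(round(H_R, 3))  # → 0.722
```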
(3) In the file "Enjoyable" the probabilities are exactly swapped compared to the file "Rainy". However, this swap does not change the entropy:
- $H_{\rm E} = H_{\rm R} = 0.722\ {\rm bit/enquiry}.$
(4) With $p_{\rm B} = 1/30$ and $p_{\rm G} = 29/30$, the information contents are as follows:
- $I_{\rm B} = \log_2 30 = \frac{\lg 30}{\lg 2} = \frac{1.477}{0.301} = 4.907\ {\rm bit/enquiry},$
- $I_{\rm G} = \log_2 \frac{30}{29} = \frac{\lg 1.034}{\lg 2} = \frac{0.0147}{0.301} = 0.049\ {\rm bit/enquiry}.$
(5) The entropy $H_{\rm P}$ is the average information content of the two events B and G:
- $H_{\rm P} = \frac{1}{30} \cdot 4.907 + \frac{29}{30} \cdot 0.049 = 0.164 + 0.047 = 0.211\ {\rm bit/enquiry}.$
- Although (more precisely: because) event B occurs less frequently than G, its contribution to entropy is greater.
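The information contents and their probability-weighted average can be reproduced numerically (a sketch of ours; variable names are our own):

```python
import math

p_B, p_G = 1/30, 29/30
I_B = math.log2(1 / p_B)      # = log2(30)    ≈ 4.907 bit/enquiry
I_G = math.log2(1 / p_G)      # = log2(30/29) ≈ 0.049 bit/enquiry
H_P = p_B * I_B + p_G * I_G   # entropy = mean information content
print(round(I_B, 3), round(I_G, 3), round(H_P, 3))  # → 4.907 0.049 0.211
```

The weighted terms $p_{\rm B} \cdot I_{\rm B} \approx 0.164$ and $p_{\rm G} \cdot I_{\rm G} \approx 0.047$ confirm that the rare event B contributes more to the entropy.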
(6) Statements 1 and 3 are correct:
- B and G are indeed equally probable in the "Unknown" file: the 60 symbols shown divide into 30 times B and 30 times G.
- However, there are now strong statistical ties within the temporal sequence. Long periods of good weather are usually followed by many bad days in a row.
- Because of this statistical dependence within the B/G sequence, $H_{\rm U} = 0.722\ {\rm bit/enquiry}$ is smaller than $H_{\rm M} = 1\ {\rm bit/enquiry}$.
- $H_{\rm M}$ is at the same time the maximum for $M = 2$ ⇒ the last statement is certainly wrong.
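That $H_{\rm M} = 1$ bit/enquiry is indeed the maximum for $M = 2$ can be confirmed with a coarse sweep over $p_{\rm B}$ (our own sketch, not part of the exercise):

```python
import math

def H(p_B):
    # binary entropy in bit/enquiry; terms with p = 0 contribute nothing
    return sum(p * math.log2(1 / p) for p in (p_B, 1 - p_B) if p > 0)

# sweep p_B from 0 to 1 in steps of 0.01; the maximum lies at p_B = 0.5
best_p = max((i / 100 for i in range(101)), key=H)
print(best_p, round(H(best_p), 3))  # → 0.5 1.0
```

A finer grid or calculus (setting $dH/dp_{\rm B} = 0$) gives the same result: the equiprobable case maximizes the entropy.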