Difference between revisions of "Aufgaben:Exercise 2.10: Shannon-Fano Coding"

From LNTwww
(14 intermediate revisions by 3 users not shown)
Line 1: Line 1:
  
{{quiz-Header|Buchseite=Informationstheorie/Weitere Quellencodierverfahren
+
{{quiz-Header|Buchseite=Information_Theory/Further_Source_Coding_Methods
 
}}
 
}}
  
[[File:P_ID2465__Inf_A_2_10.png |right|frame|Baumdiagramm der <br>Shannon–Fano–Codierung]]
+
[[File:P_ID2465__Inf_A_2_10.png |right|frame|Tree diagram of the <br>Shannon-Fano coding]]
Ein weiterer Algorithmus zur Entropiecodierung wurde 1949 von&nbsp; [https://de.wikipedia.org/wiki/Claude_Shannon Claude Elwood Shannon]&nbsp; und&nbsp; [https://de.wikipedia.org/wiki/Robert_Fano Robert Fano]&nbsp; angegeben, der im Theorieteil beschrieben ist.
+
Another algorithm for entropy coding was given in 1949 by&nbsp; [https://en.wikipedia.org/wiki/Claude_Shannon Claude Elwood Shannon]&nbsp; and&nbsp; [https://en.wikipedia.org/wiki/Robert_Fano Robert Fano],&nbsp; which is described in the theory section.
  
Diese spezielle Art von Quellencodierung soll hier an einem einfachen Beispiel für den Symbolumfang&nbsp; $M = 4$&nbsp; und die folgenden Symbolwahrscheinlichkeiten beschrieben werden:
+
This special type of source coding will be described here using a simple example for the symbol set size&nbsp; $M = 4$&nbsp; and the following symbol probabilities:
  
 
:$$p_{\rm A} = 0.2 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm B}= 0.3 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm C}= 0.4 \hspace{0.05cm},\hspace{0.4cm}
 
:$$p_{\rm A} = 0.2 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm B}= 0.3 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm C}= 0.4 \hspace{0.05cm},\hspace{0.4cm}
 
p_{\rm D}= 0.1 \hspace{0.05cm}. $$
 
p_{\rm D}= 0.1 \hspace{0.05cm}. $$
  
Die Grafik zeigt das dazugehörige Baumdiagramm.&nbsp; Man geht folgendermaßen vor:
+
The graph shows the corresponding tree diagram.&nbsp; Proceed as follows:
  
:1. Man ordnet die Symbole nach fallender Auftrittswahrscheinlichkeit, hier&nbsp; $\rm C$ &ndash; $\rm B$ &ndash; $\rm A$ &ndash; $\rm D$.
+
:1. Order the symbols according to decreasing probability of occurrence, here&nbsp; $\rm C$ &ndash; $\rm B$ &ndash; $\rm A$ &ndash; $\rm D$.
  
:2. Man teilt die Symbole in zwei etwa gleichwahrscheinliche Gruppen ein, hier&nbsp; $\rm C$&nbsp; und&nbsp; $\rm BAD$.
+
:2. Divide the symbols into two groups of approximately equal probability, here&nbsp; $\rm C$&nbsp; and&nbsp; $\rm BAD$.
  
:3. Der unwahrscheinlicheren Gruppe wird das Binärsymbol&nbsp; <b>0</b>, der anderen Gruppe  die&nbsp; <b>1</b>&nbsp; zugeordnet.
+
:3. The binary symbol&nbsp; <b>0</b>&nbsp; is assigned to the less probable group,&nbsp; <b>1</b>&nbsp; to the other group.
  
:4. Sind in einer Gruppe mehr als ein Zeichen, so ist der Algorithmus rekursiv anzuwenden.
+
:4. If there is more than one symbol in a group, the algorithm is to be applied recursively.
  
Für dieses Beispiel ergibt sich die folgende Codezuordnung (im Baumdiagramm markiert eine rote Verbindung eine&nbsp; <b>1</b>&nbsp; und eine blaue eine&nbsp; <b>0</b>:
+
For this example, the following code assignment results&nbsp; (in the tree diagram, a red connection marks a&nbsp; <b>1</b>&nbsp; and a blue one a&nbsp; <b>0</b>):
  
 
: &nbsp; &nbsp; &nbsp; $\rm A$ &nbsp; &#8594; &nbsp; <b>111</b>, &nbsp; &nbsp; $\rm B$ &nbsp; &#8594; &nbsp; <b>10</b>, &nbsp; &nbsp; $\rm C$ &nbsp; &#8594; &nbsp; <b>0</b>, &nbsp; &nbsp; $\rm D$ &nbsp; &#8594; &nbsp; <b>110</b>.
 
: &nbsp; &nbsp; &nbsp; $\rm A$ &nbsp; &#8594; &nbsp; <b>111</b>, &nbsp; &nbsp; $\rm B$ &nbsp; &#8594; &nbsp; <b>10</b>, &nbsp; &nbsp; $\rm C$ &nbsp; &#8594; &nbsp; <b>0</b>, &nbsp; &nbsp; $\rm D$ &nbsp; &#8594; &nbsp; <b>110</b>.
  
Damit ergibt sich für die mittlere Codewortlänge:
+
This gives the following for the average codeword length:
:$$L_{\rm M} = 0.4 \cdot 1 + 0.3 \cdot 2 + (0.2 + 0.1) \cdot 3 = 1.9\,\,{\rm bit/Quellensymbol}\hspace{0.05cm}.$$
+
:$$L_{\rm M} = 0.4 \cdot 1 + 0.3 \cdot 2 + (0.2 + 0.1) \cdot 3 = 1.9\,\,{\rm bit/source\hspace{0.15cm}symbol}\hspace{0.05cm}.$$
  
Der Huffman-Algorithmus würde hier zwar einen geringfügig anderen Code erzeugen, aber auch bei diesem würde&nbsp;  
+
The Huffman algorithm would produce a slightly different code here, but even in this case&nbsp;  
*$\rm C$&nbsp; mit einem Bit,&nbsp;  
+
*$\rm C$&nbsp; is encoded with one bit,&nbsp;  
*$\rm B$&nbsp; mit zwei Bit und&nbsp;  
+
*$\rm B$&nbsp; with two bits and &nbsp;  
*$\rm A$ und&nbsp; $\rm D$&nbsp; mit jeweils drei Bit&nbsp;  
+
*$\rm A$ and&nbsp; $\rm D$&nbsp; with three bits each.&nbsp;  
  
  
codiert.&nbsp; Damit ergäbe sich ebenfalls&nbsp; $L_{\rm M} = 1.9 \ \rm bit/Quellensymbol$.
+
&nbsp; This would also result in&nbsp; $L_{\rm M} = 1.9 \ \rm bit/source\hspace{0.15cm}symbol$.
  
In dieser Aufgabe sollen Sie den Shannon&ndash;Fano&ndash;Code für&nbsp; $M = 8$&nbsp; und die Wahrscheinlichkeiten
+
In this task you are to calculate the Shannon-Fano code for&nbsp; $M = 8$&nbsp; and the probabilities
 
:$$p_{\rm A} = 0.10 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm B}= 0.40 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm C}= 0.02 \hspace{0.05cm},\hspace{0.4cm} p_{\rm D}= 0.14 \hspace{0.05cm},\hspace{0.4cm}  
 
:$$p_{\rm A} = 0.10 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm B}= 0.40 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm C}= 0.02 \hspace{0.05cm},\hspace{0.4cm} p_{\rm D}= 0.14 \hspace{0.05cm},\hspace{0.4cm}  
p_{\rm E} = 0.17 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm F}= 0.03 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm G}= 0.05 \hspace{0.05cm},\hspace{0.4cm}p_{\rm H}= 0.09$$
+
p_{\rm E} = 0.17 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm F}= 0.03 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm G}= 0.05 \hspace{0.05cm},\hspace{0.4cm}p_{\rm H}= 0.09.$$
ermitteln.&nbsp; Sie werden erkennen, dass sich mit diesen Wahrscheinlichkeiten &bdquo;Shannon&ndash;Fano&rdquo; auch hinsichtlich der Effizienz von &bdquo;Huffman&rdquo; unterscheiden wird.&nbsp;  
+
You will see that with these probabilities&nbsp; "Shannon-Fano"&nbsp; will also differ from&nbsp; "Huffman"&nbsp; in terms of efficiency.&nbsp;  
  
Beim Huffman&ndash;Code ergibt sich mit den vorliegenden Wahrscheinlichkeiten die folgende Zuordnung:
+
With the Huffman code, the following assignment results with the probabilities at hand:
  
 
: &nbsp; &nbsp; &nbsp; $\rm A$ &nbsp; &#8594; &nbsp; <b>100</b>, &nbsp; &nbsp;  $\rm B$ &nbsp; &#8594; &nbsp; <b>0</b>, &nbsp; &nbsp;  $\rm C$ &nbsp; &#8594; &nbsp; <b>111100</b>, &nbsp; &nbsp; $\rm D$ &nbsp; &#8594; &nbsp; <b>101</b>, &nbsp; &nbsp; $\rm E$ &nbsp; &#8594; &nbsp; <b>110</b>,  &nbsp; &nbsp; $\rm F$ &nbsp; &#8594; &nbsp; <b>111101</b>, &nbsp; &nbsp; $\rm G$ &nbsp; &#8594; &nbsp; <b>11111</b>, &nbsp; &nbsp; $\rm H$ &nbsp; &#8594; &nbsp; <b>1110</b>.
 
: &nbsp; &nbsp; &nbsp; $\rm A$ &nbsp; &#8594; &nbsp; <b>100</b>, &nbsp; &nbsp;  $\rm B$ &nbsp; &#8594; &nbsp; <b>0</b>, &nbsp; &nbsp;  $\rm C$ &nbsp; &#8594; &nbsp; <b>111100</b>, &nbsp; &nbsp; $\rm D$ &nbsp; &#8594; &nbsp; <b>101</b>, &nbsp; &nbsp; $\rm E$ &nbsp; &#8594; &nbsp; <b>110</b>,  &nbsp; &nbsp; $\rm F$ &nbsp; &#8594; &nbsp; <b>111101</b>, &nbsp; &nbsp; $\rm G$ &nbsp; &#8594; &nbsp; <b>11111</b>, &nbsp; &nbsp; $\rm H$ &nbsp; &#8594; &nbsp; <b>1110</b>.
Line 54: Line 54:
  
  
''Hinweise:''
+
Hints:
*Die Aufgabe gehört zum  Kapitel&nbsp; [[Informationstheorie/Weitere_Quellencodierverfahren|Weitere Quellencodierverfahren]].
+
*The assignment belongs to the chapter&nbsp; [[Information_Theory/Weitere_Quellencodierverfahren|Further source coding methods]].
*Insbesondere wird  Bezug genommen auf die Seite&nbsp; [[Informationstheorie/Weitere_Quellencodierverfahren#Der_Shannon.E2.80.93Fano.E2.80.93Algorithmus|Der Shannon-Fano-Algorithmus]].
+
*In particular, reference is made to the page&nbsp; [[Information_Theory/Weitere_Quellencodierverfahren#The_Shannon-Fano_algorithm|The Shannon-Fano algorithm]].
*Zur Kontrolle Ihrer Ergebnisse verweisen wir auf das interaktive Applet&nbsp; [[Applets:Huffman-_und_Shannon-Fano-Codierung|Huffman&ndash; und Shannon&ndash;Fano&ndash;Codierung]].
+
*To check your results, please refer to the (German language) SWF module&nbsp; [[Applets:Huffman_Shannon_Fano|Coding according to Huffman and Shannon/Fano]].
 
   
 
   
  
  
  
===Fragebogen===
+
===Questions===
  
 
<quiz display=simple>
 
<quiz display=simple>
{Wie groß ist die mittlere Codewortlänge&nbsp; $L_{\rm M}$&nbsp; beim&nbsp; <u>Huffman&ndash;Code</u>?
+
{What is the average codeword length&nbsp; $L_{\rm M}$&nbsp; for the&nbsp; <u>Huffman code</u>?
 
|type="{}"}
 
|type="{}"}
$L_{\rm M}\ = \ $ { 2.54 3% } $\ \rm bit/Quellensymbol$
+
$L_{\rm M}\ = \ $ { 2.54 3% } $\ \rm bit/source\hspace{0.15cm}symbol$
  
  
{Was geschieht im ersten Schritt der&nbsp; <u>Shannon&ndash;Fano&ndash;Codierung</u>?&nbsp; ''Hinweis'':&nbsp; Alle anderen Symbole werden in der zweiten Gruppe zusammengefasst.
+
{What happens in the first step of&nbsp; <u>Shannon&ndash;Fano coding</u>?&nbsp; <u>Hint</u>:&nbsp; All other symbols are grouped together in the second group.
|type="[]"}
+
|type="()"}
- Man fasst&nbsp; $\rm A$&nbsp; und&nbsp; $\rm B$&nbsp; zur ersten Gruppe zusammen.
+
- &nbsp; $\rm A$&nbsp; and&nbsp; $\rm B$&nbsp; are combined into the first group.
+ Man fasst&nbsp; $\rm B$&nbsp; und&nbsp; $\rm E$&nbsp; zur ersten Gruppe zusammen.
+
+ &nbsp; $\rm B$&nbsp; and&nbsp; $\rm E$&nbsp; are combined into the first group.
- Die erste Gruppe besteht nur aus dem Symbol&nbsp; $\rm B$.
+
- The first group consists only of symbol&nbsp; $\rm B$.
  
  
{Welche Zuordnungen ergeben sich für den&nbsp; <u>Shannon&ndash;Fano&ndash;Algorithmus</u>?
+
{Which assignments result for the&nbsp; <u>Shannon&ndash;Fano algorithm</u>?
 
|type="[]"}
 
|type="[]"}
+ Das Zeichen&nbsp; $\rm A$&nbsp; wird binär mit&nbsp; <b>010</b>&nbsp; codiert.
+
+ The character&nbsp; $\rm A$&nbsp; is binary encoded with&nbsp; <b>010</b>&nbsp;.
+ Das Zeichen&nbsp; $\rm B$&nbsp; wird binär mit&nbsp; <b>11</b>&nbsp; codiert.
+
+ The character&nbsp; $\rm B$&nbsp; is binary encoded with&nbsp; <b>11</b>&nbsp;.
+ Das Zeichen&nbsp; $\rm C$&nbsp; wird binär mit&nbsp; <b>00110</b>&nbsp; codiert.
+
+ The character&nbsp; $\rm C$&nbsp; is binary encoded with&nbsp; <b>00110</b>&nbsp;.
  
  
{Wie groß ist die mittlere Codewortlänge&nbsp; $L_{\rm M}$&nbsp; beim&nbsp; <u>Shannon&ndash;Fano&ndash;Code</u>?
+
{What is the average codeword length&nbsp; $L_{\rm M}$&nbsp; for&nbsp; <u>Shannon&ndash;Fano code</u>?
 
|type="{}"}
 
|type="{}"}
$L_{\rm M}\ = \ $ { 2.58 3% } $\ \rm bit/Quellensymbol$
+
$L_{\rm M}\ = \ $ { 2.58 3% } $\ \rm bit/source\hspace{0.15cm}symbol$
  
  
{Welche Aussagen gelten für beliebige Wahrscheinlichkeiten?
+
{Which statements are true for arbitrary probabilities?
 
|type="[]"}
 
|type="[]"}
- $L_{\rm M}$&nbsp; könnte bei &bdquo;Shannon&ndash;Fano&rdquo; kleiner sein als bei &bdquo;Huffman&rdquo;.
+
- $L_{\rm M}$&nbsp; could be smaller for&nbsp; "Shannon&ndash;Fano"&nbsp; than for&nbsp; "Huffman".
+ $L_{\rm M}$&nbsp; könnte bei &bdquo;Shannon&ndash;Fano&rdquo; größer sein als bei &bdquo;Huffman&rdquo;.
+
+ $L_{\rm M}$&nbsp; could be larger at&nbsp; "Shannon&ndash;Fano"&nbsp; than at&nbsp; "Huffman".
+ $L_{\rm M}$&nbsp; könnte bei &bdquo;Shannon&ndash;Fano&rdquo; und &bdquo;Huffman&rdquo; gleich groß sein.
+
+ $L_{\rm M}$&nbsp; could be the same for&nbsp; "Shannon&ndash;Fano"&nbsp; and&nbsp; "Huffman".
  
  
Line 99: Line 99:
 
</quiz>
 
</quiz>
  
===Musterlösung===
+
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
'''(1)'''&nbsp; Für den angegebenen Huffman&ndash;Code erhält man:
+
'''(1)'''&nbsp; For the given Huffman code one obtains:
:$$L_{\rm M} = 0.4 \cdot 1 +  (0.17 + 0.14 + 0.10) \cdot 3  + 0.09 \cdot 4 + 0.05 \cdot 5 + (0.03 + 0.02) \cdot 6 =\underline{ 2.54 \,\,{\rm bit/Quellensymbol}}\hspace{0.05cm}. $$
+
:$$L_{\rm M} = 0.4 \cdot 1 +  (0.17 + 0.14 + 0.10) \cdot 3  + 0.09 \cdot 4 + 0.05 \cdot 5 + (0.03 + 0.02) \cdot 6 =\underline{ 2.54 \,\,{\rm bit/source\hspace{0.15cm}symbol}}\hspace{0.05cm}. $$
 +
 
 +
 
  
 +
'''(2)'''&nbsp; The correct <u>answer is 2</u>:
 +
*Before applying the Shannon-Fano algorithm, the characters must first be sorted according to their occurrence probabilities.&nbsp; Thus, answer 1 is incorrect.
 +
*All sorted characters must be divided into two groups in such a way that the group probabilities are as equal as possible. For the first step:
  
'''(2)'''&nbsp; Richtig ist die <u>Antwort 2</u>:
+
[[File:P_ID2466__Inf_A_2_10c.png|right|frame|Tree diagram of Shannon-Fano coding]]
*Vor Anwendung des Shannon&ndash;Fano&ndash;Algorithmus müssen die Zeichen erst nach ihren Auftrittswahrscheinlichkeiten sortiert werden. Damit ist die Antwort 1 falsch.
 
*Alle sortierten Zeichen müssen so in zwei Gruppen eingeteilt werden, dass die Gruppenwahrscheinlichkeiten möglichst gleich sind. Für den ersten Schritt:
 
 
:$${\rm Pr}(\boldsymbol{\rm BE})  = 0.57\hspace{0.05cm}, \hspace{0.2cm}{\rm Pr}(\boldsymbol{\rm DAHGFC})  = 0.43  \hspace{0.05cm}.$$
 
:$${\rm Pr}(\boldsymbol{\rm BE})  = 0.57\hspace{0.05cm}, \hspace{0.2cm}{\rm Pr}(\boldsymbol{\rm DAHGFC})  = 0.43  \hspace{0.05cm}.$$
*Bei der Aufteilung gemäß Lösungsvorschlag 3 würde die Gleichverteilung noch weniger erreicht:
+
*With the distribution according to solution suggestion 3, the equal distribution would be achieved even less:
 
:$${\rm Pr}(\boldsymbol{\rm B})  = 0.40\hspace{0.05cm}, \hspace{0.2cm}{\rm Pr}(\boldsymbol{\rm EDAHGFC})  = 0.60  \hspace{0.05cm}.$$
 
:$${\rm Pr}(\boldsymbol{\rm B})  = 0.40\hspace{0.05cm}, \hspace{0.2cm}{\rm Pr}(\boldsymbol{\rm EDAHGFC})  = 0.60  \hspace{0.05cm}.$$
  
  
[[File:P_ID2466__Inf_A_2_10c.png|right|frame|Baumdiagramm der Shannon–Fano–Codierung]]
+
'''(3)'''&nbsp; <u>All suggeted solutionss</u> are correct:
'''(3)'''&nbsp; <u>Alle Lösungsvorschläge</u> sind richtig:
+
*The graph shows the tree diagram of the Shannon-Fano coding.
*Die Grafik zeigt das Baumdiagramm der Shannon&ndash;Fano&ndash;Codierung.  
+
*This results in the following assignment (a red connection indicates a&nbsp; <b>1</b>&nbsp;, a blue one indicates a&nbsp; <b>0</b>):
*Daraus ergibt sich folgende Zuordnung (eine rote Verbindung weist auf <b>1</b> hin, eine blaue auf <b>0</b>):
+
: $\underline{\rm A}$ &nbsp; &#8594; &nbsp; <u><b>010</b></u>, &nbsp; &nbsp;  $\underline{\rm B}$ &nbsp; &#8594; &nbsp; <u><b>11</b></u>, &nbsp; &nbsp;  $\underline{\rm C}$ &nbsp; &#8594; &nbsp; <u><b>00110</b></u>,  
: $\underline{\rm A}$ &#8594; <u><b>010</b></u>,  
+
: ${\rm D}$ &nbsp; &#8594; &nbsp; <b>011</b>, &nbsp; &nbsp; ${\rm E}$ &nbsp; &#8594; &nbsp; <b>10</b>, &nbsp; &nbsp; ${\rm F}$ &nbsp; &#8594; &nbsp; <b>00111</b>, &nbsp; &nbsp; ${\rm G}$ &nbsp; &#8594; &nbsp; <b>0010</b>, &nbsp; &nbsp; ${\rm H}$ &nbsp; &#8594; &nbsp; <b>000</b>.
: $\underline{\rm B}$ &#8594; <u><b>11</b></u>,  
+
 
: $\underline{\rm C}$ &#8594; <u><b>00110</b></u>,  
+
 
: ${\rm D}$ &#8594; <b>011</b>, &nbsp; &nbsp; ${\rm E}$ &#8594; <b>10</b>, &nbsp; &nbsp; ${\rm F}$ &#8594; <b>00111</b>, &nbsp; &nbsp; ${\rm G}$ &#8594; <b>0010</b>, &nbsp; &nbsp; ${\rm H}$ &#8594; <b>000</b>.
+
 
<br clear=all>
+
'''(4)'''&nbsp; Using the result of sub-task&nbsp; '''(3)'''&nbsp;, we get:
'''(4)'''&nbsp; Mit dem Ergebnis der Teilaufgabe '''(3)''' erhält man:
+
:$$L_{\rm M}= (0.40 + 0.17) \cdot 2  + (0.14 + 0.10 + 0.09) \cdot 3 + 0.05 \cdot 4 + (0.03 + 0.02) \cdot 5 =\underline{ 2.58 \,\,{\rm bit/source\hspace{0.15cm}symbol}}\hspace{0.05cm}. $$
:$$L_{\rm M}= (0.40 + 0.17) \cdot 2  + (0.14 + 0.10 + 0.09) \cdot 3 + 0.05 \cdot 4 + (0.03 + 0.02) \cdot 5 =\underline{ 2.58 \,\,{\rm bit/Quellensymbol}}\hspace{0.05cm}. $$
+
 
  
  
'''(5)'''&nbsp; Richtig sind die <u>Aussagen 2 und 3</u>:
+
'''(5)'''&nbsp; <u>Statements 2 and 3</u> are correct:
*Im vorliegenden Beispiel ergibt sich bei Shannon&ndash;Fano ein ungünstigerer Wert als bei Huffman.  
+
*In the present example, Shannon-Fano results in a less favourable value than Huffman.
*In den meisten Fällen &ndash; so auch im Beispiel auf der Angabenseite &ndash; ergibt sich für Huffman und Shannon&ndash;Fano ein gleichwertiger Code und damit auch die gleiche mittlere Codewortlänge.  
+
*In most cases &ndash; including the example on the information page &ndash;&nbsp; "Huffman"&nbsp; and&nbsp; "Shannon-Fano"&nbsp; result in an equivalent code and thus also to the same average code word length.
*Einen effektiveren Code als Huffman liefert Shannon&ndash;Fano dagegen nie.
+
*Shannon-Fano, on the other hand, never delivers a more effective code than Huffman.
 
{{ML-Fuß}}
 
{{ML-Fuß}}
  
  
  
[[Category:Aufgaben zu Informationstheorie|^2.4 Weitere Quellencodierverfahren^]]
+
[[Category:Information Theory: Exercises|^2.4 Further Source Coding Methods^]]

Revision as of 13:33, 12 August 2021

Tree diagram of the
Shannon-Fano coding

Another algorithm for entropy coding was given in 1949 by  Claude Elwood Shannon  and  Robert Fano,  which is described in the theory section.

This special type of source coding will be described here using a simple example for the symbol set size  $M = 4$  and the following symbol probabilities:

$$p_{\rm A} = 0.2 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm B}= 0.3 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm C}= 0.4 \hspace{0.05cm},\hspace{0.4cm} p_{\rm D}= 0.1 \hspace{0.05cm}. $$

The graph shows the corresponding tree diagram.  Proceed as follows:

1. Order the symbols according to decreasing probability of occurrence, here  $\rm C$ – $\rm B$ – $\rm A$ – $\rm D$.
2. Divide the symbols into two groups of approximately equal probability, here  $\rm C$  and  $\rm BAD$.
3. The binary symbol  0  is assigned to the less probable group,  1  to the other group.
4. If there is more than one symbol in a group, the algorithm is to be applied recursively.

For this example, the following code assignment results  (in the tree diagram, a red connection marks a  1  and a blue one a  0):

      $\rm A$   →   111,     $\rm B$   →   10,     $\rm C$   →   0,     $\rm D$   →   110.

This gives the following for the average codeword length:

$$L_{\rm M} = 0.4 \cdot 1 + 0.3 \cdot 2 + (0.2 + 0.1) \cdot 3 = 1.9\,\,{\rm bit/source\hspace{0.15cm}symbol}\hspace{0.05cm}.$$

The Huffman algorithm would produce a slightly different code here, but even in this case 

  • $\rm C$  is encoded with one bit, 
  • $\rm B$  with two bits and  
  • $\rm A$ and  $\rm D$  with three bits each. 


  This would also result in  $L_{\rm M} = 1.9 \ \rm bit/source\hspace{0.15cm}symbol$.

In this task you are to calculate the Shannon-Fano code for  $M = 8$  and the probabilities

$$p_{\rm A} = 0.10 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm B}= 0.40 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm C}= 0.02 \hspace{0.05cm},\hspace{0.4cm} p_{\rm D}= 0.14 \hspace{0.05cm},\hspace{0.4cm} p_{\rm E} = 0.17 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm F}= 0.03 \hspace{0.05cm}, \hspace{0.4cm}p_{\rm G}= 0.05 \hspace{0.05cm},\hspace{0.4cm}p_{\rm H}= 0.09.$$

You will see that with these probabilities  "Shannon-Fano"  will also differ from  "Huffman"  in terms of efficiency. 

With the Huffman code, the following assignment results with the probabilities at hand:

      $\rm A$   →   100,     $\rm B$   →   0,     $\rm C$   →   111100,     $\rm D$   →   101,     $\rm E$   →   110,     $\rm F$   →   111101,     $\rm G$   →   11111,     $\rm H$   →   1110.






Hints:



Questions

1

What is the average codeword length  $L_{\rm M}$  for the  Huffman code?

$L_{\rm M}\ = \ $

$\ \rm bit/source\hspace{0.15cm}symbol$

2

What happens in the first step of  Shannon–Fano codingHint:  All other symbols are grouped together in the second group.

  $\rm A$  and  $\rm B$  are combined into the first group.
  $\rm B$  and  $\rm E$  are combined into the first group.
The first group consists only of symbol  $\rm B$.

3

Which assignments result for the  Shannon–Fano algorithm?

The character  $\rm A$  is binary encoded with  010 .
The character  $\rm B$  is binary encoded with  11 .
The character  $\rm C$  is binary encoded with  00110 .

4

What is the average codeword length  $L_{\rm M}$  for  Shannon–Fano code?

$L_{\rm M}\ = \ $

$\ \rm bit/source\hspace{0.15cm}symbol$

5

Which statements are true for arbitrary probabilities?

$L_{\rm M}$  could be smaller for  "Shannon–Fano"  than for  "Huffman".
$L_{\rm M}$  could be larger at  "Shannon–Fano"  than at  "Huffman".
$L_{\rm M}$  could be the same for  "Shannon–Fano"  and  "Huffman".


Solution

(1)  For the given Huffman code one obtains:

$$L_{\rm M} = 0.4 \cdot 1 + (0.17 + 0.14 + 0.10) \cdot 3 + 0.09 \cdot 4 + 0.05 \cdot 5 + (0.03 + 0.02) \cdot 6 =\underline{ 2.54 \,\,{\rm bit/source\hspace{0.15cm}symbol}}\hspace{0.05cm}. $$


(2)  The correct answer is 2:

  • Before applying the Shannon-Fano algorithm, the characters must first be sorted according to their occurrence probabilities.  Thus, answer 1 is incorrect.
  • All sorted characters must be divided into two groups in such a way that the group probabilities are as equal as possible. For the first step:
Tree diagram of Shannon-Fano coding
$${\rm Pr}(\boldsymbol{\rm BE}) = 0.57\hspace{0.05cm}, \hspace{0.2cm}{\rm Pr}(\boldsymbol{\rm DAHGFC}) = 0.43 \hspace{0.05cm}.$$
  • With the distribution according to solution suggestion 3, the equal distribution would be achieved even less:
$${\rm Pr}(\boldsymbol{\rm B}) = 0.40\hspace{0.05cm}, \hspace{0.2cm}{\rm Pr}(\boldsymbol{\rm EDAHGFC}) = 0.60 \hspace{0.05cm}.$$


(3)  All suggeted solutionss are correct:

  • The graph shows the tree diagram of the Shannon-Fano coding.
  • This results in the following assignment (a red connection indicates a  1 , a blue one indicates a  0):
$\underline{\rm A}$   →   010,     $\underline{\rm B}$   →   11,     $\underline{\rm C}$   →   00110,
${\rm D}$   →   011,     ${\rm E}$   →   10,     ${\rm F}$   →   00111,     ${\rm G}$   →   0010,     ${\rm H}$   →   000.


(4)  Using the result of sub-task  (3) , we get:

$$L_{\rm M}= (0.40 + 0.17) \cdot 2 + (0.14 + 0.10 + 0.09) \cdot 3 + 0.05 \cdot 4 + (0.03 + 0.02) \cdot 5 =\underline{ 2.58 \,\,{\rm bit/source\hspace{0.15cm}symbol}}\hspace{0.05cm}. $$


(5)  Statements 2 and 3 are correct:

  • In the present example, Shannon-Fano results in a less favourable value than Huffman.
  • In most cases – including the example on the information page –  "Huffman"  and  "Shannon-Fano"  result in an equivalent code and thus also to the same average code word length.
  • Shannon-Fano, on the other hand, never delivers a more effective code than Huffman.