Aufgaben:Exercise 2.6: About the Huffman Coding

From LNTwww
{{quiz-Header|Buchseite=Information_Theory/Entropy_Coding_According_to_Huffman
}}
  
[[File:EN_Inf_A_2_6_v2.png|right|frame|Tree diagram of the Huffman method]]
We consider here a source symbol sequence&nbsp; $\langle q_\nu  \rangle$&nbsp; with symbol set size&nbsp; $M = 8$:
:$$q_{\nu} \in \{ \hspace{0.05cm}q_{\mu} \} = \{ \boldsymbol{\rm A} \hspace{0.05cm},\hspace{0.05cm} \boldsymbol{\rm B}\hspace{0.05cm},\hspace{0.05cm} \boldsymbol{\rm C}\hspace{0.05cm},\hspace{0.05cm} \boldsymbol{\rm D}\hspace{0.05cm},\hspace{0.05cm} \boldsymbol{\rm E}\hspace{0.05cm},\hspace{0.05cm} \boldsymbol{\rm F}\hspace{0.05cm},\hspace{0.05cm} \boldsymbol{\rm G}\hspace{0.05cm},\hspace{0.05cm} \boldsymbol{\rm H}\hspace{0.05cm}
\}\hspace{0.05cm}.$$
If the symbols are equally probable, i.e.&nbsp; $p_{\rm A} =  p_{\rm B} =$ ... $ = p_{\rm H} = 1/M$,&nbsp; then source coding makes no sense.&nbsp; Already with the dual code&nbsp; $\rm A$&nbsp; &#8594;&nbsp; <b>000</b>,&nbsp; $\rm B$&nbsp; &#8594;&nbsp; <b>001</b>, ... ,&nbsp; $\rm H$&nbsp; &#8594;&nbsp; <b>111</b>, the average code word length&nbsp; $L_{\rm M}$&nbsp; then reaches its lower bound&nbsp; $H$&nbsp; according to the source coding theorem&nbsp; $(H$ denotes the&nbsp; "source entropy"&nbsp; here$)$:
:$$L_{\rm M,\hspace{0.08cm}min} = H = 3 \hspace{0.15cm}{\rm bit/source \:symbol} \hspace{0.05cm}.$$
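This boundary case is easy to check numerically.&nbsp; A short Python sketch (illustrative only, not part of the original exercise):

```python
from math import log2

# Equal-probability case from the text: M = 8 symbols, each with p = 1/M,
# encoded with the 3-bit dual code 000 ... 111.  Then the average code
# word length L_M equals the source entropy H = 3 bit/source symbol.
M = 8
p = [1 / M] * M        # p_A = ... = p_H = 1/8
L = [3] * M            # every dual code word has 3 bits
L_M = sum(pi * Li for pi, Li in zip(p, L))
H = -sum(pi * log2(pi) for pi in p)
print(L_M, H)  # 3.0 3.0
```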
  
However, let the symbol probabilities be given as follows:
 
:$$p_{\rm A} \hspace{-0.05cm}= \hspace{-0.05cm} 0.04  \hspace{0.05cm},\hspace{0.1cm}p_{\rm B} \hspace{-0.05cm}= \hspace{-0.05cm} 0.08  \hspace{0.05cm},\hspace{0.1cm}p_{\rm C} \hspace{-0.05cm}= \hspace{-0.05cm} 0.14  \hspace{0.05cm},\hspace{0.1cm}
p_{\rm D} \hspace{-0.05cm}= \hspace{-0.05cm} 0.25  \hspace{0.05cm},\hspace{0.1cm} p_{\rm E} \hspace{-0.05cm}= \hspace{-0.05cm} 0.24  \hspace{0.05cm},\hspace{0.1cm}p_{\rm F} \hspace{-0.05cm}= \hspace{-0.05cm} 0.12  \hspace{0.05cm},\hspace{0.1cm}p_{\rm G} \hspace{-0.05cm}= \hspace{-0.05cm} 0.10  \hspace{0.05cm},\hspace{0.1cm}
p_{\rm H} \hspace{-0.05cm}= \hspace{-0.05cm} 0.03  \hspace{0.05cm}.$$
*We thus have a redundant source here, which can be compressed by Huffman coding.
*This algorithm was published in 1952 by&nbsp; [https://en.wikipedia.org/wiki/David_A._Huffman David Albert Huffman]&nbsp; &ndash; shortly after Shannon's groundbreaking work on information theory &ndash; and allows the construction of optimal prefix-free codes.
  
  
The algorithm is given here without derivation or proof; we restrict ourselves to binary codes &nbsp; &rArr; &nbsp; the encoded sequence consists only of zeros and ones:
  
:'''(1)''' &nbsp; Order the symbols according to decreasing probabilities of occurrence.
  
:'''(2)''' &nbsp;  Combine the two least likely symbols into a new symbol.
  
:'''(3)''' &nbsp;  Repeat steps&nbsp; '''(1)'''&nbsp; and&nbsp; '''(2)'''&nbsp; until only two (combined) symbols remain.
  
:'''(4)''' &nbsp;  Binary-encode the more probable set of symbols with&nbsp; <b>1</b>, the other set with&nbsp; <b>0</b>.
  
:'''(5)''' &nbsp;  Step by step (from bottom to top), add&nbsp; <b>1</b>&nbsp; or&nbsp; <b>0</b>&nbsp; to the split partial codes.
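The five steps can be sketched directly in Python (an illustrative implementation, not part of the original exercise; with the probabilities of this exercise there are no ties, so the resulting code matches the tree diagram):

```python
import heapq

def huffman(probs):
    """Binary Huffman code: repeatedly merge the two least probable
    (groups of) symbols; the less probable group gets bit 0 prepended,
    the more probable one bit 1, as in steps (1) to (5)."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    code = {s: "" for s in probs}
    tie = len(heap)                           # tie-breaker for equal sums
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # least probable group -> bit 0
        p1, _, group1 = heapq.heappop(heap)   # more probable group  -> bit 1
        for s in group0:
            code[s] = "0" + code[s]
        for s in group1:
            code[s] = "1" + code[s]
        tie += 1
        heapq.heappush(heap, (p0 + p1, tie, group0 + group1))
    return code

probs = {"A": 0.04, "B": 0.08, "C": 0.14, "D": 0.25,
         "E": 0.24, "F": 0.12, "G": 0.10, "H": 0.03}
code = huffman(probs)
L_M = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(round(L_M, 2))  # 2.73
```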
  
  
This algorithm is often illustrated by a tree diagram. The diagram above shows this for the case at hand.
  
You have the following tasks:
  
:'''(a)''' &nbsp; Assign the symbols&nbsp; $\rm A$, ... , $\rm H$&nbsp; to the inputs labelled&nbsp; '''[1]''', ... , '''[8]'''.
  
:'''(b)''' &nbsp; Determine the sum probabilities&nbsp; $U$, ... ,&nbsp; $Z$&nbsp; as well as&nbsp; $R$&nbsp; ("root").
  
:'''(c)''' &nbsp; Assign the symbols&nbsp; $\rm A$, ... ,&nbsp; $\rm H$&nbsp; to the corresponding Huffman binary sequences.&nbsp; A red connection corresponds to a&nbsp; <b>1</b>, a blue connection to a&nbsp; <b>0</b>.
  
You will notice that the average code word length for Huffman coding,
:$$L_{\rm M} =  \sum_{\mu = 1}^{M}\hspace{0.05cm} p_{\mu} \cdot L_{\mu}, $$
is only slightly larger than the source entropy&nbsp; $H$.&nbsp; In this equation, the following values apply to the present case:
 
*$M = 8$,&nbsp;  $p_1 = p_{\rm A}$,&nbsp; ... ,&nbsp; $p_8 = p_{\rm H}$.
*The respective number of bits of the code symbols for&nbsp; $\rm A$, ... ,&nbsp; $\rm H$&nbsp; is denoted by&nbsp; $L_1$, ... ,&nbsp; $L_8$.
  
  
  
  
<u>Hints:</u>
*The exercise belongs to the chapter&nbsp; [[Information_Theory/Entropiecodierung_nach_Huffman|Entropy Coding according to Huffman]].
*In particular, reference is made to the pages&nbsp;
** [[Information_Theory/Entropiecodierung_nach_Huffman#The_Huffman_algorithm|The Huffman algorithm]],
**[[Information_Theory/Entropiecodierung_nach_Huffman#Representation_of_the_Huffman_code_as_a_tree_diagram|Representation of the Huffman code as a tree diagram]].
*To check your results, please refer to the (German language) SWF module&nbsp; [[Applets:Huffman_Shannon_Fano|Coding according to Huffman and Shannon/Fano]].
 
   
 
   
  
  
===Questions===
  
 
<quiz display=simple>
{Which inputs in the tree diagram stand for
 
|type="{}"}
Input number $ \ = \ $  { 7 } &nbsp; &rArr; &nbsp; symbol $\rm A$  
Input number $ \ = \ $  { 6 } &nbsp; &rArr; &nbsp; symbol $\rm B$
Input number $ \ = \ $  { 3 } &nbsp; &rArr; &nbsp; symbol $\rm C$
Input number $ \ = \ $  { 1 } &nbsp; &rArr; &nbsp; symbol $\rm D$
  
  
{What numerical values (probabilities) should be at the nodes in the tree diagram?
 
|type="{}"}
Probability $ \ = \ $ { 0.07 1% } &nbsp; at node $\rm U$
Probability $ \ = \ $ { 0.15 1% } &nbsp; at node $\rm V$
Probability $ \ = \ $ { 0.22 1% } &nbsp; at node $\rm W$
Probability $ \ = \ $ { 0.29 1% } &nbsp; at node $\rm X$
Probability $ \ = \ $ { 0.46 1% } &nbsp; at node $\rm Y$
Probability $ \ = \ $ { 0.54 1% } &nbsp; at node $\rm Z$
Probability $ \ = \ $ { 1 1% } &nbsp; at $\rm root$
  
  
  
{Which binary codes (represented with zeros and ones) result for
 
|type="{}"}
Binary code $ \ = \ $ { 11101 } &nbsp; &rArr; &nbsp; symbol $\rm A$  
Binary code $ \ = \ $ { 1111 } &nbsp; &rArr; &nbsp; symbol $\rm B$  
Binary code $ \ = \ $ { 110 } &nbsp; &rArr; &nbsp; symbol $\rm C$  
Binary code $ \ = \ $ { 10 } &nbsp; &rArr; &nbsp; symbol $\rm D$  
  
  
{What is the average code word length?
 
|type="{}"}
$L_{\rm M} \ = \ $  { 2.73 3% } $\ \rm bit/source \hspace{0.15cm} symbol$
  
  
{What is the source entropy&nbsp; $H$?  
 
|type="()"}
+ $H = 2.71\ \rm  bit/source \hspace{0.15cm}symbol.$
- $H = 2.75\ \rm  bit/source \hspace{0.15cm}symbol.$
- $H = 3.00\ \rm  bit/source \hspace{0.15cm}symbol.$
  
  
 
</quiz>
  
===Solution===
 
{{ML-Kopf}}
'''(1)'''&nbsp; Before applying the Huffman algorithm, the symbols must be sorted according to their probabilities of occurrence.
*Since the two least likely symbols are already combined in the first step, the probabilities of occurrence decrease from top to bottom <br>(from left to right in the diagram for this sample solution).  
*By comparison with the problem specification, one obtains:
  
:Symbol $\rm A$:&nbsp;  <u>Input 7</u>,  &nbsp; &nbsp; Symbol $\rm B$:&nbsp;  <u>Input 6</u>, &nbsp; &nbsp; Symbol $\rm C$:&nbsp;  <u>Input 3</u>, &nbsp; &nbsp; Symbol $\rm D$:&nbsp;  <u>Input 1</u>.
  
[[File:EN_Inf_Z_2_6b.png|right|right|frame|Tree diagram adapted to the exercise]]
  
  
'''(2)'''&nbsp; The node&nbsp; $\rm R$&nbsp; is the root of the tree.&nbsp; It is always assigned&nbsp; $\underline{R=1}$,&nbsp; regardless of the occurrence probabilities.

The following applies to the other values:

:Step 1:&nbsp;&nbsp;&nbsp; $p_{\rm U} = p_{\rm A} + p_{\rm H} = 0.04 + 0.03 \hspace{0.15cm}\underline{ =0.07}$,

:Step 2:&nbsp;&nbsp;&nbsp; $p_{\rm V} = p_{\rm U} + p_{\rm B} = 0.07 + 0.08 \hspace{0.15cm}\underline{ =0.15}$,

:Step 3:&nbsp;&nbsp;&nbsp; $p_{\rm W} = p_{\rm F} + p_{\rm G} = 0.12 + 0.10 \hspace{0.15cm}\underline{ =0.22}$,

:Step 4:&nbsp;&nbsp;&nbsp; $p_{\rm X} = p_{\rm V} + p_{\rm C} = 0.15 + 0.14  \hspace{0.15cm}\underline{ =0.29}$,

:Step 5:&nbsp;&nbsp;&nbsp; $p_{\rm Y} = p_{\rm W} + p_{\rm E} = 0.22 + 0.24  \hspace{0.15cm}\underline{ =0.46}$,

:Step 6:&nbsp;&nbsp;&nbsp; $p_{\rm Z} = p_{\rm X} + p_{\rm D} = 0.29 + 0.25  \hspace{0.15cm}\underline{ =0.54}$.
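These sums can be reproduced with a few lines of Python (an illustrative check, not part of the original solution):

```python
# Recompute the node probabilities of the tree bottom-up; each node is
# the sum of the two probabilities merged into it.
p = {"A": 0.04, "B": 0.08, "C": 0.14, "D": 0.25,
     "E": 0.24, "F": 0.12, "G": 0.10, "H": 0.03}
p["U"] = p["A"] + p["H"]   # node U = 0.07
p["V"] = p["U"] + p["B"]   # node V = 0.15
p["W"] = p["F"] + p["G"]   # node W = 0.22
p["X"] = p["V"] + p["C"]   # node X = 0.29
p["Y"] = p["W"] + p["E"]   # node Y = 0.46
p["Z"] = p["X"] + p["D"]   # node Z = 0.54
p["R"] = p["Y"] + p["Z"]   # root  = 1.00
print(round(p["R"], 2))  # 1.0
```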
 
 
<br clear=all>
'''(3)'''&nbsp; The Huffman code for symbol&nbsp; $\rm A$&nbsp; is obtained by tracing the path from the&nbsp; $\rm root$&nbsp; (yellow dot) to symbol&nbsp; $\rm A$&nbsp; and assigning a&nbsp; <b>1</b>&nbsp; to each red line and a&nbsp; <b>0</b>&nbsp; to each blue one.
  
* Symbol $\rm A$:&nbsp;&nbsp;&nbsp; red&ndash;red&ndash;red&ndash;blue&ndash;red &#8594; <b><u>11101</u></b>,
  
* Symbol $\rm B$:&nbsp;&nbsp;&nbsp; red&ndash;red&ndash;red&ndash;red &#8594; <b><u>1111</u></b>,
  
* Symbol $\rm C$:&nbsp;&nbsp;&nbsp; red&ndash;red&ndash;blue &#8594; <b><u>110</u></b>,
  
* Symbol $\rm D$:&nbsp;&nbsp;&nbsp; red&ndash;blue &#8594; <b><u>10</u></b>,
  
* Symbol $\rm E$:&nbsp;&nbsp;&nbsp; blue&ndash;red &#8594; <b>01</b>,
  
* Symbol $\rm F$:&nbsp;&nbsp;&nbsp; blue&ndash;blue&ndash;red &#8594; <b>001</b>,
  
* Symbol $\rm G$:&nbsp;&nbsp;&nbsp; blue&ndash;blue&ndash;blue &#8594; <b>000</b>,
  
* Symbol $\rm H$:&nbsp;&nbsp;&nbsp; red&ndash;red&ndash;red&ndash;blue&ndash;blue &#8594; <b>11100</b>.
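A quick Python check (illustrative, not part of the original solution) confirms that these eight codewords form a prefix-free code, and that the Kraft sum equals one, i.e. the code tree is complete:

```python
# Codewords read off the tree: no codeword may be the prefix of another
# (required for unique decodability without separators), and the Kraft
# sum over all codeword lengths equals 1 for a complete binary code.
code = {"A": "11101", "B": "1111", "C": "110", "D": "10",
        "E": "01", "F": "001", "G": "000", "H": "11100"}
words = list(code.values())
prefix_free = all(not w2.startswith(w1)
                  for w1 in words for w2 in words if w1 != w2)
kraft = sum(2.0 ** -len(w) for w in words)
print(prefix_free, kraft)  # True 1.0
```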
  
  
'''(4)'''&nbsp; The coding under point&nbsp; '''(3)'''&nbsp; has shown that
  
* the symbols&nbsp; $\rm D$&nbsp; and&nbsp; $\rm E$&nbsp; are each represented with two bits,
  
* the symbols&nbsp; $\rm C$,&nbsp; $\rm F$&nbsp; and&nbsp; $\rm G$&nbsp; with three bits,
  
* the symbol&nbsp; $\rm B$&nbsp;  with four bits,&nbsp; and
  
* the symbols&nbsp; $\rm A$&nbsp; and&nbsp; $\rm H$&nbsp; each with five bits.
  
  
Thus one obtains for the average code word length (in "bit/source symbol"):
 
:$$L_{\rm M} \hspace{0.2cm} =  \hspace{0.2cm}  (p_{\rm D} + p_{\rm E}) \cdot 2 + (p_{\rm C} + p_{\rm F} + p_{\rm G}) \cdot 3  + p_{\rm B} \cdot 4 +(p_{\rm A} + p_{\rm H}) \cdot 5 =  0.49 \cdot 2 + 0.36 \cdot 3 +0.08 \cdot 4 +0.07 \cdot 5  
\hspace{0.15cm}\underline{= 2.73}\hspace{0.05cm}.$$
  
  
'''(5)'''&nbsp; Only <u>answer 1</u> is correct:
*The average code word length&nbsp; $L_{\rm M}$&nbsp; cannot be smaller than the source entropy&nbsp; $H$.&nbsp; This eliminates answers 2 and 3.
*From the given probabilities of occurrence, the source entropy can indeed be calculated as&nbsp; $H = 2.71$ bit/source symbol.
*It can be seen that this Huffman coding comes very close to the bound given by the source coding theorem,&nbsp; $L_{\rm M} \ge H = 2.71$&nbsp; bit/source symbol.
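Both statements can be checked numerically with a short Python sketch (illustrative, not part of the original solution):

```python
from math import log2

# Source entropy H = -sum p_mu * log2(p_mu) for the given probabilities,
# compared with the average Huffman code word length L_M = 2.73
# bit/source symbol from sub-task (4).
p = [0.04, 0.08, 0.14, 0.25, 0.24, 0.12, 0.10, 0.03]
H = -sum(pi * log2(pi) for pi in p)
L_M = 2.73
print(H < L_M, L_M - H < 0.02)  # True True: L_M exceeds H only marginally
```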
 
{{ML-Fuß}}
  
  
  
[[Category:Information Theory: Exercises|^2.3 Entropy Coding according to Huffman^]]

Latest revision as of 15:54, 1 November 2022
