Difference between revisions of "Aufgaben:Exercise 2.6Z: Again on the Huffman Code"

Three codes to choose from

David Albert Huffman's algorithm implements entropy coding with the following properties:

The resulting binary code is prefix-free and thus easily (and immediately) decodable.

With a memoryless source, the code leads to the smallest possible mean codeword length $L_{\rm M}$.

However, $L_{\rm M}$ is never smaller than the source entropy $H$.
These two quantities can be calculated from the $M$ symbol probabilities alone.

For this exercise, we assume a memoryless source with the symbol range $M = 5$ and the alphabet

$$\{ {\rm A},\ {\rm B},\ {\rm C},\ {\rm D},\ {\rm E} \}.$$

In the above diagram, three codes are given.nbsp; You are to decide which of these codes were (or could be) created by applying the Huffman algorithm.

Hints:

The exercise belongs to the chapter Entropy Coding according to Huffman.
Further information on the Huffman algorithm can also be found in the information sheet for exercise 2.6.
To check your results, please refer to the interaction module Shannon–Fano– and Huffman–coding.

Questions

Solution

Huffman tree diagrams for subtasks (1) and (3)

(1) Solution suggestion 1 is correct.

The diagram shows the construction of the Huffman code by means of a tree diagram.
With the assignment red → 1 and blue → 0 one obtains:
$\rm A$ → 11, $\rm B$ → 10, $\rm C$ → 01, $\rm D$ → 001, $\rm E$ → 000.
The left diagram applies to the probabilities according to subtask (1).
The diagram on the right belongs to subtask (3) with slightly different probabilities.
However, it provides exactly the same code.

(2) Proposed solution 3 is correct, as the following calculation also shows:

$$L_{\rm M} \hspace{0.2cm} = \hspace{0.2cm} (0.3 + 0.3 + 0.3) \cdot 2 + (0.05 + 0.05) \cdot 3 = 2.1\,{\rm bit/source symbol}\hspace{0.05cm},$$

$$H \hspace{0.2cm} = \hspace{0.2cm} 3 \cdot 0.3 \cdot {\rm log_2}\hspace{0.15cm}(1/0.3) + 2 \cdot 0.05 \cdot {\rm log_2}\hspace{0.15cm}(1/0.05) \approx 2.0\,{\rm bit/source symbol}\hspace{0.05cm}.$$

According to the source coding theorem, $L_{\rm M} \ge H$ always holds.
However, a prerequisite for $L_{\rm M} = H$ is that all symbol probabilities can be represented in the form $2^{-k} \ (k = 1, \ 2, \ 3,\ \text{ ...})$ .
This does not apply here.

(3) $\rm A$, $\rm B$ and $\rm C$ are represented by two bits in $\text{code 1}$ , $\rm E$ and $\rm F$ by three bits. Thus one obtains for

the average codeword length

$$L_{\rm M} = p_{\rm A}\cdot 2 + p_{\rm B}\cdot 2 + p_{\rm C}\cdot 2 + p_{\rm D}\cdot 3 + p_{\rm E}\cdot 3 \hspace{0.05cm},$$

for the source entropy:

$$H = p_{\rm A}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm A}} + p_{\rm B}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm B}} + p_{\rm C}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm C}} + p_{\rm D}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm D}} + p_{\rm E}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm E}} \hspace{0.05cm}.$$

By comparing all the terms, we arrive at the result:

$$p_{\rm A}= p_{\rm B}= p_{\rm C}\hspace{0.15cm}\underline{= 0.25} \hspace{0.05cm}, \hspace{0.2cm}p_{\rm D}= p_{\rm E}\hspace{0.15cm}\underline{= 0.125}\hspace{0.3cm} \Rightarrow\hspace{0.3cm} L_{\rm M} = H = 2.25\,{\rm bit/source symbol} \hspace{0.05cm}.$$

It can be seen:

With these "more favourable" probabilities, there is even a larger mean codeword length than with the "less favourable" ones.
The equality $(L_{\rm M} = H)$ is therefore solely due to the now larger source entropy.

(4) For example, one (of many) simulations with the probabilities according to subtask (3) yields the sequence with $N = 40$ character:

$$\rm EBDCCBDABEBABCCCCCBCAABECAACCBAABBBCDCAB.$$

This results in $L_{\rm M}\hspace{0.01cm}' = ( 34 \cdot 2 + 6 \cdot 3)/50 = 2.15$ bit/source symbol, i.e. a smaller value than for the unlimited sequence $(L_{\rm M} = 2.25$ bit/source symbol$)$.
However, with a different starting value of the random generator, $(L_{\rm M}\hspace{0.03cm}' \ge L_{\rm M})$ is also possible.
This means: All statements are correct.

(5) Only solution suggestion 1:

$\text{Code 1}$ is a Huffman code, as has already been shown in the previous subtasks.
This is not true for all symbol probabilities, but at least for the parameter sets according to subtasks (1) and (3).

$\text{Code 2}$ is not a Huffman code, since such a code would always have to be prefix-free.
However, the prefix freedom is not given here, since 0 is the beginning of the code word 01 .

$\text{Code 3}$ is also not a Huffman code, since it has an average codeword length that is $p_{\rm C}$ longer than required $($see $\text{code 1})$. It is not optimal
There are no symbol probabilities $p_{\rm A}$, ... , $p_{\rm E}$, that would justify coding the symbol $\rm C$ with 010 instead of 01 .

@@ Line 77: / Line 77: @@
 ===Solution===
 {{ML-Kopf}}
-[[File:Inf_Z_2_6a_version2.png|right|frame|Huffman–Baumdiagramme zu den Teilaufgaben&nbsp; '''(1)'''&nbsp; und&nbsp; '''(3)''']]
+[[File:Inf_Z_2_6a_version2.png|right|frame|Huffman tree diagrams for subtasks&nbsp; '''(1)'''&nbsp; and&nbsp; '''(3)''']]
-'''(1)'''&nbsp; Richtig ist der <u>Lösungsvorschlag 1</u>.
+'''(1)'''&nbsp;<u>Solution suggestion 1</u> is correct.
-*Die Grafik zeigt die Konstruktion des Huffman&ndash;Codes mittels Baumdiagramm.
+*The diagram shows the construction of the Huffman code by means of a tree diagram.
-*Mit der Zuordnung rot &nbsp; &#8594; &nbsp; <b>1</b> und blau &nbsp; &#8594; &nbsp; <b>0</b> erhält man: &nbsp; <br>$\rm A$ &nbsp; &#8594; &nbsp; <b>11</b>, $\rm B$ &nbsp; &#8594; &nbsp; <b>10</b>, $\rm C$ &nbsp; &#8594; &nbsp; <b>01</b>, $\rm D$ &nbsp; &#8594; &nbsp; <b>001</b>, $\rm E$ &nbsp; &#8594; &nbsp; <b>000</b>.
+*With the assignment red &nbsp; &#8594; &nbsp; <b>1</b> and blue &nbsp; &#8594; &nbsp; <b>0</b> one obtains: &nbsp; <br>$\rm A$ &nbsp; &#8594; &nbsp; <b>11</b>, $\rm B$ &nbsp; &#8594; &nbsp; <b>10</b>, $\rm C$ &nbsp; &#8594; &nbsp; <b>01</b>, $\rm D$ &nbsp; &#8594; &nbsp; <b>001</b>, $\rm E$ &nbsp; &#8594; &nbsp; <b>000</b>.
-*Die linke Grafik gilt für die Wahrscheinlichkeiten gemäß Teilaufgabe&nbsp; '''(1)'''.&nbsp;
+*The left diagram applies to the probabilities according to subtask&nbsp; '''(1)'''.&nbsp;
-*Das rechte Diagramm gehört zur Teilaufgabe&nbsp; '''(3)'''&nbsp; mit etwas anderen Wahrscheinlichkeiten.&nbsp;
+*The diagram on the right belongs to subtask&nbsp; '''(3)'''&nbsp; with slightly different probabilities. &nbsp;
-*Es liefert aber genau den gleichen Code.
+*However, it provides exactly the same code.
 <br clear=all>
-'''(2)'''&nbsp; Richtig ist der <u>Lösungsvorschlag 3</u>, wie auch die folgende Rechnung zeigt:
+'''(2)'''&nbsp;<u>Proposed solution 3</u> is correct, as the following calculation also shows:
-:$$L_{\rm M} \hspace{0.2cm} =  \hspace{0.2cm}  (0.3 + 0.3 + 0.3) \cdot 2 + (0.05 + 0.05) \cdot 3  = 2.1\,{\rm bit/Quellensymbol}\hspace{0.05cm},$$
+:$$L_{\rm M} \hspace{0.2cm} =  \hspace{0.2cm}  (0.3 + 0.3 + 0.3) \cdot 2 + (0.05 + 0.05) \cdot 3  = 2.1\,{\rm bit/source symbol}\hspace{0.05cm},$$
 :$$H \hspace{0.2cm} =  \hspace{0.2cm}  3 \cdot 0.3 \cdot {\rm log_2}\hspace{0.15cm}(1/0.3) + 2 \cdot 0.05 \cdot {\rm log_2}\hspace{0.15cm}(1/0.05)
-\approx 2.0\,{\rm bit/Quellensymbol}\hspace{0.05cm}.$$
+\approx 2.0\,{\rm bit/source symbol}\hspace{0.05cm}.$$
-*Nach dem Quellencodierungstheorem gilt stets&nbsp; $L_{\rm M} \ge H$.
+*According to the source coding theorem, &nbsp; $L_{\rm M} \ge H$ always holds.
-*Voraussetzung für&nbsp; $L_{\rm M} = H$&nbsp; ist allerdings, dass alle Symbolwahrscheinlichkeiten in der Form&nbsp; $2^{-k} \ (k = 1, \ 2, \ 3,\ \text{ ...})$&nbsp; dargestellt werden können.
+*However, a prerequisite for&nbsp; $L_{\rm M} = H$&nbsp; is that all symbol probabilities can be represented in the form&nbsp; $2^{-k} \ (k = 1, \ 2, \ 3,\ \text{ ...})$&nbsp;.
-*Dies trifft hier nicht zu.
+*This does not apply here.
-'''(3)'''&nbsp; $\rm A$,&nbsp; $\rm B$&nbsp; und&nbsp; $\rm C$&nbsp; werden beim&nbsp; $\text{Code 1}$&nbsp; durch zwei Bit dargestellt,&nbsp; $\rm E$&nbsp; und&nbsp; $\rm F$&nbsp; durch drei Bit.&nbsp; Damit erhält man für
+'''(3)'''&nbsp; $\rm A$,&nbsp; $\rm B$&nbsp; and&nbsp; $\rm C$&nbsp; are represented by two bits in&nbsp; $\text{code 1}$&nbsp;,&nbsp; $\rm E$&nbsp; and&nbsp; $\rm F$&nbsp; by three bits.&nbsp; Thus one obtains for
-* die mittlere Codewortlänge
+* the average codeword length
 :$$L_{\rm M} =   p_{\rm A}\cdot 2 + p_{\rm B}\cdot 2 + p_{\rm C}\cdot 2 + p_{\rm D}\cdot 3 + p_{\rm E}\cdot 3
 \hspace{0.05cm},$$
-* für die Quellenentropie:
+* for the source entropy:
 :$$H =   p_{\rm A}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm A}} + p_{\rm B}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm B}} + p_{\rm C}\cdot
 {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm C}} + p_{\rm D}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm D}} + p_{\rm E}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm E}}
 \hspace{0.05cm}.$$
-Durch Vergleich aller Terme kommt man zum Ergebnis:
+By comparing all the terms, we arrive at the result:
 :$$p_{\rm A}= p_{\rm B}=  p_{\rm C}\hspace{0.15cm}\underline{= 0.25} \hspace{0.05cm}, \hspace{0.2cm}p_{\rm D}= p_{\rm E}\hspace{0.15cm}\underline{= 0.125}\hspace{0.3cm}
-\Rightarrow\hspace{0.3cm} L_{\rm M} = H = 2.25\,{\rm bit/Quellensymbol} \hspace{0.05cm}.$$
+\Rightarrow\hspace{0.3cm} L_{\rm M} = H = 2.25\,{\rm bit/source symbol} \hspace{0.05cm}.$$
-Man erkennt:
+It can be seen:
-*Mit diesen "günstigeren" Wahrscheinlichkeiten ergibt sich sogar eine größere mittlere Codewortlänge als mit den "ungünstigeren".
+*With these "more favourable" probabilities, there is even a larger mean codeword length than with the "less favourable" ones.
-*Die Gleichheit&nbsp; $(L_{\rm M} = H)$&nbsp; ist demzufolge allein auf die nun größere Quellenentropie zurückzuführen.
+*The equality&nbsp; $(L_{\rm M} = H)$&nbsp; is therefore solely due to the now larger source entropy.
-'''(4)'''&nbsp; Beispielsweise liefert eine (von vielen) Simulationen mit den Wahrscheinlichkeiten gemäß der Teilaufgabe&nbsp; '''(3)'''&nbsp; die Folge  mit&nbsp; $N = 40$&nbsp; Zeichen:
+'''(4)'''&nbsp; For example, one (of many) simulations with the probabilities according to subtask&nbsp; '''(3)'''&nbsp; yields the sequence with&nbsp; $N = 40$&nbsp; character:
 :$$\rm EBDCCBDABEBABCCCCCBCAABECAACCBAABBBCDCAB.$$
-*Es ergibt sich&nbsp; $L_{\rm M}\hspace{0.01cm}' = ( 34 \cdot 2 + 6 \cdot 3)/50  = 2.15$&nbsp; bit/Quellensymbol, also ein kleinerer Wert als für die unbegrenzte Folge&nbsp; $(L_{\rm M} = 2.25$ bit/Quellensymbol$)$.
+*This results in&nbsp; $L_{\rm M}\hspace{0.01cm}' = ( 34 \cdot 2 + 6 \cdot 3)/50  = 2.15$&nbsp; bit/source symbol,  i.e. a smaller value than for the unlimited sequence&nbsp; $(L_{\rm M} = 2.25$ bit/source symbol$)$.
-*Bei anderem Startwert des Zufallsgenerators ist aber auch&nbsp; $(L_{\rm M}\hspace{0.03cm}' \ge L_{\rm M})$&nbsp; möglich.
+*However, with a different starting value of the random generator,&nbsp; $(L_{\rm M}\hspace{0.03cm}' \ge L_{\rm M})$&nbsp; is also possible.
-*Das heißt: &nbsp; <u>Alle &nbsp;Aussagen</u> sind zutreffend.
+*This means: &nbsp; <u>All &nbsp;statements</u> are correct.
-'''(5)'''&nbsp; Richtig ist nur der <u>Lösungsvorschlag 1</u>:
+'''(5)'''&nbsp; Only <u>solution suggestion 1</u>:
-* Der&nbsp; $\text{Code 1}$&nbsp; ist ein Huffman&ndash;Code, wie schon in den vorherigen Teilaufgaben gezeigt wurde. <br>Dies gilt zwar nicht für alle Symbolwahrscheinlichkeiten, aber zumindest für die Parametersätze gemäß den Teilaufgaben&nbsp; '''(1)'''&nbsp; und&nbsp; '''(3)'''.
+*&nbsp; $\text{Code 1}$&nbsp; is a Huffman code, as has already been shown in the previous subtasks. <br>This is not true for all symbol probabilities, but at least for the parameter sets according to subtasks&nbsp; '''(1)'''&nbsp; and&nbsp; '''(3)'''.
-* Der&nbsp; $\text{Code 2}$&nbsp; ist kein Huffman&ndash;Code, da ein solcher stets präfixfrei sein müsste. <br>Die Präfixfreiheit ist hier aber nicht gegeben, da&nbsp; <b>0</b>&nbsp; der Beginn des Codewortes&nbsp; <b>01</b>&nbsp; ist.
+*&nbsp; $\text{Code 2}$&nbsp; is not a Huffman code, since such a code would always have to be prefix-free. <br>However, the prefix freedom is not given here, since&nbsp; <b>0</b>&nbsp; is the beginning of the code word&nbsp; <b>01</b>&nbsp;.
-* Der&nbsp; $\text{Code 3}$&nbsp; ist ebenfalls kein Huffman&ndash;Code, da er eine um&nbsp;  $p_{\rm C}$&nbsp;   größere mittlere Codewortlänge aufweist als erforderlich&nbsp; $($siehe $\text{Code 1})$. Er ist  nicht optimal.&nbsp; <br>Es gibt keine Symbolwahrscheinlichkeiten&nbsp; $p_{\rm A}$, ... ,&nbsp; $p_{\rm E}$, die es rechtfertigen würden, das Symbol&nbsp; $\rm C$&nbsp; mit&nbsp; <b>010</b>&nbsp; anstelle von&nbsp; <b>01</b>&nbsp; zu codieren.
+*&nbsp; $\text{Code 3}$&nbsp; is also not a Huffman code, since it has an average codeword length that is&nbsp;  $p_{\rm C}$&nbsp; longer than required&nbsp; $($see $\text{code 1})$. It is not optimal&nbsp; <br>There are no symbol probabilities&nbsp; $p_{\rm A}$, ... ,&nbsp; $p_{\rm E}$, that would justify coding the symbol&nbsp; $\rm C$&nbsp; with&nbsp; <b>010</b>&nbsp; instead of&nbsp; <b>01</b>&nbsp;.
 {{ML-Fuß}}

	$L_{\rm M}\hspace{0.01cm}' < L_{\rm M}$,
	$L_{\rm M}\hspace{0.01cm}' = L_{\rm M}$,
	$L_{\rm M}\hspace{0.01cm}' > L_{\rm M}$.

	$\text{Code 1}$,
	$\text{Code 2}$,
	$\text{Code 3}$.

	$\text{Code 1}$,
	$\text{Code 2}$,
	$\text{Code 3}$.

Difference between revisions of "Aufgaben:Exercise 2.6Z: Again on the Huffman Code"

Revision as of 23:25, 2 August 2021

Questions

Solution

Solution