Difference between revisions of "Aufgaben:Exercise 2.6Z: Again on the Huffman Code"

From LNTwww
 
  
{{quiz-Header|Buchseite=Informationstheorie und Quellencodierung/Entropiecodierung nach Huffman
+
{{quiz-Header|Buchseite=Information_Theory/Entropy_Coding_According_to_Huffman
 
}}
 
}}
  
[[File:P_ID2453__Inf_Z_2_6.png|right|]]
+
[[File:P_ID2453__Inf_Z_2_6.png|right|frame|Three codes to choose from ]]
Der Algorithmus von David A. Huffman realisiert eine Entropiecodierung mit folgenden Eigenschaften:
+
  [https://en.wikipedia.org/wiki/David_A._Huffman David Albert Huffman's]  algorithm implements entropy coding with the following properties:
  
:* Der entstehende Binärcode ist präfixfrei und somit in einfacher Weise (und sofort) decodierbar.
+
* The resulting binary code is prefix-free and thus easily (and immediately) decodable.
 +
* With a memoryless source, the code leads to the smallest possible average code word length  $L_{\rm M}$.
 +
* However, $L_{\rm M}$  is never smaller than the source entropy  $H$.
 +
* These two quantities can be calculated from the  $M$  symbol probabilities alone.
  
:* Der Code führt bei einer gedächtnislosen Quelle zur kleinstmöglichen  mittleren Codewortlänge <i>L</i><sub>M</sub>.
 
  
:* <i>L</i><sub>M</sub> ist aber nie kleiner als die Quellenentropie <i>H</i>. Diese beiden Größen sind allein aus den <i>M</i> Symbolwahrscheinlichkeiten berechenbar.
+
For this exercise, we assume a memoryless source with the symbol set size&nbsp; $M = 5$&nbsp; and the alphabet
 +
:$$\{ {\rm A},\ {\rm B},\ {\rm C},\ {\rm D},\ {\rm E} \}.$$
  
Vorausgesetzt wird für diese Aufgabe eine gedächtnislose Quelle mit dem Symbolumfang <i>M</i> = 5 und dem Alphabet {<b>A</b>, <b>B</b>, <b>C</b>, <b>D</b>, <b>E</b>}. In obiger Grafik sind drei Codes vorgegeben. Sie sollen entscheiden, welche dieser Codes durch Anwendung des Huffman&ndash;Algorithmus entstanden sind (oder sein könnten).
+
In the above diagram, three binary codes are given.&nbsp; You are to decide which of these codes were (or could be) created by applying the Huffman algorithm.
  
<b>Hinweis:</b> Die Aufgabe gehört zum Kapitel 2.3. Weitere Informationen zum Huffman&ndash;Algorithmus finden Sie auch im Angabenblatt zur Aufgabe A2.6. Zur Kontrolle Ihrer Ergebnisse verweisen wir auf das Interaktionsmodul Shannon&ndash;Fano&ndash; und Huffman&ndash;Codierung.
 
  
  
===Fragebogen===
+
 
 +
 
 +
 
 +
 
 +
<u>Hints:</u>
 +
*The exercise belongs to the chapter&nbsp; [[Information_Theory/Entropiecodierung_nach_Huffman|Entropy Coding according to Huffman]].
 +
*Further information on the Huffman algorithm can also be found in the information sheet for&nbsp; [[Aufgaben:Exercise_2.6:_About_the_Huffman_Coding|Exercise 2.6]].
 +
*To check your results, please refer to the (German language) SWF module&nbsp; [[Applets:Huffman_Shannon_Fano|Coding according to Huffman and Shannon/Fano]].
 +
 +
 
 +
 
 +
===Questions===
  
 
<quiz display=simple>
 
<quiz display=simple>
{Welche Codes liefert Huffman für <i>p</i><sub>A</sub> = <i>p</i><sub>B</sub> = <i>p</i><sub>C</sub> = 0.3, <i>p</i><sub>D</sub> = <i>p</i><sub>E</sub> = 0.05?
+
{Which codes could have arisen according to Huffman for&nbsp; $p_{\rm A} = p_{\rm B} = p_{\rm C} = 0.3$&nbsp; and&nbsp; $p_{\rm D} = p_{\rm E} = 0.05$?
 
|type="[]"}
 
|type="[]"}
+ Code 1,
+
+ $\text{Code 1}$,
- Code 2,
+
- $\text{Code 2}$,
- Code 3.
+
- $\text{Code 3}$.
  
  
{Wie stehen mittlere Codewortlänge <i>L</i><sub>M</Sub> und Entropie <i>H</i> in Relation?
+
{How are the average code word length&nbsp; $L_{\rm M}$&nbsp; and the entropy&nbsp; $H$&nbsp; related for the given probabilities?
|type="[]"}
+
|type="()"}
- <i>L</i><sub>M</Sub> < <i>H</i>,
+
- $L_{\rm M} < H$,
- <i>L</i><sub>M</Sub> = <i>H</i>,
+
- $L_{\rm M} \ge H$,
+ <i>L</i><sub>M</Sub> > <i>H</i>.
+
+ $L_{\rm M} > H$.
  
  
{Mit welchen Symbolwahrscheinlichkeiten würde hier <i>L</i><sub>M</Sub> = <i>H</i> gelten?
+
{Consider&nbsp; $\text{Code 1}$.&nbsp; With what symbol probabilities would&nbsp; $L_{\rm M} = H$&nbsp; hold?
 
|type="{}"}
 
|type="{}"}
$p_A$ = { 0.25 3% }
+
$\ p_{\rm A} \ = \ $ { 0.25 3% }
$p_B$ = { 0.25 3% }
+
$\ p_{\rm B} \ = \ $ { 0.25 3% }
$p_C$ = { 0.25 3% }
+
$\ p_{\rm C} \ = \ $ { 0.25 3% }
$p_D$ = { 0.125 3% }
+
$\ p_{\rm D} \ = \ $ { 0.125 3% }
$p_E$ = { 0.125 3% }
+
$\ p_{\rm E} \ = \ $ { 0.125 3% }
  
  
{Die Angaben zu (3) gelten weiter. Die mittlere Codewortlänge wird aber nun für eine Folge der Länge <i>N</i> = 40 ermittelt &nbsp;&#8658;&nbsp; <i>L</i><sub>M</sub>&prime;. Was ist möglich?
+
{The probabilities calculated in subtask&nbsp; '''(3)'''&nbsp; still apply. <br>However, the average code word length is now determined for a sequence of length&nbsp; $N = 40$&nbsp; &nbsp;&#8658;&nbsp; $L_{\rm M}\hspace{0.03cm}'$. What is possible?
 
|type="[]"}
 
|type="[]"}
+ <i>L</i><sub>M</sub>&prime; < <i>L</i><sub>M</sub>,
+
+ $L_{\rm M}\hspace{0.01cm}' < L_{\rm M}$,
+ <i>L</i><sub>M</sub>&prime; = <i>L</i><sub>M</sub>,
+
+ $L_{\rm M}\hspace{0.01cm}' = L_{\rm M}$,
+ <i>L</i><sub>M</sub>&prime; > <i>L</i><sub>M</sub>.
+
+ $L_{\rm M}\hspace{0.01cm}' > L_{\rm M}$.
  
  
{Welcher Code könnte überhaupt ein Huffman&ndash;Code sein?
+
{Which code could possibly be a Huffman code?
 
|type="[]"}
 
|type="[]"}
+ Code 1,
+
+ $\text{Code 1}$,
- Code 2,
+
- $\text{Code 2}$,
- Code 3.
+
- $\text{Code 3}$.
  
  
 
</quiz>
 
</quiz>
  
===Musterlösung===
+
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
<b>1.</b>&nbsp;&nbsp;Die Grafik zeigt die Konstruktion des Huffman&ndash;Codes mittels Baumdiagramm. Mit der Zuordnung rot &#8594; <b>1</b> und blau &#8594; <b>0</b> kommt man zu folgendem Code <b>A</b> &#8594; <b>11</b>, <b>B</b> &#8594; <b>10</b>, <b>C</b> &#8594; <b>01</b>, <b>D</b> &#8594; <b>001</b>, <b>E</b> &#8594; <b>000</b>. Richtig ist der <u>Lösungsvorschlag 1</u>.
+
[[File:EN_Inf_Z_2_6a_v2.png|right|frame|Huffman tree diagrams for subtasks&nbsp; '''(1)'''&nbsp; and&nbsp; '''(3)''']]
[[File:P_ID2454__Inf_Z_2_6a.png|center|]]
+
'''(1)'''&nbsp;<u>Solution suggestion 1</u> is correct.
Die linke Grafik gilt für die Wahrscheinlichkeiten gemäß Teilaufgabe (a). Das rechte Diagramm gehört zur Teilaufgabe (3) mit etwas anderen Wahrscheinlichkeiten. Es liefert den genau gleichen Code.
+
*The diagram shows the construction of the Huffman code by means of a tree diagram.
 +
*With the assignment red &nbsp; &#8594; &nbsp; <b>1</b> and blue &nbsp; &#8594; &nbsp; <b>0</b> one obtains: &nbsp; <br>$\rm A$ &nbsp; &#8594; &nbsp; <b>11</b>, $\rm B$ &nbsp; &#8594; &nbsp; <b>10</b>, $\rm C$ &nbsp; &#8594; &nbsp; <b>01</b>, $\rm D$ &nbsp; &#8594; &nbsp; <b>001</b>, $\rm E$ &nbsp; &#8594; &nbsp; <b>000</b>.  
 +
*The left diagram applies to the probabilities according to subtask&nbsp; '''(1)'''.&nbsp;
 +
*The diagram on the right belongs to subtask&nbsp; '''(3)'''&nbsp; with slightly different probabilities. &nbsp;
 +
*However, it provides exactly the same code.
 +
<br clear=all>
 +
'''(2)'''&nbsp;<u>Proposed solution 3</u> is correct, as the following calculation also shows:
 +
:$$L_{\rm M} \hspace{0.2cm} =  \hspace{0.2cm}  (0.3 + 0.3 + 0.3) \cdot 2 + (0.05 + 0.05) \cdot 3  = 2.1\,{\rm bit/source \:symbol}\hspace{0.05cm},$$
 +
:$$H \hspace{0.2cm} =  \hspace{0.2cm}  3 \cdot 0.3 \cdot {\rm log_2}\hspace{0.15cm}(1/0.3) + 2 \cdot 0.05 \cdot {\rm log_2}\hspace{0.15cm}(1/0.05)
 +
\approx 2.0\,{\rm bit/source \:symbol}\hspace{0.05cm}.$$
  
<b>b)</b>&nbsp;&nbsp;Nach dem Quellencodierungstheorem gilt stets <i>L</i><sub>M</sub> &#8805; <i>H</i>. Voraussetzung für <i>L</i><sub>M</sub> = <i>H</i> ist allerdings, dass alle Symbolwahrscheinlichkeiten in der Form 2<sup>&ndash;<i>k</i></sup> (<i>k</i> = 1, 2, 3, ...) dargestellt werden können. Richtig ist demnach <u>Lösungsvorschlag 3</u>, wie auch die folgende Rechnung (mit &bdquo;log<sub>2</sub>&rdquo; &nbsp;&#8658;&nbsp; &bdquo;ld&rdquo;) zeigt:
+
*According to the source coding theorem, &nbsp; $L_{\rm M} \ge H$ always holds.
:$$L_{\rm M} \hspace{0.2cm} = \hspace{0.2cm} (0.3 + 0.3 + 0.3) \cdot 2 + (0.05 + 0.05) \cdot 3  = 2.1\,{\rm bit/Quellensymbol}\hspace{0.05cm},\\
+
*However, a prerequisite for&nbsp; $L_{\rm M} = H$&nbsp; is that all symbol probabilities can be represented in the form&nbsp; $2^{-k} \ (k = 1, \ 2, \ 3,\ \text{ ...})$&nbsp;.
H \hspace{0.2cm} =  \hspace{0.2cm}  3 \cdot 0.3 \cdot {\rm ld}\hspace{0.15cm}(1/0.3) + 2 \cdot 0.05 \cdot {\rm ld}\hspace{0.15cm}(1/0.05)
+
*This does not apply here.
\approx 2.0\,{\rm bit/Quellensymbol}\hspace{0.05cm}.$$
 
  
<b>3.</b>&nbsp;&nbsp;<b>A</b>, <b>B</b>, <b>C</b> werden beim Code 1 durch 2 Bit dargestellt, <b>E</b>, <b>F</b> durch 3 Bit. Damit erhält man für
 
  
:* die mittlere Codewortlänge
+
 
 +
'''(3)'''&nbsp; $\rm A$,&nbsp; $\rm B$&nbsp; and&nbsp; $\rm C$&nbsp; are represented by two bits in&nbsp; $\text{Code 1}$,&nbsp; $\rm D$&nbsp; and&nbsp; $\rm E$&nbsp; by three bits.&nbsp; Thus one obtains for
 +
 
 +
* the average code word length
 
:$$L_{\rm M} =  p_{\rm A}\cdot 2 + p_{\rm B}\cdot 2 + p_{\rm C}\cdot 2 + p_{\rm D}\cdot 3 + p_{\rm E}\cdot 3
 
:$$L_{\rm M} =  p_{\rm A}\cdot 2 + p_{\rm B}\cdot 2 + p_{\rm C}\cdot 2 + p_{\rm D}\cdot 3 + p_{\rm E}\cdot 3
 
\hspace{0.05cm},$$
 
\hspace{0.05cm},$$
:* für die Quellenentropie:
+
* for the source entropy:
:$$H =  p_{\rm A}\cdot {\rm ld}\hspace{0.15cm}\frac{1}{p_{\rm A}} + p_{\rm B}\cdot {\rm ld}\hspace{0.15cm}\frac{1}{p_{\rm B}} + p_{\rm C}\cdot  
+
:$$H =  p_{\rm A}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm A}} + p_{\rm B}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm B}} + p_{\rm C}\cdot  
{\rm ld}\hspace{0.15cm}\frac{1}{p_{\rm C}} + p_{\rm D}\cdot {\rm ld}\hspace{0.15cm}\frac{1}{p_{\rm D}} + p_{\rm E}\cdot {\rm ld}\hspace{0.15cm}\frac{1}{p_{\rm E}}
+
{\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm C}} + p_{\rm D}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm D}} + p_{\rm E}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm E}}
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
Durch Vergleich aller Terme kommt man zum Ergebnis:
+
By comparing all the terms, we arrive at the result:
:$$p_{\rm A}= p_{\rm B}=  p_{\rm C}\hspace{0.15cm}\underline{= 0.25} \hspace{0.05cm}, \hspace{0.2cm}p_{\rm D}= p_{\rm E}\hspace{0.15cm}\underline{= 0.125}$$
+
:$$p_{\rm A}= p_{\rm B}=  p_{\rm C}\hspace{0.15cm}\underline{= 0.25} \hspace{0.05cm}, \hspace{0.2cm}p_{\rm D}= p_{\rm E}\hspace{0.15cm}\underline{= 0.125}\hspace{0.3cm}
:$$\Rightarrow\hspace{0.3cm} L_{\rm M} = H = 2.25\,{\rm bit/Quellensymbol} \hspace{0.05cm}.$$
+
\Rightarrow\hspace{0.3cm} L_{\rm M} = H = 2.25\,{\rm bit/source \:symbol} \hspace{0.05cm}.$$
Man erkennt: Mit diesen &bdquo;günstigeren&rdquo; Wahrscheinlichkeiten ergibt sich sogar eine größere mittlere Codewortlänge. Die Gleichheit (<i>L</i><sub>M</sub> = <i>H</i>) ist allein auf die nun größere Quellenentropie zurückzuführen.
+
It can be seen:
 +
*With these&nbsp; "more favourable"&nbsp; probabilities, there is even a larger average code word length than with the&nbsp; "less favourable"&nbsp; ones.
 +
*The equality&nbsp; $(L_{\rm M} = H)$&nbsp; is therefore solely due to the now larger source entropy.
 +
 
 +
 
 +
 
 +
'''(4)'''&nbsp; For example, one (of many) simulations with the probabilities according to subtask&nbsp; '''(3)'''&nbsp; yields the following sequence of&nbsp; $N = 40$&nbsp; characters:
 +
:$$\rm EBDCCBDABEBABCCCCCBCAABECAACCBAABBBCDCAB.$$
 +
 
 +
*This results in&nbsp; $L_{\rm M}\hspace{0.01cm}' = ( 34 \cdot 2 + 6 \cdot 3)/40  = 2.15$&nbsp; bit/source symbol,  i.e. a smaller value than for the infinitely long sequence&nbsp; $(L_{\rm M} = 2.25$ bit/source symbol$)$.
 +
*However, with a different starting value of the random generator,&nbsp; $(L_{\rm M}\hspace{0.03cm}' \ge L_{\rm M})$&nbsp; is also possible.
 +
*This means: &nbsp; <u>All &nbsp;statements</u> are correct.
  
<b>4.</b>&nbsp;&nbsp;Beispielsweise liefert eine (von vielen) Simulationen mit den Wahrscheinlichkeiten gemäß der Teilaufgabe (c) die Folge <b>EBDCCBDABEBABCCCCCBCAABECAACCBAABBBCDCAB</b> (mit <nobr><i>N</i> = 40 Zeichen).</nobr> Damit ergibt sich:
 
:$$L_{\rm M}' = ( 34 \cdot 2 + 6 \cdot 3)/50  = 2.15\,{\rm bit/Quellensymbol} \hspace{0.05cm},$$
 
also ein kleinerer Wert als für die unendlich lange Folge (<i>L</i><sub>M</sub> = 2.25 bit/Quellensymbol). Bei anderem Startwert des Zufallsgenerators ist aber auch <i>L</i>&prime;<sub>M</sub> &#8805; <i>L</i><sub>M</sub> möglich. <u>Alle Aussagen</u> sind zutreffend.
 
  
<b>5.</b>&nbsp;&nbsp;Richtig ist nur der <u>Lösungsvorschlag 1</u>.
 
  
:* Code 1 ist ein Huffman&ndash;Code, wie schon in den vorherigen Teilaufgaben gezeigt wurde. Dies gilt zwar nicht für alle Symbolwahrscheinlichkeiten, aber zumindest für die Parametersätze gemäß den Teilaufgaben (a) und (c).
+
'''(5)'''&nbsp; Only <u>solution suggestion 1</u>:
 +
*&nbsp; $\text{Code 1}$&nbsp; is a Huffman code, as has already been shown in the previous subtasks. <br>This is not true for all symbol probabilities, but at least for the parameter sets according to subtasks&nbsp; '''(1)'''&nbsp; and&nbsp; '''(3)'''.
  
:* Code 2 ist kein Huffman&ndash;Code, da ein solcher stets präfixfrei sein müsste. Die Präfixfreiheit ist hier aber nicht gegeben, da <b>0</b> der Beginn des Codewortes <b>01</b> ist.
+
*&nbsp; $\text{Code 2}$&nbsp; is not a Huffman code, since such a code would always have to be prefix-free. <br>However, the prefix freedom is not given here, since&nbsp; <b>1</b>&nbsp; is the beginning of the code word&nbsp; <b>10</b>&nbsp;.
  
:* Code 3 ist ebenfalls kein Huffman&ndash;Code, da er eine um <i>p</i><sub>C</sub> (Wahrscheinlichkeit von <b>C</b>) größere mittlere Codewortlänge aufweist als erforderlich (Code 1). Er ist somit nicht optimal: Es gibt keine Symbolwahrscheinlichkeiten <i>p</i><sub>A</sub>, ... , <i>p</i><sub>E</sub>, die es rechtfertigen würden, das Symbol <b>C</b> mit <b>010</b> anstelle von <b>01</b> zu codieren.
+
*&nbsp; $\text{Code 3}$&nbsp; is also not a Huffman code, since its average code word length is larger by&nbsp; $p_{\rm C}$&nbsp; than required&nbsp; $($see $\text{Code 1})$.&nbsp; It is therefore not optimal. <br>There are no symbol probabilities&nbsp; $p_{\rm A}$, ... ,&nbsp; $p_{\rm E}$&nbsp; that would justify coding the symbol&nbsp; $\rm C$&nbsp; with&nbsp; <b>010</b>&nbsp; instead of&nbsp; <b>01</b>.
 
{{ML-Fuß}}
 
{{ML-Fuß}}
  
  
  
[[Category:Aufgaben zu Informationstheorie und Quellencodierung|^2.3 Entropiecodierung nach Huffman^]]
+
[[Category:Information Theory: Exercises|^2.3 Entropy Coding according to Huffman^]]

Latest revision as of 15:55, 1 November 2022

Figure: Three codes to choose from

  David Albert Huffman's  algorithm implements entropy coding with the following properties:

  • The resulting binary code is prefix-free and thus easily (and immediately) decodable.
  • With a memoryless source, the code leads to the smallest possible average code word length  $L_{\rm M}$.
  • However, $L_{\rm M}$  is never smaller than the source entropy  $H$.
  • These two quantities can be calculated from the  $M$  symbol probabilities alone.


For this exercise, we assume a memoryless source with the symbol set size  $M = 5$  and the alphabet

$$\{ {\rm A},\ {\rm B},\ {\rm C},\ {\rm D},\ {\rm E} \}.$$

In the above diagram, three binary codes are given.  You are to decide which of these codes were (or could be) created by applying the Huffman algorithm.




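Hints:

  • The exercise belongs to the chapter Entropy Coding according to Huffman.
  • Further information on the Huffman algorithm can also be found in the information sheet for Exercise 2.6.
  • To check your results, please refer to the (German language) SWF module Coding according to Huffman and Shannon/Fano.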


Questions

1

Which codes could have arisen according to Huffman for  $p_{\rm A} = p_{\rm B} = p_{\rm C} = 0.3$  and  $p_{\rm D} = p_{\rm E} = 0.05$?

$\text{Code 1}$,
$\text{Code 2}$,
$\text{Code 3}$.

2

How are the average code word length  $L_{\rm M}$  and the entropy  $H$  related for the given probabilities?

$L_{\rm M} < H$,
$L_{\rm M} \ge H$,
$L_{\rm M} > H$.

3

Consider  $\text{Code 1}$.  With what symbol probabilities would  $L_{\rm M} = H$  hold?

$\ p_{\rm A} \ = \ $

$\ p_{\rm B} \ = \ $

$\ p_{\rm C} \ = \ $

$\ p_{\rm D} \ = \ $

$\ p_{\rm E} \ = \ $

4

The probabilities calculated in subtask  (3)  still apply.
However, the average code word length is now determined for a sequence of length  $N = 40$   ⇒  $L_{\rm M}\hspace{0.03cm}'$. What is possible?

$L_{\rm M}\hspace{0.01cm}' < L_{\rm M}$,
$L_{\rm M}\hspace{0.01cm}' = L_{\rm M}$,
$L_{\rm M}\hspace{0.01cm}' > L_{\rm M}$.

5

Which code could possibly be a Huffman code?

$\text{Code 1}$,
$\text{Code 2}$,
$\text{Code 3}$.


Solution

Figure: Huffman tree diagrams for subtasks (1) and (3)

(1) Solution suggestion 1 is correct.

  • The diagram shows the construction of the Huffman code by means of a tree diagram.
  • With the assignment red   →   1 and blue   →   0 one obtains:  
    $\rm A$   →   11, $\rm B$   →   10, $\rm C$   →   01, $\rm D$   →   001, $\rm E$   →   000.
  • The left diagram applies to the probabilities according to subtask (1).
  • The diagram on the right belongs to subtask  (3)  with slightly different probabilities.  
  • However, it provides exactly the same code.
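
The tree construction can also be checked numerically. The following is only a minimal sketch, not the tree from the diagram: the function name `huffman_code_lengths` and the tie-breaking counter are our own choices. It determines code word lengths only, since the concrete bit patterns depend on the (arbitrary) assignment of 0 and 1 to the branches.

```python
import heapq
from itertools import count


def huffman_code_lengths(probs):
    """Return {symbol: code word length} of a binary Huffman code."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    while len(heap) > 1:
        # Merge the two least probable nodes; every symbol below them
        # moves one level deeper in the tree, i.e. gains one bit.
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths


probs_1 = {"A": 0.3, "B": 0.3, "C": 0.3, "D": 0.05, "E": 0.05}      # subtask (1)
probs_3 = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.125, "E": 0.125}  # subtask (3)

lengths_1 = huffman_code_lengths(probs_1)
lengths_3 = huffman_code_lengths(probs_3)
print(lengths_1)
print(lengths_3)
```

For both parameter sets the sketch yields two bits for A, B, C and three bits for D, E, which matches the code word lengths of Code 1.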


(2) Proposed solution 3 is correct, as the following calculation also shows:

$$L_{\rm M} \hspace{0.2cm} = \hspace{0.2cm} (0.3 + 0.3 + 0.3) \cdot 2 + (0.05 + 0.05) \cdot 3 = 2.1\,{\rm bit/source \:symbol}\hspace{0.05cm},$$
$$H \hspace{0.2cm} = \hspace{0.2cm} 3 \cdot 0.3 \cdot {\rm log_2}\hspace{0.15cm}(1/0.3) + 2 \cdot 0.05 \cdot {\rm log_2}\hspace{0.15cm}(1/0.05) \approx 2.0\,{\rm bit/source \:symbol}\hspace{0.05cm}.$$
  • According to the source coding theorem,   $L_{\rm M} \ge H$ always holds.
  • However, a prerequisite for  $L_{\rm M} = H$  is that all symbol probabilities can be represented in the form  $2^{-k} \ (k = 1, \ 2, \ 3,\ \text{ ...})$ .
  • This does not apply here.
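
The two numerical values can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are our own, and the code word lengths are those of Code 1 as given in subtask (1).

```python
from math import log2

probs = {"A": 0.3, "B": 0.3, "C": 0.3, "D": 0.05, "E": 0.05}
lengths = {"A": 2, "B": 2, "C": 2, "D": 3, "E": 3}  # code word lengths of Code 1

# Average code word length: sum of p * l over all symbols
L_M = sum(probs[s] * lengths[s] for s in probs)
# Source entropy: sum of p * log2(1/p) over all symbols
H = sum(p * log2(1 / p) for p in probs.values())

print(f"L_M = {L_M:.3f} bit/source symbol")
print(f"H   = {H:.3f} bit/source symbol")
```

The entropy comes out slightly below 2 bit/source symbol, so that $L_{\rm M} > H$ holds strictly for these probabilities.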


(3) $\rm A$, $\rm B$ and $\rm C$ are represented by two bits in $\text{Code 1}$, $\rm D$ and $\rm E$ by three bits. Thus one obtains for

  • the average code word length
$$L_{\rm M} = p_{\rm A}\cdot 2 + p_{\rm B}\cdot 2 + p_{\rm C}\cdot 2 + p_{\rm D}\cdot 3 + p_{\rm E}\cdot 3 \hspace{0.05cm},$$
  • for the source entropy:
$$H = p_{\rm A}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm A}} + p_{\rm B}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm B}} + p_{\rm C}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm C}} + p_{\rm D}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm D}} + p_{\rm E}\cdot {\rm log_2}\hspace{0.15cm}\frac{1}{p_{\rm E}} \hspace{0.05cm}.$$

By comparing all the terms, we arrive at the result:

$$p_{\rm A}= p_{\rm B}= p_{\rm C}\hspace{0.15cm}\underline{= 0.25} \hspace{0.05cm}, \hspace{0.2cm}p_{\rm D}= p_{\rm E}\hspace{0.15cm}\underline{= 0.125}\hspace{0.3cm} \Rightarrow\hspace{0.3cm} L_{\rm M} = H = 2.25\,{\rm bit/source \:symbol} \hspace{0.05cm}.$$

It can be seen:

  • With these  "more favourable"  probabilities, there is even a larger average code word length than with the  "less favourable"  ones.
  • The equality  $(L_{\rm M} = H)$  is therefore solely due to the now larger source entropy.
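
The term-by-term comparison means that each summand of $L_{\rm M}$ must equal the corresponding summand of $H$, i.e. the code word length of a symbol must equal $\log_2(1/p)$, which gives $p = 2^{-l}$ for a code word of length $l$. A minimal sketch of this check, with variable names chosen by us:

```python
from math import log2

lengths = {"A": 2, "B": 2, "C": 2, "D": 3, "E": 3}  # code word lengths of Code 1

# Equality L_M = H requires p = 2^(-l) for every symbol
probs = {s: 2.0 ** (-l) for s, l in lengths.items()}

total = sum(probs.values())                       # must be 1 for a valid distribution
L_M = sum(probs[s] * lengths[s] for s in probs)   # average code word length
H = sum(p * log2(1 / p) for p in probs.values())  # source entropy

print(probs)
print(total, L_M, H)
```

The probabilities sum to one, so they form a valid distribution, and both $L_{\rm M}$ and $H$ evaluate to 2.25 bit/source symbol.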


(4) For example, one (of many) simulations with the probabilities according to subtask (3) yields the following sequence of $N = 40$ characters:

$$\rm EBDCCBDABEBABCCCCCBCAABECAACCBAABBBCDCAB.$$
  • This results in $L_{\rm M}\hspace{0.01cm}' = ( 34 \cdot 2 + 6 \cdot 3)/40 = 2.15$ bit/source symbol, i.e. a smaller value than for the infinitely long sequence $(L_{\rm M} = 2.25$ bit/source symbol$)$.
  • However, with a different starting value of the random generator,  $(L_{\rm M}\hspace{0.03cm}' \ge L_{\rm M})$  is also possible.
  • This means:   All  statements are correct.
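
The value for this particular sequence can be recounted directly. A minimal sketch, using the sequence printed above and the code word lengths of Code 1; the variable names are our own:

```python
sequence = "EBDCCBDABEBABCCCCCBCAABECAACCBAABBBCDCAB"
lengths = {"A": 2, "B": 2, "C": 2, "D": 3, "E": 3}  # code word lengths of Code 1

n_symbols = len(sequence)
n_three_bit = sum(1 for s in sequence if s in "DE")  # symbols coded with 3 bits
total_bits = sum(lengths[s] for s in sequence)
L_M_prime = total_bits / n_symbols                   # average over this sequence only

print(n_symbols, n_three_bit, total_bits, L_M_prime)
```

The sequence contains 34 two-bit symbols and 6 three-bit symbols, i.e. 86 bits for 40 symbols. A different random sequence with more occurrences of D and E would give a value at or above 2.25 bit/source symbol.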


(5)  Only solution suggestion 1:

  •   $\text{Code 1}$  is a Huffman code, as has already been shown in the previous subtasks.
    This is not true for all symbol probabilities, but at least for the parameter sets according to subtasks  (1)  and  (3).
  •   $\text{Code 2}$  is not a Huffman code, since such a code would always have to be prefix-free.
    However, the prefix freedom is not given here, since  1  is the beginning of the code word  10 .
  • $\text{Code 3}$ is also not a Huffman code, since its average code word length is larger by $p_{\rm C}$ than required $($see $\text{Code 1})$. It is therefore not optimal.
    There are no symbol probabilities  $p_{\rm A}$, ... ,  $p_{\rm E}$, that would justify coding the symbol  $\rm C$  with  010  instead of  01 .
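
Both arguments can be illustrated with a short sketch. The helper names `is_prefix_free` and `kraft_sum` are our own. Code 1 is taken from subtask (1). For Code 3 we assume, based on the statement above, that it equals Code 1 except for C → 010; the complete tables of Code 2 and Code 3 are only given in the figure, so for Code 2 only the pair of code words named in the text is tested.

```python
from fractions import Fraction


def is_prefix_free(codewords):
    """True if no code word is the beginning of another (distinct) code word."""
    words = list(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)


def kraft_sum(codewords):
    """Sum of 2^(-length) over all code words, as an exact fraction."""
    return sum(Fraction(1, 2 ** len(w)) for w in codewords)


code_1 = {"A": "11", "B": "10", "C": "01", "D": "001", "E": "000"}
# Assumption: Code 3 differs from Code 1 only in the code word for C.
code_3 = {"A": "11", "B": "10", "C": "010", "D": "001", "E": "000"}

print(is_prefix_free(code_1.values()), kraft_sum(code_1.values()))
print(is_prefix_free(code_3.values()), kraft_sum(code_3.values()))
print(is_prefix_free(["1", "10"]))  # the pair named for Code 2
```

Under this assumption Code 3 is prefix-free, but its Kraft sum is 7/8 instead of 1: one branch of the code tree remains unused, which a binary Huffman code never leaves. Shortening C from 010 to 01 fills this gap and yields Code 1.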