Aufgaben:Exercise 2.2Z: Average Code Word Length: Difference between revisions

Latest revision as of 17:54, 16 March 2026

The aim of data compression is to represent the message of a source with as few binary characters as possible.

We consider here a discrete message source with the symbol set $\rm \{ A, \ B, \ C, \ D\}$ ⇒ symbol set size $M = 4$ and the symbol probabilities

$p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} = 1/4$ $($subtask $1)$,
$p_{\rm A} = 1/2, \, p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} = 1/8$ $($subtask $2$ to $5)$.

It is assumed that there are no statistical Dependencies between the individual source symbols.

Three assignments are given. To be noted:

Each of these binary codes $\rm C1$, $\rm C2$ and $\rm C3$ is designed for a specific source statistic.
All codes are prefix-free and thus immediately decodable without further specification.

A measure for the quality of a compression method is the average code word length $L_{\rm M}$ with the additional unit "bit/source symbol".

Hint:

The exercise belongs to the chapter General Description of Source Coding.

Questions

$\text{C1:}\ \ L_{\rm M} \ = \ $ $\ \rm bit/source\hspace{0.15cm} symbol$

$\text{C2:}\ \ L_{\rm M} \ = \ $ $\ \rm bit/source\hspace{0.15cm} symbol$

$\text{C3:}\ \ L_{\rm M} \ = \ $ $\ \rm bit/source\hspace{0.15cm} symbol$

$\text{C1:}\ \ L_{\rm M} \ = \ $ $\ \rm bit/source\hspace{0.15cm} symbol$

$\text{C2:}\ \ L_{\rm M} \ = \ $ $\ \rm bit/source\hspace{0.15cm} symbol$

$\text{C3:}\ \ L_{\rm M} \ = \ $ $\ \rm bit/source\hspace{0.15cm} symbol$

	No code word is the beginning of another code word.
	All code words have the same length.

	The code $\rm C1$,
	the code $\rm C2$.

	$\rm AACDBACABADAAA$ ...
	$\rm ACBCCCACAACCD$ ...

Solution

(1) The average code word length is generally given by

$$L_{\rm M} = p_{\rm A} \cdot L_{\rm A} + p_{\rm B} \cdot L_{\rm B}+ p_{\rm C} \cdot L_{\rm C} + p_{\rm D} \cdot L_{\rm D}\hspace{0.05cm}.$$

If the four source symbols are equally probable $($all probabilities exactly $1/4)$, then for this we can also write:

$$L_{\rm M} = 1/4 \cdot ( L_{\rm A} + L_{\rm B}+ L_{\rm C} + L_{\rm D})\hspace{0.05cm}.$$

$\text{Code C1:}$ $L_{\rm M} \hspace{0.15cm}\underline{= 2.00}\ \rm bit/source\hspace{0.15cm} symbol$,
$\text{Code C2:}$ $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/source\hspace{0.15cm} symbol$
$\text{Code C3:}$ $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/source\hspace{0.15cm} symbol$.

(2) With the code table $\text{C1}$ , the average code word length $L_{\rm M} \hspace{0.15cm}\underline{= 2}\ \rm bit/source\hspace{0.15cm} symbol$ is always obtained, independent of the symbol probabilities.

For the other two codes one obtains:

$\text{Code C2:}$ $L_{\rm M} = 1/2 \cdot 1 + 1/4 \cdot 2 + 1/8 \cdot 3 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 1.75}\ \rm bit/source\hspace{0.15cm} symbol$,
$\text{Code C3:}$ $L_{\rm M} = 1/2 \cdot 3 + 1/4 \cdot 2 + 1/8 \cdot 1 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 2.50}\ \rm bit/source\hspace{0.15cm} symbol$.

From the example you can see the principle:

Probable symbols are represented by a few binary symbols, improbable ones by more.
In the case of equally probable symbols, it is best to choose the same code word lengths.

(3) Solution suggestion 1 is correct:

The code $\text{C1}$ with uniform length of all code words is prefix-free,
but other codes can also be prefix-free, for example the codes $\text{C2}$ and $\text{C3}$.

(4) Solution suggestion 1 is correct:

Already from "00" at the beginning one can see that the code $\text{C2}$ is out of the question here,
because otherwise the source symbol sequence would have to begin with "AA".
In fact, the code $\text{C1}$ was used.

(5) Solution suggestion 2 is correct.

The first suggested solution gives the source symbol sequence for code $\text{C2}$ if the encoded sequence would be "$\rm 001101111001100100111000$".

@@ Line 1: / Line 1: @@
-{{quiz-Header|Buchseite=Informationstheorie/Allgemeine Beschreibung
+{{quiz-Header|Buchseite=Information_Theory/General_Description
 }}
-[[File:P_ID2417__Inf_Z_2_2.png|right|Tabellen zur Quellencodierung]]
+[[File:P_ID2417__Inf_Z_2_2.png|right|frame|Three source coding tables]]
-Ziel von Datenkomprimierung ist es, die Nachricht einer Quelle mit möglichst wenigen Binärzeichen darzustellen.
+The aim of data compression is to represent the message of a source with as few binary characters as possible.
-Wir betrachten hier eine wertdiskrete Nachrichtenquelle mit dem Symbolvorrat $\{ A, B, C, D\}$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; Symbolumfang $M = 4$ und den Auftrittswahrscheinlichkeiten
+We consider here a discrete message source with the symbol set&nbsp; $\rm \{ A, \ B, \ C, \ D\}$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; symbol set size&nbsp; $M = 4$&nbsp; and the symbol probabilities
-*$p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} =  1/4$ (Teilaufgabe 1),
+:*$p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} =  1/4$&nbsp; $($subtask&nbsp; $1)$,
-* $p_{\rm A} = 1/2, \,  p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} =  1/8$  (ab Teilaufgabe 2).
+:* $p_{\rm A} = 1/2, \,  p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} =  1/8$&nbsp;  $($subtask&nbsp; $2$&nbsp; to&nbsp; $5)$.
-Vorausgesetzt wird, dass es zwischen den einzelnen Quellensymbolen keine statistischen Bindungen gibt.
+It is assumed that there are no statistical Dependencies between the individual source symbols.
-Ein Maß für die Güte eines Komprimierungsverfahrens ist die mittlere Codewortlänge $L_{\rm M}$  mit der Zusatzeinheit &bdquo;bit/Quellensymbol&rdquo;. Vorgegeben sind drei Zuordnungen. Anzumerken ist:
+Three assignments are given. To be noted:
-* Jeder dieser Binärcodes C1, C2 und C3 ist für eine spezielle Quellenstatistik ausgelegt.
+* Each of these binary codes&nbsp; $\rm C1$,&nbsp; $\rm C2$&nbsp; and&nbsp; $\rm C3$&nbsp; is designed for a specific source statistic.
-* Alle Codes sind präfixfrei und somit ohne weitere Angabe sofort decodierbar.
+* All codes are prefix-free and thus immediately decodable without further specification.
-''Hinweise:''
+A measure for the quality of a compression method is the average code word length&nbsp; $L_{\rm M}$&nbsp;  with the additional unit&nbsp; "bit/source symbol".
-*Die Aufgabe gehört zum  Kapitel [[Informationstheorie/Allgemeine_Beschreibung|Allgemeine Beschreibung der Quellencodierung]].
-*Sollte die Eingabe des Zahlenwertes &bdquo;0&rdquo; erforderlich sein, so geben Sie bitte &bdquo;0.&rdquo; ein.
-===Fragebogen===
+Hint:
+*The exercise belongs to the chapter&nbsp; [[Information_Theory/Allgemeine_Beschreibung|General Description of Source Coding]].
+===Questions===
 <quiz display=simple>
-{Bestimmen Sie die mittlere Codewortlänge $L_{\rm M}$ für $p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} =  1/4$.
+{Determine the average code word length&nbsp; $L_{\rm M}$&nbsp; for&nbsp; $p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} =  1/4$.
 |type="{}"}
-$\text{C1:}\ \ L_{\rm M} \ = $  { 2 1% } $\ \rm bit/Quellensymbol$
+$\text{C1:}\ \ L_{\rm M} \ = \ $  { 2 1% } $\ \rm bit/source\hspace{0.15cm} symbol$
-$\text{C2:}\ \ L_{\rm M} \ = $  { 2.25 1% } $\ \rm bit/Quellensymbol$
+$\text{C2:}\ \ L_{\rm M} \ =  \ $  { 2.25 1% } $\ \rm bit/source\hspace{0.15cm} symbol$
-$\text{C3:}\ \ L_{\rm M} \ = $  { 2.25 1% } $\ \rm bit/Quellensymbol$
+$\text{C3:}\ \ L_{\rm M} \ =  \ $  { 2.25 1% } $\ \rm bit/source\hspace{0.15cm} symbol$
-{Welche Werte ergeben sich für $p_{\rm A} = 1/2, \,  p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} =  1/8$?
+{Which values result for&nbsp; $p_{\rm A} = 1/2, \,  p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} =  1/8$?
 |type="{}"}
-$\text{C1:}\ \ L_{\rm M} \ = $  { 2 1% } $\ \rm bit/Quellensymbol$
+$\text{C1:}\ \ L_{\rm M} \ =  \ $  { 2 1% } $\ \rm bit/source\hspace{0.15cm} symbol$
-$\text{C2:}\ \ L_{\rm M} \ = $  { 1.75 1% } $\ \rm bit/Quellensymbol$
+$\text{C2:}\ \ L_{\rm M} \ =  \ $  { 1.75 1% } $\ \rm bit/source\hspace{0.15cm} symbol$
-$\text{C3:}\ \ L_{\rm M} \ = $  { 2.5 1% } $\ \rm bit/Quellensymbol$
+$\text{C3:}\ \ L_{\rm M} \ =  \ $  { 2.5 1% } $\ \rm bit/source\hspace{0.15cm} symbol$
-{Woran erkennt man präfixfreie Codes?
+{How can you recognise prefix-free codes?
 |type="[]"}
-+ Kein Codewort ist der Beginn eines anderen Codewortes.
++ No code word is the beginning of another code word.
-- Alle Codeworte haben gleiche Länge.
+- All code words have the same length.
-{Für die spezielle Quellensymbolfolge $\rm ADBDCBCBADCA$ ergibt sich die Codesymbolfolge $\rm 001101111001100100111000$.
+{For the special source symbol sequence&nbsp; $\rm ADBDCBCBADCA$&nbsp;, the encoded sequence&nbsp; $\rm 001101111001100100111000$&nbsp; results.
-<br>Welcher Code wurde verwendet?
+<br>Which code was used?
-|type="[]"}
+|type="()"}
-+ der Code '''C1''',
++ The code&nbsp;  $\rm C1$,
-- der Code '''C2'''.
+- the code&nbsp; $\rm C2$.
-{Nach Codierung mit '''C3''' erhält man $\rm 001101111001100100111000$. Wie lautet die zugehörige Quellensymbolfolge?
+{After coding with&nbsp; $\rm C3$,&nbsp; you get&nbsp; $\rm 001101111001100100111000$.&nbsp; What is the corresponding source symbol sequence?
-|type="[]"}
+|type="()"}
 - $\rm AACDBACABADAAA$ ...
 + $\rm ACBCCCACAACCD$ ...
@@ Line 62: / Line 73: @@
 </quiz>
-===Musterlösung===
+===Solution===
 {{ML-Kopf}}
-:<b>1.</b>&nbsp;&nbsp;Die mittlere Codewortlänge ergibt sich allgemein zu
+'''(1)'''&nbsp; The average code word length is generally given by
-:$$L_{\rm M} = p_{\rm A} \cdot L_{\rm A} + p_{\rm B} \cdot L_{\rm B}+ p_{\rm C} \cdot L_{\rm C} + p_{\rm D} \cdot L_{\rm D}
+:$$L_{\rm M} = p_{\rm A} \cdot L_{\rm A} + p_{\rm B} \cdot L_{\rm B}+ p_{\rm C} \cdot L_{\rm C} + p_{\rm D} \cdot L_{\rm D}\hspace{0.05cm}.$$
-\hspace{0.05cm}.$$
+If the four source symbols are equally probable&nbsp; $($all probabilities exactly&nbsp; $1/4)$, then for this we can also write:
-:Sind die vier Quellensymbole gleichwahrscheinlich (alle Wahrscheinlichkeiten genau 1/4), so kann dafür auch geschrieben werden:
+:$$L_{\rm M} = 1/4 \cdot ( L_{\rm A} + L_{\rm B}+ L_{\rm C} + L_{\rm D})\hspace{0.05cm}.$$
-:$$L_{\rm M} = 1/4 \cdot [ L_{\rm A} + L_{\rm B}+ L_{\rm C} + L_{\rm D}]
+* $\text{Code C1:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2.00}\ \rm bit/source\hspace{0.15cm} symbol$,
-\hspace{0.05cm}.$$
+* $\text{Code C2:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/source\hspace{0.15cm} symbol$
-:* Code C1:&nbsp;&nbsp;&nbsp; <i>L</i><sub>M</sub> <u>= 2.00 bit/Quellensymbol</u>,
+* $\text{Code C3:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/source\hspace{0.15cm} symbol$.
+'''(2)'''&nbsp; With the code table&nbsp; $\text{C1}$&nbsp;, the average code word length&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2}\ \rm bit/source\hspace{0.15cm} symbol$&nbsp; is always obtained, independent of the symbol probabilities.
+For the other two codes one obtains:
+* $\text{Code C2:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} = 1/2 \cdot 1 + 1/4 \cdot 2 + 1/8 \cdot 3 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 1.75}\ \rm bit/source\hspace{0.15cm} symbol$,
+* $\text{Code C3:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} = 1/2 \cdot 3 + 1/4 \cdot 2 + 1/8 \cdot 1 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 2.50}\ \rm bit/source\hspace{0.15cm} symbol$.
+From the example you can see the principle:
+*Probable symbols are represented by a few binary symbols, improbable ones by more.
+*In the case of equally probable symbols, it is best to choose the same code word lengths.
-:* Code C2:&nbsp;&nbsp;&nbsp; <i>L</i><sub>M</sub> <u>= 2.25 bit/Quellensymbol</u>,
-:* Code C3:&nbsp;&nbsp;&nbsp; <i>L</i><sub>M</sub> <u>= 2.25 bit/Quellensymbol</u>.
-:<b>2.</b>&nbsp;&nbsp;Mit der Codetabelle <i>C</i>1 ergibt sich unabhängig von den Symbolwahrscheinlichkeiten stets die mittlere Codewortlänge <i>L</i><sub>M</Sub> <u>= 2 bit/Quellensymbol</u>. Für die beiden anderen Codes erhält man
+'''(3)'''&nbsp; <u>Solution suggestion 1</u> is correct:
+*The code&nbsp; $\text{C1}$&nbsp; with uniform length of all code words is prefix-free,
+*but other codes can also be prefix-free, for example the codes&nbsp;  $\text{C2}$&nbsp; and&nbsp;  $\text{C3}$.
-:* Code C2:&nbsp;&nbsp;&nbsp; <i>L</i><sub>M</sub> = 1/2 &middot; 1 + 1/4 &middot; 2 + 1/8 &middot; 3 + 1/8 &middot; 3 = <u>1.75 bit/Quellensymbol</u>,
-:* Code C3:&nbsp;&nbsp;&nbsp; <i>L</i><sub>M</sub> = 1/2 &middot; 3 + 1/4 &middot; 2 + 1/8 &middot; 1 + 1/8 &middot; 3 = <u>2.50 bit/Quellensymbol</u>.
-:Man erkennt aus dem Beispiel das Prinzip: Wahrscheinliche Symbole werden durch wenige Binärsymbole dargestellt und unwahrscheinliche durch mehr. Bei gleichwahrscheinlichen Symbolen wählt man am besten auch die Codewortlängen gleich.
+'''(4)'''&nbsp; <u>Solution suggestion 1</u> is correct:
+*Already from&nbsp; "00"&nbsp; at the beginning one can see that the code&nbsp; $\text{C2}$&nbsp; is out of the question here, <br>because otherwise the source symbol sequence would have to begin with&nbsp; "AA".
+*In fact, the code&nbsp; $\text{C1}$&nbsp; was used.
-:<b>3.</b>&nbsp;&nbsp;Richtig ist <u>Lösungsvorschlag 1</u>. Ein Code mit einheitlicher Länge aller Codeworte ist zwar präfixfrei, aber auch andere Codes können präfixfrei sein, zum Beispiel die Codes C2 und C3.
-:<b>4.</b>&nbsp;&nbsp;Bereits aus &bdquo;00&rdquo; am Anfang erkennt man, dass der Code C2 hier nicht in Frage kommt, da sonst die Quellensymbolfolge mit &bdquo;AA&rdquo; beginnen müsste. Tatsächlich wurde der <u>Code C1</u> verwendet.
-:<b>5.</b>&nbsp;&nbsp;Richtig ist der <u>Lösungsvorschlag 2</u>. Der erste Lösungsvorschlag gibt die Quellensymbolfolge für den Code C2 an, wenn die Codesymbolfolge <b>001101111001100100111000</b> lauten würde.
+'''(5)'''&nbsp; <u>Solution suggestion 2</u> is correct.
+The first suggested solution gives the source symbol sequence for code&nbsp; $\text{C2}$&nbsp; if the encoded sequence would be &nbsp; "$\rm 001101111001100100111000$".
 {{ML-Fuß}}
-[[Category:Aufgaben zu Informationstheorie|^2.1 Allgemeine Beschreibung^]]
+[[Category:Information Theory: Exercises|^2.1 General Description^]]
+[[de:Aufgaben:Aufgabe 2.2Z: Mittlere Codewortlänge]]