Difference between revisions of "Aufgaben:Exercise 2.2Z: Average Code Word Length"

Revision as of 21:43, 2 July 2021

Source coding table

The aim of data compression is to represent the message of a source with as few binary characters as possible.

We consider here a discrete-value message source with the symbol set $\rm \{ A, \ B, \ C, \ D\}$ ⇒ symbol range $M = 4$ and the probabilities of occurrence

$p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} = 1/4$ (subtask 1),
$p_{\rm A} = 1/2, \, p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} = 1/8$ (from subtask 2).

It is assumed that there are no statistical ties between the individual source symbols.

A measure for the quality of a compression method is the average codeword length $L_{\rm M}$ with the additional unit "bit/source symbol".

Three assignments are given. To be noted:

Each of these binary codes $\rm C1$, $\rm C2$ and $\rm C3$ is designed for a specific source statistic.
All codes are prefix-free and thus immediately decodable without further specification.

Hint:

The task belongs to the chapter General description of source coding.

Questions

$\text{C1:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source symbol$

$\text{C2:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source symbol$

$\text{C3:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source symbol$

$\text{C1:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source symbol$

$\text{C2:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source symbol$

$\text{C3:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source symbol$

	No code word is the beginning of another code word.
	All codewords have the same length.

	the code $\rm C1$,
	the code $\rm C2$.

	$\rm AACDBACABADAAA$ ...
	$\rm ACBCCCACAACCD$ ...

Solution

(1) The average codeword length is generally given by

$$L_{\rm M} = p_{\rm A} \cdot L_{\rm A} + p_{\rm B} \cdot L_{\rm B}+ p_{\rm C} \cdot L_{\rm C} + p_{\rm D} \cdot L_{\rm D} \hspace{0.05cm}.$$

If the four source symbols are equally probable $($all probabilities exactly $1/4)$, then for this we can also write:

$$L_{\rm M} = 1/4 \cdot ( L_{\rm A} + L_{\rm B}+ L_{\rm C} + L_{\rm D}) \hspace{0.05cm}.$$

$\text{Code C1:}$ $L_{\rm M} \hspace{0.15cm}\underline{= 2.00}\ \rm bit/source symbol$,
$\text{Code C2:}$ $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/source symbol$
$\text{Code C3:}$ $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/source symbol$.

(2) With the code table $\text{C1}$ , the average code word length $L_{\rm M} \hspace{0.15cm}\underline{= 2}\ \rm bit/source symbol$ is always obtained, independent of the symbol probabilities.

For the other two codes one obtains:

$\text{Code C2:}$ $L_{\rm M} = 1/2 \cdot 1 + 1/4 \cdot 2 + 1/8 \cdot 3 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 1.75}\ \rm bit/source symbol$,
$\text{Code C3:}$ $L_{\rm M} = 1/2 \cdot 3 + 1/4 \cdot 2 + 1/8 \cdot 1 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 2.50}\ \rm bit/source symbol$.

You can see the principle from the example:

Probable symbols are represented by a few binary symbols, improbable ones by more.
In the case of equally probable symbols, it is best to choose the same codeword lengths.

(3) Solution suggestion 1 is correct:

The code $\text{C1}$ with uniform length of all codewords is prefix-free,
but other codes can also be prefix-free, for example the codes $\text{C2}$ and $\text{C3}$.

(4) Solution suggestion 1 is correct:

Already from "00" at the beginning one can see that the code $\text{C2}$ is out of the question here, because otherwise the source symbol sequence would have to begin with "AA".
In fact, the code $\text{C1}$ was used.

(5) Solution suggestion 2 is correct:

The first suggested solution, on the other hand, gives the source symbol sequence for code $\text{C2}$ if the code symbol sequence would be $\rm 001101111001100100111000$ .

@@ Line 3: / Line 3: @@
 }}
-[[File:P_ID2417__Inf_Z_2_2.png|right|frame|Tabelle zur Quellencodierung]]
+[[File:P_ID2417__Inf_Z_2_2.png|right|frame|Source coding table]]
-Ziel von Datenkomprimierung ist es, die Nachricht einer Quelle mit möglichst wenigen Binärzeichen darzustellen.
+The aim of data compression is to represent the message of a source with as few binary characters as possible.
-Wir betrachten hier eine wertdiskrete Nachrichtenquelle mit dem Symbolvorrat&nbsp; $\rm \{ A, \ B, \ C, \ D\}$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; Symbolumfang&nbsp; $M = 4$&nbsp; und den Auftrittswahrscheinlichkeiten
+We consider here a discrete-value message source with the symbol set&nbsp; $\rm \{ A, \ B, \ C, \ D\}$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; symbol range&nbsp; $M = 4$&nbsp; and the probabilities of occurrence
-:*$p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} =  1/4$&nbsp; (Teilaufgabe 1),
+:*$p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} =  1/4$&nbsp; (subtask 1),
-:* $p_{\rm A} = 1/2, \,  p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} =  1/8$&nbsp;  (ab Teilaufgabe 2).
+:* $p_{\rm A} = 1/2, \,  p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} =  1/8$&nbsp;  (from subtask 2).
-Vorausgesetzt wird, dass es zwischen den einzelnen Quellensymbolen keine statistischen Bindungen gibt.
+It is assumed that there are no statistical ties between the individual source symbols.
-Ein Maß für die Güte eines Komprimierungsverfahrens ist die mittlere Codewortlänge&nbsp; $L_{\rm M}$&nbsp;  mit der Zusatzeinheit "bit/Quellensymbol".
+A measure for the quality of a compression method is the average codeword length&nbsp; $L_{\rm M}$&nbsp;  with the additional unit "bit/source symbol".
-Vorgegeben sind drei Zuordnungen. Anzumerken ist:
+Three assignments are given. To be noted:
-* Jeder dieser Binärcodes&nbsp; $\rm C1$,&nbsp; $\rm C2$&nbsp; und&nbsp; $\rm C3$&nbsp; ist für eine spezielle Quellenstatistik ausgelegt.
+* Each of these binary codes&nbsp; $\rm C1$,&nbsp; $\rm C2$&nbsp; and&nbsp; $\rm C3$&nbsp; is designed for a specific source statistic.
-* Alle Codes sind präfixfrei und somit ohne weitere Angabe sofort decodierbar.
+* All codes are prefix-free and thus immediately decodable without further specification.
@@ Line 26: / Line 26: @@
-''Hinweis:''
+''Hint:''
-*Die Aufgabe gehört zum  Kapitel&nbsp; [[Information_Theory/Allgemeine_Beschreibung|Allgemeine Beschreibung der Quellencodierung]].
+*The task belongs to the chapter&nbsp; [[Information_Theory/Allgemeine_Beschreibung|General description of source coding]].
-===Fragebogen===
+===Questions===
 <quiz display=simple>
-{Bestimmen Sie die mittlere Codewortlänge&nbsp; $L_{\rm M}$&nbsp; für&nbsp; $p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} =  1/4$.
+{Determine the mean codeword length&nbsp; $L_{\rm M}$&nbsp; for&nbsp; $p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} =  1/4$.
 |type="{}"}
-$\text{C1:}\ \ L_{\rm M} \ = \ $  { 2 1% } $\ \rm bit/Quellensymbol$
+$\text{C1:}\ \ L_{\rm M} \ = \ $  { 2 1% } $\ \rm bit/source symbol$
-$\text{C2:}\ \ L_{\rm M} \ =  \ $  { 2.25 1% } $\ \rm bit/Quellensymbol$
+$\text{C2:}\ \ L_{\rm M} \ =  \ $  { 2.25 1% } $\ \rm bit/source symbol$
-$\text{C3:}\ \ L_{\rm M} \ =  \ $  { 2.25 1% } $\ \rm bit/Quellensymbol$
+$\text{C3:}\ \ L_{\rm M} \ =  \ $  { 2.25 1% } $\ \rm bit/source symbol$
-{Welche Werte ergeben sich für&nbsp; $p_{\rm A} = 1/2, \,  p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} =  1/8$?
+{Which values result for&nbsp; $p_{\rm A} = 1/2, \,  p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} =  1/8$?
 |type="{}"}
-$\text{C1:}\ \ L_{\rm M} \ =  \ $  { 2 1% } $\ \rm bit/Quellensymbol$
+$\text{C1:}\ \ L_{\rm M} \ =  \ $  { 2 1% } $\ \rm bit/source symbol$
-$\text{C2:}\ \ L_{\rm M} \ =  \ $  { 1.75 1% } $\ \rm bit/Quellensymbol$
+$\text{C2:}\ \ L_{\rm M} \ =  \ $  { 1.75 1% } $\ \rm bit/source symbol$
-$\text{C3:}\ \ L_{\rm M} \ =  \ $  { 2.5 1% } $\ \rm bit/Quellensymbol$
+$\text{C3:}\ \ L_{\rm M} \ =  \ $  { 2.5 1% } $\ \rm bit/source symbol$
-{Woran erkennt man präfixfreie Codes?
+{How can you recognise prefix-free codes?
 |type="[]"}
-+ Kein Codewort ist der Beginn eines anderen Codewortes.
++ No code word is the beginning of another code word.
-- Alle Codeworte haben gleiche Länge.
+- All codewords have the same length.
-{Für die spezielle Quellensymbolfolge&nbsp; $\rm ADBDCBCBADCA$&nbsp; ergibt sich die Codesymbolfolge&nbsp; $\rm 001101111001100100111000$.
+{For the special source symbol sequence&nbsp; $\rm ADBDCBCBADCA$&nbsp;, the code symbol sequence&nbsp; $\rm 001101111001100100111000$ results.
-<br>Welcher Code wurde verwendet?
+<br>Which code was used?
 |type="()"}
-+ der Code&nbsp;  $\rm C1$,
++ the code&nbsp;  $\rm C1$,
-- der Code&nbsp; $\rm C2$.
+- the code&nbsp; $\rm C2$.
-{Nach Codierung mit&nbsp; $\rm C3$&nbsp; erhält man&nbsp; $\rm 001101111001100100111000$.&nbsp; Wie lautet die zugehörige Quellensymbolfolge?
+{After coding with&nbsp; $\rm C3$&nbsp;, you get&nbsp; $\rm 001101111001100100111000$.&nbsp; What is the corresponding source symbol sequence?
 |type="()"}
 - $\rm AACDBACABADAAA$ ...
@@ Line 70: / Line 70: @@
 </quiz>
-===Musterlösung===
+===Solution===
 {{ML-Kopf}}
-'''(1)'''&nbsp; Die mittlere Codewortlänge ergibt sich allgemein zu
+'''(1)'''&nbsp; The average codeword length is generally given by
 :$$L_{\rm M} = p_{\rm A} \cdot L_{\rm A} + p_{\rm B} \cdot L_{\rm B}+ p_{\rm C} \cdot L_{\rm C} + p_{\rm D} \cdot L_{\rm D}
 \hspace{0.05cm}.$$
-Sind die vier Quellensymbole gleichwahrscheinlich&nbsp; $($alle Wahrscheinlichkeiten genau $1/4)$, so kann dafür auch geschrieben werden:
+If the four source symbols are equally probable&nbsp; $($all probabilities exactly $1/4)$, then for this we can also write:
 :$$L_{\rm M} = 1/4 \cdot ( L_{\rm A} + L_{\rm B}+ L_{\rm C} + L_{\rm D})
 \hspace{0.05cm}.$$
-* $\text{Code C1:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2.00}\ \rm bit/Quellensymbol$,
+* $\text{Code C1:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2.00}\ \rm bit/source symbol$,
-* $\text{Code C2:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/Quellensymbol$
+* $\text{Code C2:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/source symbol$
-* $\text{Code C3:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/Quellensymbol$.
+* $\text{Code C3:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/source symbol$.
-'''(2)'''&nbsp; Mit der Codetabelle&nbsp; $\text{C1}$&nbsp; ergibt sich unabhängig von den Symbolwahrscheinlichkeiten stets die mittlere Codewortlänge&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2}\ \rm bit/Quellensymbol$.
+'''(2)'''&nbsp; With the code table&nbsp; $\text{C1}$&nbsp;, the average code word length&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2}\ \rm bit/source symbol$ is always obtained, independent of the symbol probabilities.
-Für die beiden anderen Codes erhält man:
+For the other two codes one obtains:
-* $\text{Code C2:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} = 1/2 \cdot 1 + 1/4 \cdot 2 + 1/8 \cdot 3 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 1.75}\ \rm bit/Quellensymbol$,
+* $\text{Code C2:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} = 1/2 \cdot 1 + 1/4 \cdot 2 + 1/8 \cdot 3 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 1.75}\ \rm bit/source symbol$,
-* $\text{Code C3:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} = 1/2 \cdot 3 + 1/4 \cdot 2 + 1/8 \cdot 1 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 2.50}\ \rm bit/Quellensymbol$.
+* $\text{Code C3:}$&nbsp;&nbsp;&nbsp; $L_{\rm M} = 1/2 \cdot 3 + 1/4 \cdot 2 + 1/8 \cdot 1 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 2.50}\ \rm bit/source symbol$.
-Man erkennt aus dem Beispiel das Prinzip:
+You can see the principle from the example:
-*Wahrscheinliche Symbole werden durch wenige Binärsymbole dargestellt, unwahrscheinliche durch mehr.
+*Probable symbols are represented by a few binary symbols, improbable ones by more.
-*Bei gleichwahrscheinlichen Symbolen wählt man am besten auch die Codewortlängen gleich.
+*In the case of equally probable symbols, it is best to choose the same codeword lengths.
-'''(3)'''&nbsp; Richtig ist <u>Lösungsvorschlag 1</u>:
+'''(3)'''&nbsp; <u>Solution suggestion 1</u> is correct:
-*Der Code&nbsp; $\text{C1}$&nbsp; mit einheitlicher Länge aller Codeworte ist präfixfrei,
+*The code&nbsp; $\text{C1}$&nbsp; with uniform length of all codewords is prefix-free,
-*aber auch andere Codes können präfixfrei sein, zum Beispiel die Codes&nbsp;  $\text{C2}$&nbsp; und&nbsp;  $\text{C3}$.
+*but other codes can also be prefix-free, for example the codes&nbsp;  $\text{C2}$&nbsp; and&nbsp;  $\text{C3}$.
-'''(4)'''&nbsp; Richtig ist <u>Lösungsvorschlag 1</u>:
+'''(4)'''&nbsp; <u>Solution suggestion 1</u> is correct:
-*Bereits aus "00" am Anfang erkennt man, dass der Code&nbsp; $\text{C2}$&nbsp; hier nicht in Frage kommt, da sonst die Quellensymbolfolge mit "AA" beginnen müsste.
+*Already from "00" at the beginning one can see that the code&nbsp; $\text{C2}$&nbsp; is out of the question here, because otherwise the source symbol sequence would have to begin with "AA".
-*Tatsächlich wurde der Code&nbsp; $\text{C1}$&nbsp; verwendet.
+*In fact, the code&nbsp; $\text{C1}$&nbsp; was used.
-'''(5)'''&nbsp; Richtig ist der <u>Lösungsvorschlag 2</u>:
+'''(5)'''&nbsp; <u>Solution suggestion 2</u> is correct:
-*Der erste Lösungsvorschlag gibt dagegen die Quellensymbolfolge für den Code&nbsp; $\text{C2}$&nbsp; an, wenn die Codesymbolfolge&nbsp; $\rm 001101111001100100111000$&nbsp; lauten würde.
+*The first suggested solution, on the other hand, gives the source symbol sequence for code&nbsp; $\text{C2}$&nbsp; if the code symbol sequence would be &nbsp; $\rm 001101111001100100111000$&nbsp;.
 {{ML-Fuß}}