Difference between revisions of "Aufgaben:Exercise 2.2: Kraft–McMillan Inequality"
Line 3: | Line 3: | ||
}} | }} | ||
− | [[File:P_ID2416__Inf_A_2_2.png|right|frame| | + | [[File:P_ID2416__Inf_A_2_2.png|right|frame|Four examples of binary codes and in addition three ternary codes]] |
− | In | + | In the figure some exemplary binary and ternary codes are given. |
− | * | + | *In binary code $\rm B1$ all possible source symbols $q_\mu$ $($with run index $\mu = 1$, ... , $8)$ are each represented by a code symbol sequence $\langle c_\mu \rangle $ of uniform length $L_\mu = 3$ . |
− | * | + | *This code is unsuitable for data compression for this reason. |
− | + | The possibility of data compression arises only when | |
− | * | + | * the $M$ Quellensymbole nicht gleichwahrscheinlich, und |
*die Länge $L_\mu$ der Codeworte unterschiedlich sind. | *die Länge $L_\mu$ der Codeworte unterschiedlich sind. | ||
− | + | For example, the binary code $\rm B2$ has this property: | |
− | * | + | *Here, one codeword each has the length $1$, $2$ and $3$, respectively $(N_1 = N_2 = N_3 = 1)$. |
− | * | + | *Two code words have the length $L_\mu = 4$ $(N_4 = N_5 = 2)$. |
− | + | A prerequisite for the decodability of such a code is that the code is '''prefix-free''' . | |
− | * | + | *That is, no codeword may be the prefix (i.e., the beginning) of a longer codeword. |
− | * | + | *A necessary condition for a code to be prefix-free for data compression was stated by Leon Kraft in 1949, called '''Kraft's inequality''': |
:$$\sum_{\mu=1}^{M} \hspace{0.2cm} D^{-L_{\mu}} \le 1 \hspace{0.05cm}.$$ | :$$\sum_{\mu=1}^{M} \hspace{0.2cm} D^{-L_{\mu}} \le 1 \hspace{0.05cm}.$$ | ||
− | + | Here denote | |
− | * $M$ | + | * $M$ the number of possible source symbols $q_\mu$, |
− | * $L_\mu$ | + | * $L_\mu$ the length of the codeword $q_\mu$ associated with the source symbol $c_\mu$, |
− | * $D = 2$ | + | * $D = 2$ denotes a binary code $(\rm 0$ or $\rm 1)$ and $D = 3$ denotes a ternary code $(\rm 0$, $\rm 1$, $\rm 2)$. |
− | + | A code can be prefix-free only if Kraft's inequality is satisfied. The converse does not hold: if Kraft's inequality is satisfied, it does not mean that this code is actually prefix-free. | |
− | + | ''Hint:'' | |
− | '' | + | *The task belongs to the chapter [[Information_Theory/Allgemeine_Beschreibung|General description of source coding]]. |
− | * | ||
− | === | + | ===Question=== |
<quiz display=simple> | <quiz display=simple> | ||
− | { | + | {Which of the binary codes satisfy Kraft's inequality? |
|type="[]"} | |type="[]"} | ||
+ $\rm B1$, | + $\rm B1$, | ||
Line 54: | Line 53: | ||
− | { | + | {Which of the given binary codes are prefix-free? |
|type="[]"} | |type="[]"} | ||
+ $\rm B1$, | + $\rm B1$, | ||
Line 62: | Line 61: | ||
− | { | + | {Which of the given ternary codes are prefix-free? |
|type="[]"} | |type="[]"} | ||
+ $\rm T1$, | + $\rm T1$, | ||
Line 69: | Line 68: | ||
− | { | + | {What are the characteristics of the ternary code $\rm T1$? |
|type="{}"} | |type="{}"} | ||
$ N_1 \ = \ $ { 1 } | $ N_1 \ = \ $ { 1 } | ||
Line 77: | Line 76: | ||
− | { | + | {How many trivalent codewords $(L_\mu = 3)$ could be added to the $\rm T1$ code without changing the prefix freedom? |
|type="{}"} | |type="{}"} | ||
$\Delta N_3 \ = \ $ { 6 } | $\Delta N_3 \ = \ $ { 6 } | ||
− | { | + | {The ternary code $\rm T3$ is to be expanded to a total of $N = 9$ codewords. How to achieve this without violating the prefix freedom? |
|type="[]"} | |type="[]"} | ||
− | - | + | - Addition of four three-valued codewords. |
− | + | + | + Addition of four four-valued codewords. |
− | + | + | + Addition of one trivalent and three tetravalent codewords. |
Revision as of 21:26, 29 June 2021
In the figure some exemplary binary and ternary codes are given.
- In binary code $\rm B1$ all possible source symbols $q_\mu$ $($with run index $\mu = 1$, ... , $8)$ are each represented by a code symbol sequence $\langle c_\mu \rangle $ of uniform length $L_\mu = 3$ .
- This code is unsuitable for data compression for this reason.
The possibility of data compression arises only when
- the $M$ Quellensymbole nicht gleichwahrscheinlich, und
- die Länge $L_\mu$ der Codeworte unterschiedlich sind.
For example, the binary code $\rm B2$ has this property:
- Here, one codeword each has the length $1$, $2$ and $3$, respectively $(N_1 = N_2 = N_3 = 1)$.
- Two code words have the length $L_\mu = 4$ $(N_4 = N_5 = 2)$.
A prerequisite for the decodability of such a code is that the code is prefix-free .
- That is, no codeword may be the prefix (i.e., the beginning) of a longer codeword.
- A necessary condition for a code to be prefix-free for data compression was stated by Leon Kraft in 1949, called Kraft's inequality:
- $$\sum_{\mu=1}^{M} \hspace{0.2cm} D^{-L_{\mu}} \le 1 \hspace{0.05cm}.$$
Here denote
- $M$ the number of possible source symbols $q_\mu$,
- $L_\mu$ the length of the codeword $q_\mu$ associated with the source symbol $c_\mu$,
- $D = 2$ denotes a binary code $(\rm 0$ or $\rm 1)$ and $D = 3$ denotes a ternary code $(\rm 0$, $\rm 1$, $\rm 2)$.
A code can be prefix-free only if Kraft's inequality is satisfied. The converse does not hold: if Kraft's inequality is satisfied, it does not mean that this code is actually prefix-free.
Hint:
- The task belongs to the chapter General description of source coding.
Question
Musterlösung
- $\rm B1$: $8 \cdot 2^{-3} = 1$ ⇒ Bedingung erfüllt,
- $\rm B2$: $1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 1 \cdot 2^{-3} + 2 \cdot 2^{-4}= 1$ ⇒ Bedingung erfüllt,
- $\rm B3$: $1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 1 \cdot 2^{-3} + 2 \cdot 2^{-4}= 1$ ⇒ Bedingung erfüllt,
- $\rm B4$: $1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 2 \cdot 2^{-3} + 1 \cdot 2^{-4}= 17/16$ ⇒ Bedingung nicht erfüllt.
(2) Richtig sind die Lösungsvorschläge 1 und 2:
- Der Code $\rm B4$, der die Kraftsche Ungleichung nicht erfüllt, ist mit Sicherheit auch nicht präfixfrei.
- Aber bei Erfüllung der Kraftschen Ungleichung ist noch nicht sicher, dass dieser Code auch präfixfrei ist.
- Beim Code $\rm B3$ ist "10" der Beginn des Codewortes "1011".
- Dagegen sind die Codes $\rm B1$ und $\rm B2$ tatsächlich präfixfrei.
(3) Richtig sind die Antworten 1 und 3:
- Die Kraftsche Ungleichung wird von allen drei Codes erfüllt.
- Wie aus der Tabelle hervorgeht, sind die Codes $\rm T1$ und $\rm T3$ tatsächlich präfixfrei.
- Der Code $\rm T2$ ist dagegen nicht präfixfrei, da "1" der Beginn des Codewortes "10" ist.
(4) $N_i$ gibt an, wieviele Codeworte mit $i$ Symbolen es im Code gibt. Für den Code $\rm T1$ gilt:
- $$N_1 \hspace{0.15cm}\underline{= 1}\hspace{0.05cm}, \hspace{0.2cm}N_2 \hspace{0.15cm}\underline{= 2}\hspace{0.05cm}, \hspace{0.2cm}N_3 \hspace{0.15cm}\underline{= 6}\hspace{0.05cm}.$$
(5) Nach der Kraftschen Ungleichung muss gelten
- $$N_1 \cdot 3^{-1} + N_2 \cdot 3^{-2} + N_3 \cdot 3^{-3 } \le 1\hspace{0.05cm}.$$
Bei gegebenem $N_1 = 1$ und $N_2 = 2$ wird dies erfüllt, solange gilt:
- $$N_3 \cdot 3^{-3 } \le 1 - 1/3 - 2/9 = 4/9 \hspace{0.3cm} \Rightarrow \hspace{0.3cm}N_3 \le 12 \hspace{0.3cm} \Rightarrow \hspace{0.3cm} {\rm \Delta}\,N_3 \hspace{0.15cm}\underline{= 6}\hspace{0.05cm}.$$
Die zusätzlichen Codeworte sind $\rm 210, \,211, \,212, \,220, \,221, \,222$.
(6) Für den Code $\rm T3$ gilt:
- $S({\rm T3})= 2 \cdot 3^{-1} + 2 \cdot 3^{-2} + 1 \cdot 3^{-3 } = {25}/{27}\hspace{0.05cm}.$
- Wegen $S({\rm T3}) \le 1$ erfüllt der Ternärcode $\rm T3$ die Kraftsche Ungleichung und er ist zudem auch präfixfrei.
Betrachten wir nun die vorgeschlagenen neuen Codes.
- Code $\rm T4$ $(N_1 = 2, \ N_2 = 2, \ N_3 = 5)$:
- $$S({\rm T4})= S({\rm T3}) + 4 \cdot 3^{-3 } = {29}/{27}\hspace{0.1cm} > \hspace{0.1cm}1\hspace{0.3cm} \Rightarrow \hspace{0.3cm} {\rm T4 \hspace{0.15cm}ist\hspace{0.15cm} ungeeignet}\hspace{0.05cm},$$
- Code $\rm T5$ $(N_1 = 2, \ N_2 = 2, \ N_3 = 1, \ N_4 = 4)$:
- $$S({\rm T5})= S({\rm T3}) + 4 \cdot 3^{-4 } = {79}/{81}\hspace{0.1cm} < \hspace{0.1cm}1\hspace{0.3cm} \Rightarrow \hspace{0.3cm} {\rm T5 \hspace{0.15cm}ist\hspace{0.15cm} geeignet}\hspace{0.05cm},$$
- Code $\rm T6$ $(N_1 = 2, \ N_2 = 2, \ N_3 = 2, \ N_4 = 3)$:
- $$S({\rm T6})= S({\rm T3}) + 1 \cdot 3^{-3 } + 3 \cdot 3^{-4 } = \frac{75 + 3 + 3}{81}\hspace{0.1cm} = \hspace{0.1cm}1\hspace{0.3cm} \Rightarrow \hspace{0.3cm} {\rm T6 \hspace{0.15cm}ist\hspace{0.15cm} geeignet}\hspace{0.05cm}.$$
Richtig sind also die zwei letzten Lösungsvorschläge. Beispielsweise lauten die insgesamt $N = 9$ Codeworte des präfixfreien Codes $\rm T6$:
- $$\rm 0, \, 1, \, 20, \,21, \,220, \,221, \,2220, \, 2221 , \,2222.$$