Difference between revisions of "Aufgaben:Exercise 2.4: About the LZW Algorithm"

From LNTwww
 
(32 intermediate revisions by 4 users not shown)
Line 1: Line 1:
  
{{quiz-Header|Buchseite=Informationstheorie/Komprimierung nach Lempel, Ziv und Welch
+
{{quiz-Header|Buchseite=Information_Theory/Compression_According_to_Lempel,_Ziv_and_Welch
 
}}
 
}}
  
[[File:P_ID2440__Inf_A_2_4.png|right|]]
+
[[File:EN_Inf_A_2_4.png|right|frame|LZW dictionary for Encoding/Decoding]]
:Der Komprimierungsalgorithmus LZW - benannt nach den Erfindern Abraham Lempel, Jacob Ziv und Terry Welch- arbeitet ebenso wie LZ78 mit einem globalen Wörterbuch. Dieses ist hier zu Beginn mit allen möglichen Zeichen &ndash; im Beispiel <b>A</b>, <b>B</b>, <b>C</b> und <b>D</b> &ndash; vorbelegt und wird während der Codierung sukzessive erweitert.
+
The compression algorithm&nbsp; "LZW"&nbsp; &ndash; named after the inventors&nbsp;  [https://en.wikipedia.org/wiki/Abraham_Lempel Abraham Lempel],&nbsp; [https://en.wikipedia.org/wiki/Jacob_Ziv Jacob Ziv]&nbsp;  and&nbsp;  [https://en.wikipedia.org/wiki/Terry_Welch Terry Welch]&nbsp; &ndash; works like&nbsp; "LZ78"&nbsp; with a global dictionary. &nbsp; In this case, all possible characters &ndash; in the example&nbsp; <b>A</b>,&nbsp; <b>B</b>,&nbsp; <b>C</b>,&nbsp; <b>D</b>&nbsp; &ndash; are preassigned at the beginning and are successively expanded during encoding.
  
:Bei der Decodierung entsteht genau das gleiche Wörterbuch, nur erfolgt der gleiche Eintrag mit dem Index <i>I</i> einen Schritt später als während der Codierung. Zur Bezeichnung der Decodierschritte verwenden wir hier die Laufvariable <i>i</i>.
+
During decoding, exactly the same dictionary is created, only the same entry with index&nbsp; $I$&nbsp; is made one step later than during encoding.&nbsp; To designate the decoding steps, we use the variable&nbsp; $i$.
  
:Hier noch einige Hinweise zur LZW&ndash;Codierung und &ndash;Decodierung:
+
Here are some more notes on LZW encoding and decoding:
 +
*During encoding, at each time&nbsp; $i$,&nbsp; a search is made in the dictionary for the longest possible character string that matches the input string currently present.
 +
*The dictionary index&nbsp; $I_i$&nbsp; found is always transmitted in binary form.&nbsp; At the same time,&nbsp; $W(I_i) + Z$&nbsp; is entered into the dictionary under the next free index.
 +
*Here,&nbsp; $W(I_i)$&nbsp; denotes a character or a character string, and&nbsp; $Z$&nbsp; is the first character of the upcoming input string (i.e. also a character), which is no longer considered in&nbsp; $W(I_i)$&nbsp;.
 +
*With&nbsp; $M = 4$&nbsp; possible characters, the first index&nbsp; $I_1$&nbsp; is transmitted with two bits, the indices&nbsp; $I_2$,&nbsp;... ,&nbsp;$I_5$&nbsp; with three bits, the next eight indices with four bits, then 16 indices with five bits, etc. The reason for this bit-saving measure can be found in the solution to this task.
  
:<ul class="liste_ohne"><li>Bei der Codierung wird zu jedem Zeitpunkt <i>i</i> im Wörterbuch nach einer möglichst langen Zeichenkette gesucht, die mit dem aktuell anliegenden Eingabe&ndash;String übereinstimmt.
 
  
:<ul class="liste_ohne"><li>Der gefundene Wörterbuchindex <i>I<sub>i</sub></i> wird stets in Binärform übertragen. Gleichzeitig wird ins Wörterbuch unter dem nächsten freien Index <i>W</i>(<i>I<sub>i</sub></i>) + <i>Z</i> eingetragen.
+
After coding in the manner described here over&nbsp; $16$&nbsp; coding steps, the following binary sequence of length&nbsp; $N_{\rm bit} = 61$:
 +
:$$\boldsymbol{\rm 00 \ 000 \ 001 \ 010 \ 100 \ 0001 \ 1000 \ 0111 \ 0001 \ 0001 \ 0011 \ 1011 \ 1011 \ 01101 \ 00011 \  01001}\hspace{0.05cm}.$$
  
:<ul class="liste_ohne"><li>Hierbei bezeichner <i>W</i>(<i>I<sub>i</sub></i>) ein Zeichen oder eine Zeichenfolge, und <i>Z</i> ist das erste Zeichen des anstehenden Eingabe&ndash;Strings (also ebenfalls ein Character), das in <i>W</i>(<i>I<sub>i</sub></i>) nicht mehr berücksichtigt ist.
+
The task of the decoder&nbsp; (more precisely:&nbsp; your task)&nbsp; is now,
 +
* to reconstruct from this binary sequence the indices&nbsp; $I_1$, ... , $I_{16}$&nbsp; taking into account the different number of bits&nbsp; (description of the 16 decoding results by decimal numbers),
 +
* then read out the corresponding characters or character sequences from the dictionary according to the indices and finally &ndash; with one step delay &ndash; generate the new dictionary entry.
  
:<ul class="liste_ohne"><li>Bei <i>M</i> = 4 möglichen Zeichen wird der erste Index <i>I</i><sub>1</sub> mit 2 Bit übertragen, die Indizes <i>I</i><sub>2</sub>,&nbsp;...,&nbsp;<i>I</i><sub>5</sub> mit 3 Bit, die nächsten 8 Indizes mit 4 Bit, danach 16 Indizes mit 5 Bit usw. Die Begründung für diese bitsparende Maßnahme finden Sie in der Musterlösung zur Aufgabe.
 
  
:Nach der Codierung in der hier beschriebenen Art und Weise über 16 Codierschritte ergibt sich die folgende Binärfolge der Länge <i>N</i><sub>Bit</sub> = 61:
 
:$$\boldsymbol{\rm 00 000 001 010 100 0001 1000 0111 0001 0001 0011 1011 1011 01101 00011 01001}\hspace{0.05cm}.$$
 
:Aufgabe des Decoders (genauer gesagt: Ihre Aufgabe) ist es nun,
 
  
:* aus dieser Binärsequenz die Indizes <i>I</i><sub>1</sub>, ... , <i>I</i><sub>16</sub> zu rekonstruieren, wobei die unterschiedliche Bitanzahl zu berücksichtigen ist (Beschreibung der 16 Decodierergebnisse durch Dezimalzahlen),
 
  
:* anschließend aus dem Wörterbuch entsprechend den Indizes die zugehörigen Zeichen bzw. Zeichenfolgen auszulesen und schließlich &ndash; mit einem Schritt Verzögerung &ndash; den neuen Wörterbucheintrag zu generieren.
 
  
:<b>Hinweis:</b> Die Aufgabe bezieht sich auf das Kapitel 2.2.
 
  
  
===Fragebogen===
+
 
 +
Hints:
 +
*The exercise belongs to the chapter&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch|Compression according to Lempel, Ziv and Welch]].
 +
*In particular, reference is made to the pages&nbsp;
 +
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#LZ77_-_the_basic_form_of_the_Lempel-Ziv_algorithms|LZ77 - the basic form of Lempel-Ziv algorithms]],
 +
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Lempel-Ziv_coding_with_variable_index_bit_length|Lempel-Ziv coding with variable index bit length]],
 +
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Decoding_of_the_LZW_algorithm|Decoding of the LZW algorithm]].
 +
 +
*&nbsp; [[Aufgaben:Exercise_2.3:_About_the_LZ78_Compression|Exercise 2.3]]&nbsp; and&nbsp; [[Aufgaben:Exercise_2.3Z:_About_the_LZ77_Coding|Exercise 2.3Z]]&nbsp; deal with other Lempel-Ziv methods in a similar way.
 +
 
 +
 
 +
 
 +
===Questions===
  
 
<quiz display=simple>
 
<quiz display=simple>
{Welche Indizes stehen zu den ersten vier Schritten zur Decodierung an? Geben Sie alle <i>I</i> als Dezimalzahlen ein.
+
{Which indices are available for the first four steps&nbsp; $(i=1$, ... , $i=4)$&nbsp; for decoding.&nbsp; Enter all&nbsp; $I_i = I(i)$&nbsp; as decimal numbers.
 
|type="{}"}
 
|type="{}"}
$i = 1:\ I_1$ = { 0 3% }
+
$I_1 \ = \ $ { 0. }
$i = 2:\ I_2$ = { 0 3% }
+
$I_2 \ = \ $ { 0. }
$i = 3:\ I_3$ = { 1 3% }
+
$I_3 \ = \ $ { 1 }
$i = 4:\ I_4$ = { 2 3% }
+
$I_4 \ = \ $ { 2 }
  
  
{Setzen Sie die Unterteilung des Decoder&ndash;Eingabestrings bis zum Ende fort. Welche Indizes ergeben sich bei den aufgeführten Decodierschritten?
+
{Continue the subdivision of the decoder input string to the end.&nbsp; Which indices result from the decoding steps&nbsp; $i=5$, ... , $i=8$?
 
|type="{}"}
 
|type="{}"}
$i = 5:\ I_5$ = { 4 3% }
+
$I_5 \ = \ $ { 4 }
$i = 6:\ I_6$ = { 1 3% }
+
$I_6 \ = \ $ { 1 }
$i = 7:\ I_7$ = { 8 3% }
+
$I_7 \ = \ $ { 8 }
$i = 8:\ I_8$ = { 7 3% }
+
$I_8 \ = \ $ { 7 }
  
  
{Wie lautet der Ausgabe&ndash;String nach <i>i</i> = 16 Decodierschritten?
+
 
|type="[]"}
+
{What is the output string after&nbsp; $i = 16$&nbsp; decoding steps?
 +
|type="()"}
 
- <b>AABCAABAABCAABCABBDBBDCDBA</b>,
 
- <b>AABCAABAABCAABCABBDBBDCDBA</b>,
 
+ <b>AABCAABAABCABBDCABCABBDDBA</b>,
 
+ <b>AABCAABAABCABBDCABCABBDDBA</b>,
Line 55: Line 67:
  
  
{Der nächste Index lautet: <i>I</i><sub>17</sub> = 10010 (binär) = 18 (dezimal). Welche der folgenden Aussagen treffen zu?
+
{The next index is: &nbsp; $I_{17} = 10010$ (binary) $= 18$ (decimal).&nbsp; Which of the following statements are true?
 
|type="[]"}
 
|type="[]"}
+ Die Decoderausgabe zum Schritt <i>i</i> = 17 lautet <b>DB</b>.
+
+ The decoder output to step&nbsp; $i = 17$&nbsp; is&nbsp; "<b>DB</b>".
- Die Decoderausgabe zum Schritt <i>i</i> = 17 lautet <b>BAD</b>.
+
- The decoder output to step&nbsp; $i = 17$&nbsp; is&nbsp; "<b>BAD</b>".
- Neuer Wörterbucheintrag <b>DB</b> in die Zeile <i>I</i> = 19.
+
- New dictionary entry&nbsp; "<b>DB</b>"&nbsp; in line&nbsp; $I = 19$.
+ Neuer Wörterbucheintrag <b>BAD</b> in die Zeile <i>I</i> = 19.
+
+ New dictionary entry&nbsp; "<b>BAD</b>"&nbsp; in line&nbsp; $I = 19$.
  
  
{Der Decoder&ndash;Eingabestring besteht aus <i>N</i><sub>Bit</sub> = 300 Binärzeichen (Bit). Welche der folgenden Aussagen sind dann sicher zutreffend?
+
{Let the decoder input string consist of&nbsp; $N_{\rm bit} = 300$&nbsp; binary characters (bits).&nbsp; Which of the following statements is then certainly true?
 
|type="[]"}
 
|type="[]"}
+ Übertragen wurden 58 LZW&ndash;Phrasen mit variabler Bitlänge.
+
+ $58$&nbsp; LZW phrases with variable bit length were transmitted.
- Mit einheitlicher Indexbitlänge wären weniger Bit erforderlich.
+
- With a uniform index bit length, fewer bits would be required.
- Übertragen wurde eine Datei mit <i>N</i>&nbsp;=&nbsp;150 Quaternärzeichen.
+
- A file with&nbsp; $N =150$&nbsp; quaternary characters was transmitted.
  
  
Line 73: Line 85:
 
</quiz>
 
</quiz>
  
===Musterlösung===
+
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
:<b>1.</b>&nbsp;&nbsp;Bei der LZW&ndash;Codierung des ersten Zeichens (<i>i</i> = 1) sind nur die vier Indizes <i>I</i> = 0, ... , <i>I</i> = 3 möglich, für die bereits zum Schritt <i>i</i> = 0 (Vorbelegung) ein Wörterbucheintrag vorgenommen wurde. Deshalb genügt es hier, den Index mit zwei Bit zu übertragen.
+
'''(1)'''&nbsp; For the LZW coding of the first character&nbsp; $(i = 1)$,&nbsp; only the four indices&nbsp; $I = 0$, ... , $I = 3$&nbsp; are possible for which a dictionary entry has already been made at step&nbsp; $i = 0$&nbsp; (preassignment).&nbsp; Therefore, it is sufficient here to transmit the index with two bits.
  
:Bereits zum Schritt <i>i</i> = 2 werden dagegen drei Bit benötigt. Hätte die Eingangsfolge mit <b>AAA</b> begonnen, so hätte die LZW&ndash;Codierung <i>I</i><sub>2</sub> = <i>I</i>(<i>i</i> = 2) den Dezimalwert 4 ergeben. Dieser ist nicht mehr mit zwei Bit darstellbar und muss mit drei Bit codiert werden, ebenso wie <i>I</i><sub>3</sub>, <i>I</i><sub>4</sub> und <i>I</i><sub>5</sub>.
+
*In contrast, three bits are already required for step&nbsp; $i = 2$.&nbsp; If the input sequence had started with&nbsp; <b>AAA</b>&nbsp;, the LZW coding&nbsp; $I_2 = I(i = 2)$&nbsp; would have resulted in the decimal value&nbsp; $4$.&nbsp; This can no longer be represented with two bits and must be coded with three bits, just like&nbsp; $I_3$,&nbsp; $I_4$&nbsp; and&nbsp; $I_5$.
  
:Der vorgegebene Eingabestring
+
*The given input string:$$\boldsymbol{\rm 00 000 001 010 100 0001}\hspace{0.05cm}...$$
:$$\boldsymbol{\rm 00 000 001 010 100 0001}\hspace{0.05cm}...$$
+
:must therefore be divided by the decoder as follows:
:ist deshalb vom Decoder wie folgt aufzuteilen:
+
:$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001|}\hspace{0.05cm}...$$
:$$\boldsymbol{\rm 00 | 000 |001 |010 |100 |0001|}\hspace{0.05cm}...$$
 
:Die Decoderergebnisse der ersten vier Schritte lauten somit:
 
  
:* <u><i>i</i> = 1:</u>&nbsp;&nbsp; <i>I</i> = <b>00</b> (binär) &nbsp;&nbsp; &nbsp;<u> = 0</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>A</b>,
+
*The decoder results of the first four steps are therefore:
 +
:* $I_1 =$ <b>00</b> (binary) &nbsp;&nbsp; &nbsp;<u> = 0</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>A</b>,
 +
:* $I_2 =$ <b>000</b> (binary) &nbsp;&nbsp;<u> = 0</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>A</b>,
 +
:* $I_3 =$ <b>001</b> (binary) &nbsp;&nbsp;<u> = 1</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>,
 +
:* $I_4 =$ <b>010</b> (binary) &nbsp;&nbsp;<u> = 2</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>C</b>.
  
:* <u><i>i</i> = 2:</u>&nbsp;&nbsp; <i>I</i> = <b>000</b> (binär) &nbsp;&nbsp;<u> = 0</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>A</b>,
 
  
:* <u><i>i</i> = 3:</u>&nbsp;&nbsp; <i>I</i> = <b>001</b> (binär) &nbsp;&nbsp;<u> = 1</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>,
+
'''(2)'''&nbsp; If one takes into account that
 +
* for&nbsp; $i = 1$&nbsp; two bits are used,
 +
* for&nbsp; $2 &#8804; i &#8804; 5$&nbsp; three bits are used,
 +
* for&nbsp; $6 &#8804; i &#8804; 19$&nbsp; four bits are used,
 +
* for&nbsp; $14 &#8804; i &#8804; 29$&nbsp; five bits are used,
  
:* <u><i>i</i> = 4:</u>&nbsp;&nbsp; <i>I</i> = <b>010</b> (binär) &nbsp;&nbsp;<u> = 2</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>C</b>.
 
  
:<b>2.</b>&nbsp;&nbsp;Berücksichtigt man, dass
+
we get from the&nbsp; "continuous decoder input string"
 
+
:$$\boldsymbol{\rm 00 000 001 010 100 0001 1000 0111 0001 0001 0011 1011 1011 01101 00011 01001}$$
:* für <i>i</i> = 1 zwei Bit verwendet werden,
 
  
:* für 2 &#8804; <i>i</i> &#8804; 5 drei Bit,
+
to the&nbsp; "divided decoder input string"&nbsp; $($designated&nbsp; $I_1$, ... ,&nbsp; $I_{16})$:
 
+
:$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001 |1000 |0111 | 0001 |0001 |0011  |1011 |1011 |01101 |00011 |01001}
:* für 6 &#8804; <i>i</i> &#8804; 19 vier Bit,
 
 
 
:* für 14 &#8804; <i>i</i> &#8804; 29 fünf Bit,
 
 
 
:so kommt man vom &bdquo;kontinuierlichen Decoder&ndash;Eingangsstring&rdquo;
 
:$$\boldsymbol{\rm 00 000 001 010 100 0001 1000 0111 0001 0001 0011 1011 1011 01101 00011 01001}$$
 
:zum &bdquo;eingeteilten Decoder&ndash;Eingabestring&rdquo; (erste Zeile <i>I</i><sub>1</sub>, ... , <i>I</i><sub>8</sub>, in der zweiten Zeile <i>I</i><sub>9</sub>, ... , <i>I</i><sub>16</sub>):
 
:$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001 |1000 |0111 | }$$
 
:$$\boldsymbol{\rm 0001 |0001 |0011  |1011 |1011 |01101 |00011 |01001}
 
 
\hspace{0.1cm}.$$
 
\hspace{0.1cm}.$$
:Damit lauten die gesuchten Ergebnisse für die Decodierschritte <i>i</i> = 5, ... , <i>i</i> = 8:
 
  
:* <u><i>i</i> = 5</u>:&nbsp;&nbsp; <i>I</i> = <b>100</b> (binär) &nbsp; &nbsp;&nbsp;<u> = 4</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>AA</b>,
+
Thus the results sought for the decoding steps&nbsp; $i = 5$, ... ,&nbsp; $i = 8$:
 +
* $I_5 =$ <b>100</b> (binary) &nbsp; &nbsp;&nbsp;<u> = 4</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>AA</b>,
 +
* $I_6 =$ <b>0001</b> (binary) &nbsp;&nbsp;<u> = 1</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>,
 +
* $I_7 =$ <b>1000</b> (binary) &nbsp;&nbsp;<u> = 8</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>AAB</b>,
 +
* $I_8 =$ <b>0111</b> (binary) &nbsp;&nbsp;<u> = 7</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>CA</b>.
  
:* <u><i>i</i> = 6</u>:&nbsp;&nbsp; <i>I</i> = <b>0001</b> (binär) &nbsp;&nbsp;<u> = 1</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>,
 
  
:* <u><i>i</i> = 7</u>:&nbsp;&nbsp; <i>I</i> = <b>1000</b> (binär) &nbsp;&nbsp;<u> = 8</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>AAB</b>,
+
'''(3)'''&nbsp;<u>Statement 2</u> is correct.
 +
*The following decoding results are obtained (in abbreviated form):
  
:* <u><i>i</i> = 8:</u>&nbsp;&nbsp; <i>I</i> = <b>0111</b> (binär) &nbsp;&nbsp;<u> = 7</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>CA</b>.
+
::&nbsp;&nbsp;$I_{9} = 1$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>, $\hspace{0.5cm}I_{10} = 1$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp;<b>B</b>,$\hspace{0.5cm}I_{11} = 3$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>D</b>,$\hspace{0.5cm}I_{12} = 11$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>CAB</b>,
  
:<b>3.</b>&nbsp;&nbsp;Richtig ist <u>Aussage 2</u>. Man erhält folgende Decodierergebnisse (in verkürzter Form):
+
::&nbsp;&nbsp;$I_{13} = 11$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>CAB</b>,$\hspace{0.5cm}I_{14} = 13$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>BD</b>,$\hspace{0.5cm}I_{15} = 3$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>D</b>,$\hspace{0.5cm}I_{16} = 9$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>BA</b>.
  
:<i>I</i><sub>9</sub> = 1 &nbsp;&nbsp;&#8658;&nbsp;&nbsp <b>B</b>,&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <i>I</i><sub>10</sub> = 1 &nbsp;&nbsp;&#8658;&nbsp;&nbsp;<b>B</b>,&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; <i>I</i><sub>11</sub> = 3 &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>D</b>,&nbsp;&nbsp;&nbsp;&nbsp; <i>I</i><sub>12</sub> = 11 &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>CAB</b>,
 
  
:<i>I</i><sub>13</sub> = 11 &nbsp;&nbsp;&#8658;&nbsp;&nbsp <b>CAB</b>,&nbsp;&nbsp;&nbsp;&nbsp; <i>I</i><sub>14</sub> = 13 &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>BD</b>,&nbsp;&nbsp;&nbsp;&nbsp; <i>I</i><sub>15</sub> = 3 &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>D</b>,&nbsp;&nbsp;&nbsp;&nbsp; <i>I</i><sub>16</sub> = 9 &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>BA</b>.
 
  
:<b>4.</b>&nbsp;&nbsp;Richtig sind die <u>Aussagen 1 und 4</u>, weil:
+
'''(4)'''&nbsp; Correct are the<u>statements 1 and 4</u>.&nbsp; Reason:
  
:* Der neu ankommende Index ist 18 (dezimal) und im Wörterbuch wird unter dem Index <i>I</i> = 18 der Eintrag <b>DB</b> gefunden.
+
* The new incoming index is&nbsp; $18$&nbsp; (decimal) and in the dictionary the entry&nbsp; "<b>DB</b>"&nbsp; is found under the index&nbsp; $I = 18$.
 +
* For the decoding step&nbsp; $i = 17$&nbsp;, the line&nbsp; $I = 19$&nbsp; is entered into the dictionary.
 +
* The entry&nbsp; "<b>BAD</b>"&nbsp; is composed of the decoding result from step&nbsp; $i = 16$&nbsp; $($<b>BA</b>$)$&nbsp; and the first character&nbsp; $($<b>D</b>$)$&nbsp; of the new result&nbsp; "<b>DB</b>".
  
:* Zum Decodierschritt <i>i</i> = 17 wird in das Wörterbuch die Zeile <i>I</i> = 19 eingetragen. Der Eintrag <b>BAD</b> setzt sich zusammen aus dem Decodierergebnis aus Schritt <i>i</i> = 16 (<b>BA</b>) und dem ersten Zeichen (<b>D</b>) des neuen Ergebnisses <b>DB</b>.
 
  
:<b>5.</b>&nbsp;&nbsp;Richtig ist hier nur die <u>Aussage 1</u>:
 
  
:* Für die erste Phrase genügen zwei Bit.
+
'''(5)'''&nbsp; Only <u>statement 1</u> is correct here:
 +
* Two bits are sufficient for the first phrase.
 +
* For the phrases&nbsp; $I_2$, ... , $I_5$&nbsp; three bits are needed.
 +
* The phrases&nbsp; $I_6$, ... , $I_{13}$&nbsp; require four bits.
 +
* The phrases&nbsp; $I_{14}$, ... , $I_{29}$&nbsp; require five bits.
 +
* The phrases&nbsp; $I_{30}$, ... , $I_{58}$&nbsp; require six bits.
  
:* Für die Phrasen <i>I</i><sub>2</sub>, ... , <i>I</i><sub>5</sub> benötigt man drei Bit.
+
*This gives the total number of bits:
 +
:$$N_{\rm bits} = 1 \cdot 2 + 4 \cdot 3 + 8 \cdot 4 + 16 \cdot 5 + 29 \cdot 6 = 300 \hspace{0.05cm}.$$
  
:* Für die Phrasen <i>I</i><sub>6</sub>, ... , <i>I</i><sub>13</sub> benötigt man vier Bit.
+
*With a uniform bit length (six bits for each index),&nbsp; $58 &middot; 6 = 348$&nbsp; bits (i.e. more) would be required &nbsp; &#8658; &nbsp; statement 2 is wrong in principle.
  
:* Für die Phrasen <i>I</i><sub>14</sub>, ... , <i>I</i><sub>29</sub> benötigt man fünf Bit.
 
  
:* Für die Phrasen <i>I</i><sub>30</sub>, ... , <i>I</i><sub>58</sub> benötigt man sechs Bit.
+
Now to the third statement, which is also not true:
 
+
* With the uncoded transmission of&nbsp; $N = 150$&nbsp; characters from the symbol set&nbsp; $\{$ <b>A</b>,&nbsp; <b>B</b>,&nbsp; <b>C</b>,&nbsp; <b>D</b> $\}$&nbsp; one would need exactly&nbsp; $300$&nbsp; bits.&nbsp; With LZW one certainly needs more bits if the source is redundancy-free &nbsp; &rArr; &nbsp; characters equally likely and statistically independent.
:Damit erhält man für die gesamte Bitanzahl:
+
* With a redundant source with (e.g.)&nbsp; $H = 1.6$&nbsp; bit/source symbol, one can limit the number of bits to&nbsp; $N_{\rm bits} = N \cdot H$,&nbsp;  provided it is a very large file&nbsp; $(N \to \infty)$.
:$$N_{\rm Bit} = 1 \cdot 2 + 4 \cdot 3 + 8 \cdot 4 + 16 \cdot 5 + 29 \cdot 6 = 300 \hspace{0.05cm}.$$
+
* With a rather small file &ndash; as here with only&nbsp; $N = 150$&nbsp; bits &ndash; it is not possible to say whether &nbsp; $N_{\rm bits}$&nbsp; will be less than, equal to or greater than&nbsp; $150 &middot; 1.6 = 240$.
:Mit einheitlicher Bitlänge (6 Bit für jeden Index) wären 58 &middot; 6 = 348 Bit (also mehr) erforderlich &#8658; Aussage 2 ist prinzipiell falsch. Nun zur Aussage 3:
+
*Also quite possible is&nbsp; $N_{\rm bits} > 300$.&nbsp; In that case, it is better to do without the&nbsp; "Lempel&ndash;Ziv compression".
 
+
{{ML-Fuß}}
:* Bei der uncodierten Übertragung von <i>N</i> = 150 Zeichen aus der Symbolmenge {<b>A</b>, <b>B</b>, <b>C</b>,  <b>D</b>} würde man genau 300 Bit benötigen. Mit LZW benötigt man sicher mehr Bit, wenn die Quelle redundanzfrei ist (Zeichen gleichwahrscheinlich und statistisch unabhängig).
 
 
 
:* Bei redundanter Quelle mit (beispielsweise) <i>H</i> = 1.6 Bit/Quellensymbol kann man die Bitanzahl auf <i>N</i><sub>Bit</sub>&nbsp;=&nbsp;<i>N</i>&nbsp;&middot;&nbsp;<i>H</i> begrenzen, vorausgesetzt, es handelt sich um eine sehr große Datei (<i>N</i> &#8594; &#8734;).
 
  
:* Bei einer eher kleinen Datei &ndash; wie hier lediglich mit <i>N</i> = 150 Bit &ndash; ist keine Aussage möglich, ob die Bitanzahl <i>N</i><sub>Bit</sub> kleiner, gleich oder größer als 150 &middot; 1.6 = 240 sein wird. Auch <i>N</i><sub>Bit</sub> > 300 ist durchaus möglich. Dann sollte man auf die &bdquo;LZ&ndash;Komprimierung&rdquo; besser verzichten.
 
{{ML-Fuß}}
 
  
  
  
[[Category:Aufgaben zu Informationstheorie|^2.2 Komprimierung nach Lempel, Ziv und Welch^]]
+
[[Category:Information Theory: Exercises|^2.2 Lempel-Ziv-Welch Compression^]]

Latest revision as of 13:11, 10 August 2021

LZW dictionary for Encoding/Decoding

The compression algorithm  "LZW"  – named after the inventors  Abraham LempelJacob Ziv  and  Terry Welch  – works like  "LZ78"  with a global dictionary.   In this case, all possible characters – in the example  ABCD  – are preassigned at the beginning and are successively expanded during encoding.

During decoding, exactly the same dictionary is created, only the same entry with index  $I$  is made one step later than during encoding.  To designate the decoding steps, we use the variable  $i$.

Here are some more notes on LZW encoding and decoding:

  • During encoding, at each time  $i$,  a search is made in the dictionary for the longest possible character string that matches the input string currently present.
  • The dictionary index  $I_i$  found is always transmitted in binary form.  At the same time,  $W(I_i) + Z$  is entered into the dictionary under the next free index.
  • Here,  $W(I_i)$  denotes a character or a character string, and  $Z$  is the first character of the upcoming input string (i.e. also a character), which is no longer considered in  $W(I_i)$ .
  • With  $M = 4$  possible characters, the first index  $I_1$  is transmitted with two bits, the indices  $I_2$, ... , $I_5$  with three bits, the next eight indices with four bits, then 16 indices with five bits, etc. The reason for this bit-saving measure can be found in the solution to this task.


After coding in the manner described here over  $16$  coding steps, the following binary sequence of length  $N_{\rm bit} = 61$:

$$\boldsymbol{\rm 00 \ 000 \ 001 \ 010 \ 100 \ 0001 \ 1000 \ 0111 \ 0001 \ 0001 \ 0011 \ 1011 \ 1011 \ 01101 \ 00011 \ 01001}\hspace{0.05cm}.$$

The task of the decoder  (more precisely:  your task)  is now,

  • to reconstruct from this binary sequence the indices  $I_1$, ... , $I_{16}$  taking into account the different number of bits  (description of the 16 decoding results by decimal numbers),
  • then read out the corresponding characters or character sequences from the dictionary according to the indices and finally – with one step delay – generate the new dictionary entry.





Hints:

LZ77 - the basic form of Lempel-Ziv algorithms,
Lempel-Ziv coding with variable index bit length,
Decoding of the LZW algorithm.


Questions

1

Which indices are available for the first four steps  $(i=1$, ... , $i=4)$  for decoding.  Enter all  $I_i = I(i)$  as decimal numbers.

$I_1 \ = \ $

$I_2 \ = \ $

$I_3 \ = \ $

$I_4 \ = \ $

2

Continue the subdivision of the decoder input string to the end.  Which indices result from the decoding steps  $i=5$, ... , $i=8$?

$I_5 \ = \ $

$I_6 \ = \ $

$I_7 \ = \ $

$I_8 \ = \ $

3

What is the output string after  $i = 16$  decoding steps?

AABCAABAABCAABCABBDBBDCDBA,
AABCAABAABCABBDCABCABBDDBA,
AABCAABAABCABDBBABCCDBAABDCDBA.

4

The next index is:   $I_{17} = 10010$ (binary) $= 18$ (decimal).  Which of the following statements are true?

The decoder output to step  $i = 17$  is  "DB".
The decoder output to step  $i = 17$  is  "BAD".
New dictionary entry  "DB"  in line  $I = 19$.
New dictionary entry  "BAD"  in line  $I = 19$.

5

Let the decoder input string consist of  $N_{\rm bit} = 300$  binary characters (bits).  Which of the following statements is then certainly true?

$58$  LZW phrases with variable bit length were transmitted.
With a uniform index bit length, fewer bits would be required.
A file with  $N =150$  quaternary characters was transmitted.


Solution

(1)  For the LZW coding of the first character  $(i = 1)$,  only the four indices  $I = 0$, ... , $I = 3$  are possible for which a dictionary entry has already been made at step  $i = 0$  (preassignment).  Therefore, it is sufficient here to transmit the index with two bits.

  • In contrast, three bits are already required for step  $i = 2$.  If the input sequence had started with  AAA , the LZW coding  $I_2 = I(i = 2)$  would have resulted in the decimal value  $4$.  This can no longer be represented with two bits and must be coded with three bits, just like  $I_3$,  $I_4$  and  $I_5$.
  • The given input string:$$\boldsymbol{\rm 00 000 001 010 100 0001}\hspace{0.05cm}...$$
must therefore be divided by the decoder as follows:
$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001|}\hspace{0.05cm}...$$
  • The decoder results of the first four steps are therefore:
  • $I_1 =$ 00 (binary)      = 0 (decimal)   ⇒   A,
  • $I_2 =$ 000 (binary)    = 0 (decimal)   ⇒   A,
  • $I_3 =$ 001 (binary)    = 1 (decimal)   ⇒   B,
  • $I_4 =$ 010 (binary)    = 2 (decimal)   ⇒   C.


(2)  If one takes into account that

  • for  $i = 1$  two bits are used,
  • for  $2 ≤ i ≤ 5$  three bits are used,
  • for  $6 ≤ i ≤ 19$  four bits are used,
  • for  $14 ≤ i ≤ 29$  five bits are used,


we get from the  "continuous decoder input string"

$$\boldsymbol{\rm 00 000 001 010 100 0001 1000 0111 0001 0001 0011 1011 1011 01101 00011 01001}$$

to the  "divided decoder input string"  $($designated  $I_1$, ... ,  $I_{16})$:

$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001 |1000 |0111 | 0001 |0001 |0011 |1011 |1011 |01101 |00011 |01001} \hspace{0.1cm}.$$

Thus the results sought for the decoding steps  $i = 5$, ... ,  $i = 8$:

  • $I_5 =$ 100 (binary)      = 4 (decimal)   ⇒   AA,
  • $I_6 =$ 0001 (binary)    = 1 (decimal)   ⇒   B,
  • $I_7 =$ 1000 (binary)    = 8 (decimal)   ⇒   AAB,
  • $I_8 =$ 0111 (binary)    = 7 (decimal)   ⇒   CA.


(3) Statement 2 is correct.

  • The following decoding results are obtained (in abbreviated form):
  $I_{9} = 1$   ⇒   B, $\hspace{0.5cm}I_{10} = 1$   ⇒  B,$\hspace{0.5cm}I_{11} = 3$   ⇒   D,$\hspace{0.5cm}I_{12} = 11$   ⇒   CAB,
  $I_{13} = 11$   ⇒   CAB,$\hspace{0.5cm}I_{14} = 13$   ⇒   BD,$\hspace{0.5cm}I_{15} = 3$   ⇒   D,$\hspace{0.5cm}I_{16} = 9$   ⇒   BA.


(4)  Correct are thestatements 1 and 4.  Reason:

  • The new incoming index is  $18$  (decimal) and in the dictionary the entry  "DB"  is found under the index  $I = 18$.
  • For the decoding step  $i = 17$ , the line  $I = 19$  is entered into the dictionary.
  • The entry  "BAD"  is composed of the decoding result from step  $i = 16$  $($BA$)$  and the first character  $($D$)$  of the new result  "DB".


(5)  Only statement 1 is correct here:

  • Two bits are sufficient for the first phrase.
  • For the phrases  $I_2$, ... , $I_5$  three bits are needed.
  • The phrases  $I_6$, ... , $I_{13}$  require four bits.
  • The phrases  $I_{14}$, ... , $I_{29}$  require five bits.
  • The phrases  $I_{30}$, ... , $I_{58}$  require six bits.
  • This gives the total number of bits:
$$N_{\rm bits} = 1 \cdot 2 + 4 \cdot 3 + 8 \cdot 4 + 16 \cdot 5 + 29 \cdot 6 = 300 \hspace{0.05cm}.$$
  • With a uniform bit length (six bits for each index),  $58 · 6 = 348$  bits (i.e. more) would be required   ⇒   statement 2 is wrong in principle.


Now to the third statement, which is also not true:

  • With the uncoded transmission of  $N = 150$  characters from the symbol set  $\{$ ABCD $\}$  one would need exactly  $300$  bits.  With LZW one certainly needs more bits if the source is redundancy-free   ⇒   characters equally likely and statistically independent.
  • With a redundant source with (e.g.)  $H = 1.6$  bit/source symbol, one can limit the number of bits to  $N_{\rm bits} = N \cdot H$,  provided it is a very large file  $(N \to \infty)$.
  • With a rather small file – as here with only  $N = 150$  bits – it is not possible to say whether   $N_{\rm bits}$  will be less than, equal to or greater than  $150 · 1.6 = 240$.
  • Also quite possible is  $N_{\rm bits} > 300$.  In that case, it is better to do without the  "Lempel–Ziv compression".