Difference between revisions of "Aufgaben:Exercise 2.4: About the LZW Algorithm"

From LNTwww
Line 60: Line 60:
  
  
{Wie lautet der Ausgabe–String nach  $i = 16$  Decodierschritten?
+
{What is the output string after  $i = 16$  decoding steps?
 
|type="()"}
 
|type="()"}
 
- <b>AABCAABAABCAABCABBDBBDCDBA</b>,
 
- <b>AABCAABAABCAABCABBDBBDCDBA</b>,
Line 67: Line 67:
  
  
{Der nächste Index lautet: &nbsp; $I_{17} = 10010$ (binär) $= 18$ (dezimal).&nbsp; Welche der folgenden Aussagen treffen zu?
+
{The next index is: &nbsp; $I_{17} = 10010$ (binary) $= 18$ (decimal).&nbsp; Which of the following statements are true?
 
|type="[]"}
 
|type="[]"}
+ Die Decoderausgabe zum Schritt&nbsp; $i = 17$&nbsp; lautet&nbsp; <b>DB</b>.
+
+ The decoder output to step&nbsp; $i = 17$&nbsp; lautet&nbsp; <b>DB</b>.
- Die Decoderausgabe zum Schritt&nbsp; $i = 17$&nbsp; lautet&nbsp; <b>BAD</b>.
+
- The decoder output to step&nbsp; $i = 17$&nbsp; lautet&nbsp; <b>BAD</b>.
- Neuer Wörterbucheintrag&nbsp; <b>DB</b>&nbsp; in die Zeile&nbsp; $I = 19$.
+
- New dictionary entry&nbsp; <b>DB</b>&nbsp; in line&nbsp; $I = 19$.
+ Neuer Wörterbucheintrag&nbsp; <b>BAD</b>&nbsp; in die Zeile&nbsp; $I = 19$.
+
+ New dictionary entry&nbsp; <b>BAD</b>&nbsp; in line&nbsp; $I = 19$.
  
  
{Der Decoder&ndash;Eingabestring bestehe aus&nbsp; $N_{\rm Bit} = 300$&nbsp; Binärzeichen (Bit).&nbsp; Welche der folgenden Aussagen sind dann sicher zutreffend?
+
{Let the decoder input string consist of&nbsp; $N_{\rm Bit} = 300$&nbsp; binary characters (bits).&nbsp; Which of the following statements is then certainly true?
 
|type="[]"}
 
|type="[]"}
+ Übertragen wurden&nbsp; $58$&nbsp; LZW&ndash;Phrasen mit variabler Bitlänge.
+
+ &nbsp; $58$&nbsp; LZW phrases with variable bit length were transmitted.
- Mit einheitlicher Indexbitlänge wären weniger Bit erforderlich.
+
- With a uniform index bit length, fewer bits would be required.
- Übertragen wurde eine Datei mit&nbsp; $N =150$&nbsp; Quaternärzeichen.
+
- A file with&nbsp; $N =150$&nbsp; quaternary characters was transmitted.
  
  
Line 85: Line 85:
 
</quiz>
 
</quiz>
  
===Musterlösung===
+
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
'''(1)'''&nbsp; Bei der LZW&ndash;Codierung des ersten Zeichens&nbsp; $(i = 1)$&nbsp; sind nur die vier Indizes&nbsp; $I = 0$, ... , $I = 3$&nbsp; möglich, für die bereits zum Schritt&nbsp; $i = 0$&nbsp; (Vorbelegung) ein Wörterbucheintrag vorgenommen wurde. Deshalb genügt es hier, den Index mit zwei Bit zu übertragen.
+
'''(1)'''&nbsp; For the LZW coding of the first character&nbsp; $(i = 1)$&nbsp;, only the four indices&nbsp; $I = 0$, ... , $I = 3$&nbsp; are possible for which a dictionary entry has already been made at step&nbsp; $i = 0$&nbsp; (preassignment). Therefore, it is sufficient here to transmit the index with two bits.
  
*Bereits zum Schritt&nbsp; $i = 2$&nbsp; werden dagegen drei Bit benötigt. Hätte die Eingangsfolge mit&nbsp; <b>AAA</b>&nbsp; begonnen, so hätte die LZW&ndash;Codierung&nbsp; $I_2 = I(i = 2)$&nbsp; den Dezimalwert&nbsp; $4$&nbsp; ergeben. Dieser ist nicht mehr mit zwei Bit darstellbar und muss mit drei Bit codiert werden, ebenso wie&nbsp; $I_3$,&nbsp; $I_4$&nbsp; und&nbsp; $I_5$.
+
*In contrast, three bits are already required for step&nbsp; $i = 2$&nbsp;. If the input sequence had started with&nbsp; <b>AAA</b>&nbsp;, the LZW coding&nbsp; $I_2 = I(i = 2)$&nbsp; would have resulted in the decimal value&nbsp; $4$&nbsp;. This can no longer be represented with two bits and must be coded with three bits, just like&nbsp; $I_3$,&nbsp; $I_4$&nbsp; and&nbsp; $I_5$.
  
*Der vorgegebene Eingabestring
+
*The given input string:$$\boldsymbol{\rm 00 000 001 010 100 0001}\hspace{0.05cm}...$$
:$$\boldsymbol{\rm 00 000 001 010 100 0001}\hspace{0.05cm}...$$
+
:must therefore be divided by the decoder as follows:
:ist deshalb vom Decoder wie folgt aufzuteilen:
 
 
:$$\boldsymbol{\rm 00 | 000 |001 |010 |100 |0001|}\hspace{0.05cm}...$$
 
:$$\boldsymbol{\rm 00 | 000 |001 |010 |100 |0001|}\hspace{0.05cm}...$$
  
*Die Decoderergebnisse der ersten vier Schritte lauten somit:
+
*The decoder results of the first four steps are therefore:
:* $I_1 =$ <b>00</b> (binär) &nbsp;&nbsp; &nbsp;<u> = 0</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>A</b>,
+
:* $I_1 =$ <b>00</b> (binary) &nbsp;&nbsp; &nbsp;<u> = 0</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>A</b>,
:* $I_2 =$ <b>000</b> (binär) &nbsp;&nbsp;<u> = 0</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>A</b>,
+
:* $I_2 =$ <b>000</b> (binary) &nbsp;&nbsp;<u> = 0</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>A</b>,
:* $I_3 =$ <b>001</b> (binär) &nbsp;&nbsp;<u> = 1</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>,
+
:* $I_3 =$ <b>001</b> (binary) &nbsp;&nbsp;<u> = 1</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>,
:* $I_4 =$ <b>010</b> (binär) &nbsp;&nbsp;<u> = 2</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>C</b>.
+
:* $I_4 =$ <b>010</b> (binary) &nbsp;&nbsp;<u> = 2</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>C</b>.
  
  
'''(2)'''&nbsp; Berücksichtigt man, dass
+
'''(2)'''&nbsp; If one takes into account that
* für&nbsp; $i = 1$&nbsp; zwei Bit verwendet werden,
+
* for&nbsp; $i = 1$&nbsp; two bits are used,
* für&nbsp; $2 &#8804; i &#8804; 5$&nbsp; drei Bit,
+
* for&nbsp; $2 &#8804; i &#8804; 5$&nbsp; three bits,
* für&nbsp; $6 &#8804; i &#8804; 19$&nbsp; vier Bit,
+
* for&nbsp; $6 &#8804; i &#8804; 19$&nbsp; four bits,
* für&nbsp; $14 &#8804; i &#8804; 29$&nbsp; fünf Bit,
+
* for&nbsp; $14 &#8804; i &#8804; 29$&nbsp; five bits,
  
  
so kommt man vom "kontinuierlichen Decoder&ndash;Eingangsstring"
+
we get from the "continuous decoder input string"
 
:$$\boldsymbol{\rm 00 000 001 010 100 0001 1000 0111 0001 0001 0011 1011 1011 01101 00011 01001}$$
 
:$$\boldsymbol{\rm 00 000 001 010 100 0001 1000 0111 0001 0001 0011 1011 1011 01101 00011 01001}$$
  
zum "eingeteilten Decoder&ndash;Eingabestring" $($bezeichnet mit&nbsp;  $I_1$, ... ,&nbsp; $I_{16})$:
+
to the "divided decoder input string" $($designated&nbsp;  $I_1$, ... ,&nbsp; $I_{16})$:
 
:$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001 |1000 |0111 | 0001 |0001 |0011  |1011 |1011 |01101 |00011 |01001}
 
:$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001 |1000 |0111 | 0001 |0001 |0011  |1011 |1011 |01101 |00011 |01001}
 
\hspace{0.1cm}.$$
 
\hspace{0.1cm}.$$
  
 
Damit lauten die gesuchten Ergebnisse für die Decodierschritte&nbsp; $i = 5$, ... ,&nbsp; $i = 8$:
 
Damit lauten die gesuchten Ergebnisse für die Decodierschritte&nbsp; $i = 5$, ... ,&nbsp; $i = 8$:
* $I_5 =$ <b>100</b> (binär) &nbsp; &nbsp;&nbsp;<u> = 4</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>AA</b>,
+
* $I_5 =$ <b>100</b> (binary) &nbsp; &nbsp;&nbsp;<u> = 4</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>AA</b>,
* $I_6 =$ <b>0001</b> (binär) &nbsp;&nbsp;<u> = 1</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>,
+
* $I_6 =$ <b>0001</b> (binary) &nbsp;&nbsp;<u> = 1</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>,
* $I_7 =$ <b>1000</b> (binär) &nbsp;&nbsp;<u> = 8</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>AAB</b>,
+
* $I_7 =$ <b>1000</b> (binary) &nbsp;&nbsp;<u> = 8</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>AAB</b>,
* $I_8 =$ <b>0111</b> (binär) &nbsp;&nbsp;<u> = 7</u> (dezimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>CA</b>.
+
* $I_8 =$ <b>0111</b> (binary) &nbsp;&nbsp;<u> = 7</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>CA</b>.
  
  
'''(3)'''&nbsp; Richtig is tdie  <u>Aussage 2</u>.  
+
'''(3)'''&nbsp;<u>Statement 2</u> is correct.  
*Man erhält folgende Decodierergebnisse (in verkürzter Form):
+
*The following decoding results are obtained (in abbreviated form):
  
 
::&nbsp;&nbsp;$I_{9} = 1$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>, $\hspace{0.5cm}I_{10} = 1$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp;<b>B</b>,$\hspace{0.5cm}I_{11} = 3$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>D</b>,$\hspace{0.5cm}I_{12} = 11$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>CAB</b>,
 
::&nbsp;&nbsp;$I_{9} = 1$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>, $\hspace{0.5cm}I_{10} = 1$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp;<b>B</b>,$\hspace{0.5cm}I_{11} = 3$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>D</b>,$\hspace{0.5cm}I_{12} = 11$ &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>CAB</b>,
Line 133: Line 132:
  
  
'''(4)'''&nbsp; Richtig sind die <u>Aussagen 1 und 4</u>. Begründung:
+
'''(4)'''&nbsp; Correct are <u>statements 1 and 4</u>. Reason:
  
* Der neu ankommende Index ist&nbsp; $18$&nbsp; (dezimal) und im Wörterbuch wird unter dem Index&nbsp; $I = 18$&nbsp; der Eintrag&nbsp; <b>DB</b>&nbsp; gefunden.
+
* The new incoming index is&nbsp; $18$&nbsp; (decimal) and in the dictionary the entry&nbsp; <b>DB</b>&nbsp; is found under the index &nbsp; $I = 18$&nbsp;.
* Zum Decodierschritt&nbsp; $i = 17$&nbsp; wird in das Wörterbuch die Zeile&nbsp; $I = 19$&nbsp; eingetragen.  
+
* For the decoding step&nbsp; $i = 17$&nbsp;, the line&nbsp; $I = 19$&nbsp; is entered into the dictionary.
*Der Eintrag&nbsp; <b>BAD</b>&nbsp; setzt sich zusammen aus dem Decodierergebnis aus Schritt&nbsp; $i = 16$&nbsp; $($<b>BA</b>$)$&nbsp; und dem ersten Zeichen&nbsp; $($<b>D</b>$)$&nbsp; des neuen Ergebnisses&nbsp; <b>DB</b>.
+
* The entry&nbsp; <b>BAD</b>&nbsp; is composed of the decoding result from step&nbsp; $i = 16$&nbsp; $($<b>BA</b>$)$&nbsp; and the first character&nbsp; $($<b>D</b>$)$&nbsp; of the new result&nbsp; <b>DB</b>.
  
  
  
'''(5)'''&nbsp; Richtig ist hier nur die <u>Aussage 1</u>:
+
'''(5)'''&nbsp; Only <u>statement 1</u> is correct here:
* Für die erste Phrase genügen zwei Bit.
+
* Two bits are sufficient for the first phrase.
* Für die Phrasen&nbsp; $I_2$, ... , $I_5$&nbsp; benötigt man drei Bit.
+
* For the phrases&nbsp; $I_2$, ... , $I_5$&nbsp; three bits are needed.
* Für die Phrasen&nbsp; $I_6$, ... , $I_{13}$&nbsp; benötigt man vier Bit.
+
* For the phrasesn&nbsp; $I_6$, ... , $I_{13}$&nbsp; require four bits.
* Für die Phrasen&nbsp; $I_{14}$, ... , $I_{29}$&nbsp; benötigt man fünf Bit.
+
* For the phrases&nbsp; $I_{14}$, ... , $I_{29}$&nbsp; require five bits.
* Für die Phrasen&nbsp; $I_{30}$, ... , $I_{58}$&nbsp; benötigt man sechs Bit.
+
* For the phrases&nbsp; $I_{30}$, ... , $I_{58}$&nbsp; require six bits.
  
*Damit erhält man für die gesamte Bitanzahl:
+
*This gives the total number of bits:
 
:$$N_{\rm Bit} = 1 \cdot 2 + 4 \cdot 3 + 8 \cdot 4 + 16 \cdot 5 + 29 \cdot 6 = 300 \hspace{0.05cm}.$$
 
:$$N_{\rm Bit} = 1 \cdot 2 + 4 \cdot 3 + 8 \cdot 4 + 16 \cdot 5 + 29 \cdot 6 = 300 \hspace{0.05cm}.$$
  
*Mit einheitlicher Bitlänge (sechs Bit für jeden Index) wären&nbsp; $58 &middot; 6 = 348$&nbsp; Bit (also mehr) erforderlich &nbsp; &#8658; &nbsp; die Aussage 2 ist prinzipiell falsch.  
+
*With a uniform bit length (six bits for each index),&nbsp; $58 &middot; 6 = 348$&nbsp; bits (i.e. more) would be required &nbsp; &#8658; &nbsp; statement 2 is wrong in principle.
  
  
Nun zur dritten Aussage, die ebenfalls nicht zutrifft:
+
Now to the third statement, which is also not true:
* Bei der uncodierten Übertragung von&nbsp; $N = 150$&nbsp; Zeichen aus der Symbolmenge&nbsp; $\{$ <b>A</b>, \ <b>B</b>, \ <b>C</b>, \  <b>D</b> $\}$&nbsp; würde man genau&nbsp; $300$&nbsp; Bit benötigen. Mit LZW benötigt man sicher mehr Bit, wenn die Quelle redundanzfrei ist&nbsp; (Zeichen gleichwahrscheinlich und statistisch unabhängig).
+
* With the uncoded transmission of&nbsp; $N = 150$&nbsp; characters from the symbol set&nbsp; $\{$ <b>A</b>, \ <b>B</b>, \ <b>C</b>, \  <b>D</b> $\}$&nbsp; one would need exactly&nbsp; $300$&nbsp; bits. With LZW one certainly needs more bits if the source is redundancy-free&nbsp; characters equally likely and statistically independent).
* Bei redundanter Quelle mit (beispielsweise)&nbsp; $H = 1.6$ bit/Quellensymbol kann man die Bitanzahl auf&nbsp; $N_{\rm Bit} = N \cdot H$&nbsp;  begrenzen, vorausgesetzt, es handelt sich um eine sehr große Datei&nbsp; $(N \to \infty)$.
+
* With redundant source with (for example)&nbsp; $H = 1.6$ bit/source symbol, one can limit the number of bits to&nbsp; $N_{\rm Bit} = N \cdot H$&nbsp;  , provided it is a very large file&nbsp; $(N \to \infty)$.
* Bei einer eher kleinen Datei &ndash; wie hier lediglich mit&nbsp; $N = 150$&nbsp; Bit &ndash; ist keine Aussage möglich, ob die Bitanzahl&nbsp; $N_{\rm Bit}$&nbsp; kleiner, gleich oder größer als&nbsp; $150 &middot; 1.6 = 240$&nbsp; sein wird.  
+
* With a rather small file &ndash; as here only with&nbsp; $N = 150$&nbsp; bit &ndash; it is not possible to say whether the number of bits&nbsp; $N_{\rm Bit}$&nbsp; will be less than, equal to or greater than&nbsp; $150 &middot; 1.6 = 240$&nbsp;.  
*Auch&nbsp; $N_{\rm Bit} > 300$&nbsp; ist durchaus möglich.&nbsp; Dann sollte man auf die "Lempel&ndash;Ziv&ndash;Komprimierung" besser verzichten.
+
*&nbsp; $N_{\rm Bit} > 300$&nbsp; is also quite possible.&nbsp; In that case, it is better to do without the "Lempel&ndash;Ziv&ndash;compression".
 
{{ML-Fuß}}
 
{{ML-Fuß}}
  

Revision as of 19:52, 12 July 2021

LZW Dictionary for Encoding/Decoding

The compression algorithm "LZW " – named after the inventors  Abraham LempelJacob Ziv  and  Terry Welch  – works like "LZ78 " with a global dictionary.   In this case, all possible characters – in the example  ABCD  – are preassigned at the beginning and are successively expanded during encoding.

During decoding, exactly the same dictionary is created, only the same entry with index  $I$  is made one step later than during coding.  To designate the decoding steps, we use the run variable  $i$ here.

Here are some more notes on LZW encoding and decoding:

  • During encoding, at each time  $i$ , a search is made in the dictionary for the longest possible character string that matches the input string currently present.
  • The dictionary index  $I_i$  found is always transmitted in binary form.  At the same time,  $W(I_i) + Z$  is entered into the dictionary under the next free index.
  • Here,  $W(I_i)$  denotes a character or a character string, and  $Z$  is the first character of the upcoming input string (i.e. also a character), which is no longer considered in  $W(I_i)$ .
  • With  $M = 4$  possible characters, the first index  $I_1$  is transmitted with two bits, the indices  $I_2$, ... , $I_5$  with three bits, the next eight indices with four bits, then 16 indices with five bits, etc. The reason for this bit-saving measure can be found in the sample solution to this task.


After coding in the manner described here over  $16$  coding steps, the following binary sequence of length  $N_{\rm Bit} = 61$:

$$\boldsymbol{\rm 00 \ 000 \ 001 \ 010 \ 100 \ 0001 \ 1000 \ 0111 \ 0001 \ 0001 \ 0011 \ 1011 \ 1011 \ 01101 \ 00011 \ 01001}\hspace{0.05cm}.$$

The task of the decoder  (more precisely:  your task)  is now,

  • to reconstruct from this binary sequence the indices  $I_1$, ... , $I_{16}$  taking into account the different number of bits  (description of the 16 decoding results by decimal numbers),
  • then read out the corresponding characters or character sequences from the dictionary according to the indices and finally – with one step delay – generate the new dictionary entry.





Hints:

LZ77 - the basic form of Lempel-Ziv algorithms,
Lempel-Ziv coding with variable index bit length,
decoding of the LZW algorithm.


Questions

1

Which indices are available for the first four steps  $i=1$, ... , $i=4$  for decoding  Enter all  $I_i = I(i)$  as decimal numbers.

$I_1 \ = \ $

$I_2 \ = \ $

$I_3 \ = \ $

$I_4 \ = \ $

2

Continue the subdivision of the decoder input string to the end.  Which indices result from the decoding steps  $i=5$, ... , $i=8$?

$I_5 \ = \ $

$I_6 \ = \ $

$I_7 \ = \ $

$I_8 \ = \ $

3

What is the output string after  $i = 16$  decoding steps?

AABCAABAABCAABCABBDBBDCDBA,
AABCAABAABCABBDCABCABBDDBA,
AABCAABAABCABDBBABCCDBAABDCDBA.

4

The next index is:   $I_{17} = 10010$ (binary) $= 18$ (decimal).  Which of the following statements are true?

The decoder output to step  $i = 17$  lautet  DB.
The decoder output to step  $i = 17$  lautet  BAD.
New dictionary entry  DB  in line  $I = 19$.
New dictionary entry  BAD  in line  $I = 19$.

5

Let the decoder input string consist of  $N_{\rm Bit} = 300$  binary characters (bits).  Which of the following statements is then certainly true?

  $58$  LZW phrases with variable bit length were transmitted.
With a uniform index bit length, fewer bits would be required.
A file with  $N =150$  quaternary characters was transmitted.


Solution

(1)  For the LZW coding of the first character  $(i = 1)$ , only the four indices  $I = 0$, ... , $I = 3$  are possible for which a dictionary entry has already been made at step  $i = 0$  (preassignment). Therefore, it is sufficient here to transmit the index with two bits.

  • In contrast, three bits are already required for step  $i = 2$ . If the input sequence had started with  AAA , the LZW coding  $I_2 = I(i = 2)$  would have resulted in the decimal value  $4$ . This can no longer be represented with two bits and must be coded with three bits, just like  $I_3$,  $I_4$  and  $I_5$.
  • The given input string:$$\boldsymbol{\rm 00 000 001 010 100 0001}\hspace{0.05cm}...$$
must therefore be divided by the decoder as follows:
$$\boldsymbol{\rm 00 | 000 |001 |010 |100 |0001|}\hspace{0.05cm}...$$
  • The decoder results of the first four steps are therefore:
  • $I_1 =$ 00 (binary)      = 0 (decimal)   ⇒   A,
  • $I_2 =$ 000 (binary)    = 0 (decimal)   ⇒   A,
  • $I_3 =$ 001 (binary)    = 1 (decimal)   ⇒   B,
  • $I_4 =$ 010 (binary)    = 2 (decimal)   ⇒   C.


(2)  If one takes into account that

  • for  $i = 1$  two bits are used,
  • for  $2 ≤ i ≤ 5$  three bits,
  • for  $6 ≤ i ≤ 19$  four bits,
  • for  $14 ≤ i ≤ 29$  five bits,


we get from the "continuous decoder input string"

$$\boldsymbol{\rm 00 000 001 010 100 0001 1000 0111 0001 0001 0011 1011 1011 01101 00011 01001}$$

to the "divided decoder input string" $($designated  $I_1$, ... ,  $I_{16})$:

$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001 |1000 |0111 | 0001 |0001 |0011 |1011 |1011 |01101 |00011 |01001} \hspace{0.1cm}.$$

Damit lauten die gesuchten Ergebnisse für die Decodierschritte  $i = 5$, ... ,  $i = 8$:

  • $I_5 =$ 100 (binary)      = 4 (decimal)   ⇒   AA,
  • $I_6 =$ 0001 (binary)    = 1 (decimal)   ⇒   B,
  • $I_7 =$ 1000 (binary)    = 8 (decimal)   ⇒   AAB,
  • $I_8 =$ 0111 (binary)    = 7 (decimal)   ⇒   CA.


(3) Statement 2 is correct.

  • The following decoding results are obtained (in abbreviated form):
  $I_{9} = 1$   ⇒   B, $\hspace{0.5cm}I_{10} = 1$   ⇒  B,$\hspace{0.5cm}I_{11} = 3$   ⇒   D,$\hspace{0.5cm}I_{12} = 11$   ⇒   CAB,
  $I_{13} = 11$   ⇒   CAB,$\hspace{0.5cm}I_{14} = 13$   ⇒   BD,$\hspace{0.5cm}I_{15} = 3$   ⇒   D,$\hspace{0.5cm}I_{16} = 9$   ⇒   BA.


(4)  Correct are statements 1 and 4. Reason:

  • The new incoming index is  $18$  (decimal) and in the dictionary the entry  DB  is found under the index   $I = 18$ .
  • For the decoding step  $i = 17$ , the line  $I = 19$  is entered into the dictionary.
  • The entry  BAD  is composed of the decoding result from step  $i = 16$  $($BA$)$  and the first character  $($D$)$  of the new result  DB.


(5)  Only statement 1 is correct here:

  • Two bits are sufficient for the first phrase.
  • For the phrases  $I_2$, ... , $I_5$  three bits are needed.
  • For the phrasesn  $I_6$, ... , $I_{13}$  require four bits.
  • For the phrases  $I_{14}$, ... , $I_{29}$  require five bits.
  • For the phrases  $I_{30}$, ... , $I_{58}$  require six bits.
  • This gives the total number of bits:
$$N_{\rm Bit} = 1 \cdot 2 + 4 \cdot 3 + 8 \cdot 4 + 16 \cdot 5 + 29 \cdot 6 = 300 \hspace{0.05cm}.$$
  • With a uniform bit length (six bits for each index),  $58 · 6 = 348$  bits (i.e. more) would be required   ⇒   statement 2 is wrong in principle.


Now to the third statement, which is also not true:

  • With the uncoded transmission of  $N = 150$  characters from the symbol set  $\{$ A, \ B, \ C, \ D $\}$  one would need exactly  $300$  bits. With LZW one certainly needs more bits if the source is redundancy-free  characters equally likely and statistically independent).
  • With redundant source with (for example)  $H = 1.6$ bit/source symbol, one can limit the number of bits to  $N_{\rm Bit} = N \cdot H$  , provided it is a very large file  $(N \to \infty)$.
  • With a rather small file – as here only with  $N = 150$  bit – it is not possible to say whether the number of bits  $N_{\rm Bit}$  will be less than, equal to or greater than  $150 · 1.6 = 240$ .
  •   $N_{\rm Bit} > 300$  is also quite possible.  In that case, it is better to do without the "Lempel–Ziv–compression".