Difference between revisions of "Aufgaben:Exercise 2.4: About the LZW Algorithm"

From LNTwww
 
(6 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  
{{quiz-Header|Buchseite=Informationstheorie/Komprimierung nach Lempel, Ziv und Welch
+
{{quiz-Header|Buchseite=Information_Theory/Compression_According_to_Lempel,_Ziv_and_Welch
 
}}
 
}}
  
[[File:EN_Inf_A_2_4.png|right|frame|LZW Dictionary for Encoding/Decoding]]
+
[[File:EN_Inf_A_2_4.png|right|frame|LZW dictionary for Encoding/Decoding]]
The compression algorithm "LZW " &ndash; named after the inventors&nbsp;  [https://en.wikipedia.org/wiki/Abraham_Lempel Abraham Lempel],&nbsp; [https://en.wikipedia.org/wiki/Jacob_Ziv Jacob Ziv]&nbsp;  and&nbsp;  [https://en.wikipedia.org/wiki/Terry_Welch Terry Welch]&nbsp; &ndash; works like "LZ78 " with a global dictionary. &nbsp; In this case, all possible characters &ndash; in the example&nbsp; <b>A</b>,&nbsp; <b>B</b>,&nbsp; <b>C</b>,&nbsp; <b>D</b>&nbsp; &ndash; are preassigned at the beginning and are successively expanded during encoding.
+
The compression algorithm&nbsp; "LZW"&nbsp; &ndash; named after the inventors&nbsp;  [https://en.wikipedia.org/wiki/Abraham_Lempel Abraham Lempel],&nbsp; [https://en.wikipedia.org/wiki/Jacob_Ziv Jacob Ziv]&nbsp;  and&nbsp;  [https://en.wikipedia.org/wiki/Terry_Welch Terry Welch]&nbsp; &ndash; works like&nbsp; "LZ78"&nbsp; with a global dictionary. &nbsp; In this case, all possible characters &ndash; in the example&nbsp; <b>A</b>,&nbsp; <b>B</b>,&nbsp; <b>C</b>,&nbsp; <b>D</b>&nbsp; &ndash; are preassigned at the beginning and are successively expanded during encoding.
  
During decoding, exactly the same dictionary is created, only the same entry with index&nbsp; $I$&nbsp; is made one step later than during coding.&nbsp; To designate the decoding steps, we use the run variable&nbsp; $i$ here.
+
During decoding, exactly the same dictionary is created, only the same entry with index&nbsp; $I$&nbsp; is made one step later than during encoding.&nbsp; To designate the decoding steps, we use the variable&nbsp; $i$.
  
 
Here are some more notes on LZW encoding and decoding:
 
Here are some more notes on LZW encoding and decoding:
*During encoding, at each time&nbsp; $i$&nbsp;, a search is made in the dictionary for the longest possible character string that matches the input string currently present.
+
*During encoding, at each time&nbsp; $i$,&nbsp; a search is made in the dictionary for the longest possible character string that matches the input string currently present.
 
*The dictionary index&nbsp; $I_i$&nbsp; found is always transmitted in binary form.&nbsp; At the same time,&nbsp; $W(I_i) + Z$&nbsp; is entered into the dictionary under the next free index.
 
*The dictionary index&nbsp; $I_i$&nbsp; found is always transmitted in binary form.&nbsp; At the same time,&nbsp; $W(I_i) + Z$&nbsp; is entered into the dictionary under the next free index.
 
*Here,&nbsp; $W(I_i)$&nbsp; denotes a character or a character string, and&nbsp; $Z$&nbsp; is the first character of the upcoming input string (i.e. also a character), which is no longer considered in&nbsp; $W(I_i)$&nbsp;.
 
*Here,&nbsp; $W(I_i)$&nbsp; denotes a character or a character string, and&nbsp; $Z$&nbsp; is the first character of the upcoming input string (i.e. also a character), which is no longer considered in&nbsp; $W(I_i)$&nbsp;.
*With&nbsp; $M = 4$&nbsp; possible characters, the first index&nbsp; $I_1$&nbsp; is transmitted with two bits, the indices&nbsp; $I_2$,&nbsp;... ,&nbsp;$I_5$&nbsp; with three bits, the next eight indices with four bits, then 16 indices with five bits, etc. The reason for this bit-saving measure can be found in the sample solution to this task.
+
*With&nbsp; $M = 4$&nbsp; possible characters, the first index&nbsp; $I_1$&nbsp; is transmitted with two bits, the indices&nbsp; $I_2$,&nbsp;... ,&nbsp;$I_5$&nbsp; with three bits, the next eight indices with four bits, then 16 indices with five bits, etc. The reason for this bit-saving measure can be found in the solution to this task.
  
  
After coding in the manner described here over&nbsp; $16$&nbsp; coding steps, the following binary sequence of length&nbsp; $N_{\rm Bit} = 61$:
+
After coding in the manner described here over&nbsp; $16$&nbsp; coding steps, the following binary sequence of length&nbsp; $N_{\rm bit} = 61$:
 
:$$\boldsymbol{\rm 00 \ 000 \ 001 \ 010 \ 100 \ 0001 \ 1000 \ 0111 \ 0001 \ 0001 \ 0011 \ 1011 \ 1011 \ 01101 \ 00011 \  01001}\hspace{0.05cm}.$$
 
:$$\boldsymbol{\rm 00 \ 000 \ 001 \ 010 \ 100 \ 0001 \ 1000 \ 0111 \ 0001 \ 0001 \ 0011 \ 1011 \ 1011 \ 01101 \ 00011 \  01001}\hspace{0.05cm}.$$
  
Line 29: Line 29:
  
  
''Hints:''
+
Hints:
*The task belongs to the chapter&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch|Compression according to Lempel, Ziv and Welch]].
+
*The exercise belongs to the chapter&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch|Compression according to Lempel, Ziv and Welch]].
*Insbesondere wird  Bezug genommen auf die Seiten&nbsp;
+
*In particular, reference is made to the pages&nbsp;
 
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#LZ77_-_the_basic_form_of_the_Lempel-Ziv_algorithms|LZ77 - the basic form of Lempel-Ziv algorithms]],
 
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#LZ77_-_the_basic_form_of_the_Lempel-Ziv_algorithms|LZ77 - the basic form of Lempel-Ziv algorithms]],
 
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Lempel-Ziv_coding_with_variable_index_bit_length|Lempel-Ziv coding with variable index bit length]],
 
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Lempel-Ziv_coding_with_variable_index_bit_length|Lempel-Ziv coding with variable index bit length]],
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Decoding_of_the_LZW_algorithm|decoding of the LZW algorithm]].
+
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Decoding_of_the_LZW_algorithm|Decoding of the LZW algorithm]].
 
   
 
   
*&nbsp; [[Aufgaben:Aufgabe_2.3:_Zur_LZ78-Komprimierung|Task 2.3]]&nbsp; and&nbsp; [[Aufgaben:Aufgabe_2.3Z:_Zur_LZ77-Codierung|Task 2.3Z]]&nbsp; deal with other LZ methods in a similar way.
+
*&nbsp; [[Aufgaben:Exercise_2.3:_About_the_LZ78_Compression|Exercise 2.3]]&nbsp; and&nbsp; [[Aufgaben:Exercise_2.3Z:_About_the_LZ77_Coding|Exercise 2.3Z]]&nbsp; deal with other Lempel-Ziv methods in a similar way.
  
  
Line 43: Line 43:
  
 
<quiz display=simple>
 
<quiz display=simple>
{Which indices are available for the first four steps&nbsp; $i=1$, ... , $i=4$&nbsp; for decoding&nbsp; Enter all&nbsp; $I_i = I(i)$&nbsp; as decimal numbers.
+
{Which indices are available for the first four steps&nbsp; $(i=1$, ... , $i=4)$&nbsp; for decoding.&nbsp; Enter all&nbsp; $I_i = I(i)$&nbsp; as decimal numbers.
 
|type="{}"}
 
|type="{}"}
 
$I_1 \ = \ $ { 0. }
 
$I_1 \ = \ $ { 0. }
Line 69: Line 69:
 
{The next index is: &nbsp; $I_{17} = 10010$ (binary) $= 18$ (decimal).&nbsp; Which of the following statements are true?
 
{The next index is: &nbsp; $I_{17} = 10010$ (binary) $= 18$ (decimal).&nbsp; Which of the following statements are true?
 
|type="[]"}
 
|type="[]"}
+ The decoder output to step&nbsp; $i = 17$&nbsp; lautet&nbsp; <b>DB</b>.
+
+ The decoder output to step&nbsp; $i = 17$&nbsp; is&nbsp; "<b>DB</b>".
- The decoder output to step&nbsp; $i = 17$&nbsp; lautet&nbsp; <b>BAD</b>.
+
- The decoder output to step&nbsp; $i = 17$&nbsp; is&nbsp; "<b>BAD</b>".
- New dictionary entry&nbsp; <b>DB</b>&nbsp; in line&nbsp; $I = 19$.
+
- New dictionary entry&nbsp; "<b>DB</b>"&nbsp; in line&nbsp; $I = 19$.
+ New dictionary entry&nbsp; <b>BAD</b>&nbsp; in line&nbsp; $I = 19$.
+
+ New dictionary entry&nbsp; "<b>BAD</b>"&nbsp; in line&nbsp; $I = 19$.
  
  
{Let the decoder input string consist of&nbsp; $N_{\rm Bit} = 300$&nbsp; binary characters (bits).&nbsp; Which of the following statements is then certainly true?
+
{Let the decoder input string consist of&nbsp; $N_{\rm bit} = 300$&nbsp; binary characters (bits).&nbsp; Which of the following statements is then certainly true?
 
|type="[]"}
 
|type="[]"}
+ &nbsp; $58$&nbsp; LZW phrases with variable bit length were transmitted.
+
+ $58$&nbsp; LZW phrases with variable bit length were transmitted.
 
- With a uniform index bit length, fewer bits would be required.
 
- With a uniform index bit length, fewer bits would be required.
 
- A file with&nbsp; $N =150$&nbsp; quaternary characters was transmitted.
 
- A file with&nbsp; $N =150$&nbsp; quaternary characters was transmitted.
Line 87: Line 87:
 
===Solution===
 
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
'''(1)'''&nbsp;  For the LZW coding of the first character&nbsp; $(i = 1)$&nbsp;, only the four indices&nbsp; $I = 0$, ... , $I = 3$&nbsp; are possible for which a dictionary entry has already been made at step&nbsp; $i = 0$&nbsp; (preassignment). Therefore, it is sufficient here to transmit the index with two bits.
+
'''(1)'''&nbsp;  For the LZW coding of the first character&nbsp; $(i = 1)$,&nbsp; only the four indices&nbsp; $I = 0$, ... , $I = 3$&nbsp; are possible for which a dictionary entry has already been made at step&nbsp; $i = 0$&nbsp; (preassignment).&nbsp; Therefore, it is sufficient here to transmit the index with two bits.
  
*In contrast, three bits are already required for step&nbsp; $i = 2$&nbsp;. If the input sequence had started with&nbsp; <b>AAA</b>&nbsp;, the LZW coding&nbsp; $I_2 = I(i = 2)$&nbsp; would have resulted in the decimal value&nbsp; $4$&nbsp;. This can no longer be represented with two bits and must be coded with three bits, just like&nbsp; $I_3$,&nbsp; $I_4$&nbsp; and&nbsp; $I_5$.
+
*In contrast, three bits are already required for step&nbsp; $i = 2$.&nbsp; If the input sequence had started with&nbsp; <b>AAA</b>&nbsp;, the LZW coding&nbsp; $I_2 = I(i = 2)$&nbsp; would have resulted in the decimal value&nbsp; $4$.&nbsp; This can no longer be represented with two bits and must be coded with three bits, just like&nbsp; $I_3$,&nbsp; $I_4$&nbsp; and&nbsp; $I_5$.
  
 
*The given input string:$$\boldsymbol{\rm 00 000 001 010 100 0001}\hspace{0.05cm}...$$
 
*The given input string:$$\boldsymbol{\rm 00 000 001 010 100 0001}\hspace{0.05cm}...$$
 
:must therefore be divided by the decoder as follows:
 
:must therefore be divided by the decoder as follows:
:$$\boldsymbol{\rm 00 | 000 |001 |010 |100 |0001|}\hspace{0.05cm}...$$
+
:$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001|}\hspace{0.05cm}...$$
  
 
*The decoder results of the first four steps are therefore:
 
*The decoder results of the first four steps are therefore:
Line 104: Line 104:
 
'''(2)'''&nbsp; If one takes into account that
 
'''(2)'''&nbsp; If one takes into account that
 
* for&nbsp; $i = 1$&nbsp; two bits are used,
 
* for&nbsp; $i = 1$&nbsp; two bits are used,
* for&nbsp; $2 &#8804; i &#8804; 5$&nbsp; three bits,
+
* for&nbsp; $2 &#8804; i &#8804; 5$&nbsp; three bits are used,
* for&nbsp; $6 &#8804; i &#8804; 19$&nbsp; four bits,
+
* for&nbsp; $6 &#8804; i &#8804; 19$&nbsp; four bits are used,
* for&nbsp; $14 &#8804; i &#8804; 29$&nbsp; five bits,
+
* for&nbsp; $14 &#8804; i &#8804; 29$&nbsp; five bits are used,
  
  
we get from the "continuous decoder input string"
+
we get from the&nbsp; "continuous decoder input string"
 
:$$\boldsymbol{\rm 00 000 001 010 100 0001 1000 0111 0001 0001 0011 1011 1011 01101 00011 01001}$$
 
:$$\boldsymbol{\rm 00 000 001 010 100 0001 1000 0111 0001 0001 0011 1011 1011 01101 00011 01001}$$
  
to the "divided decoder input string" $($designated&nbsp;  $I_1$, ... ,&nbsp; $I_{16})$:
+
to the&nbsp; "divided decoder input string"&nbsp; $($designated&nbsp;  $I_1$, ... ,&nbsp; $I_{16})$:
 
:$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001 |1000 |0111 | 0001 |0001 |0011  |1011 |1011 |01101 |00011 |01001}
 
:$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001 |1000 |0111 | 0001 |0001 |0011  |1011 |1011 |01101 |00011 |01001}
 
\hspace{0.1cm}.$$
 
\hspace{0.1cm}.$$
  
Damit lauten die gesuchten Ergebnisse für die Decodierschritte&nbsp; $i = 5$, ... ,&nbsp; $i = 8$:
+
Thus the results sought for the decoding steps&nbsp; $i = 5$, ... ,&nbsp; $i = 8$:
 
* $I_5 =$ <b>100</b> (binary) &nbsp; &nbsp;&nbsp;<u> = 4</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>AA</b>,
 
* $I_5 =$ <b>100</b> (binary) &nbsp; &nbsp;&nbsp;<u> = 4</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>AA</b>,
 
* $I_6 =$ <b>0001</b> (binary) &nbsp;&nbsp;<u> = 1</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>,
 
* $I_6 =$ <b>0001</b> (binary) &nbsp;&nbsp;<u> = 1</u> (decimal) &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <b>B</b>,
Line 132: Line 132:
  
  
'''(4)'''&nbsp; Correct are <u>statements 1 and 4</u>. Reason:
+
'''(4)'''&nbsp; Correct are the<u>statements 1 and 4</u>.&nbsp; Reason:
  
* The new incoming index is&nbsp; $18$&nbsp; (decimal) and in the dictionary the entry&nbsp; <b>DB</b>&nbsp; is found under the index &nbsp; $I = 18$&nbsp;.
+
* The new incoming index is&nbsp; $18$&nbsp; (decimal) and in the dictionary the entry&nbsp; "<b>DB</b>"&nbsp; is found under the index&nbsp; $I = 18$.
 
* For the decoding step&nbsp; $i = 17$&nbsp;, the line&nbsp; $I = 19$&nbsp; is entered into the dictionary.
 
* For the decoding step&nbsp; $i = 17$&nbsp;, the line&nbsp; $I = 19$&nbsp; is entered into the dictionary.
* The entry&nbsp; <b>BAD</b>&nbsp; is composed of the decoding result from step&nbsp; $i = 16$&nbsp; $($<b>BA</b>$)$&nbsp; and the first character&nbsp; $($<b>D</b>$)$&nbsp; of the new result&nbsp; <b>DB</b>.
+
* The entry&nbsp; "<b>BAD</b>"&nbsp; is composed of the decoding result from step&nbsp; $i = 16$&nbsp; $($<b>BA</b>$)$&nbsp; and the first character&nbsp; $($<b>D</b>$)$&nbsp; of the new result&nbsp; "<b>DB</b>".
  
  
Line 143: Line 143:
 
* Two bits are sufficient for the first phrase.
 
* Two bits are sufficient for the first phrase.
 
* For the phrases&nbsp; $I_2$, ... , $I_5$&nbsp; three bits are needed.
 
* For the phrases&nbsp; $I_2$, ... , $I_5$&nbsp; three bits are needed.
* For the phrasesn&nbsp; $I_6$, ... , $I_{13}$&nbsp; require four bits.
+
* The phrases&nbsp; $I_6$, ... , $I_{13}$&nbsp; require four bits.
* For the phrases&nbsp; $I_{14}$, ... , $I_{29}$&nbsp; require five bits.
+
* The phrases&nbsp; $I_{14}$, ... , $I_{29}$&nbsp; require five bits.
* For the phrases&nbsp; $I_{30}$, ... , $I_{58}$&nbsp; require six bits.
+
* The phrases&nbsp; $I_{30}$, ... , $I_{58}$&nbsp; require six bits.
  
 
*This gives the total number of bits:
 
*This gives the total number of bits:
:$$N_{\rm Bit} = 1 \cdot 2 + 4 \cdot 3 + 8 \cdot 4 + 16 \cdot 5 + 29 \cdot 6 = 300 \hspace{0.05cm}.$$
+
:$$N_{\rm bits} = 1 \cdot 2 + 4 \cdot 3 + 8 \cdot 4 + 16 \cdot 5 + 29 \cdot 6 = 300 \hspace{0.05cm}.$$
  
 
*With a uniform bit length (six bits for each index),&nbsp; $58 &middot; 6 = 348$&nbsp; bits (i.e. more) would be required &nbsp; &#8658; &nbsp; statement 2 is wrong in principle.
 
*With a uniform bit length (six bits for each index),&nbsp; $58 &middot; 6 = 348$&nbsp; bits (i.e. more) would be required &nbsp; &#8658; &nbsp; statement 2 is wrong in principle.
Line 154: Line 154:
  
 
Now to the third statement, which is also not true:
 
Now to the third statement, which is also not true:
* With the uncoded transmission of&nbsp; $N = 150$&nbsp; characters from the symbol set&nbsp; $\{$ <b>A</b>, \ <b>B</b>, \ <b>C</b>, \ <b>D</b> $\}$&nbsp; one would need exactly&nbsp; $300$&nbsp; bits. With LZW one certainly needs more bits if the source is redundancy-free&nbsp; characters equally likely and statistically independent).
+
* With the uncoded transmission of&nbsp; $N = 150$&nbsp; characters from the symbol set&nbsp; $\{$ <b>A</b>,&nbsp; <b>B</b>,&nbsp; <b>C</b>,&nbsp; <b>D</b> $\}$&nbsp; one would need exactly&nbsp; $300$&nbsp; bits.&nbsp; With LZW one certainly needs more bits if the source is redundancy-free &nbsp; &rArr; &nbsp; characters equally likely and statistically independent.
* With redundant source with (for example)&nbsp; $H = 1.6$ bit/source symbol, one can limit the number of bits to&nbsp; $N_{\rm Bit} = N \cdot H$&nbsp;  , provided it is a very large file&nbsp; $(N \to \infty)$.
+
* With a redundant source with (e.g.)&nbsp; $H = 1.6$&nbsp; bit/source symbol, one can limit the number of bits to&nbsp; $N_{\rm bits} = N \cdot H$,&nbsp;  provided it is a very large file&nbsp; $(N \to \infty)$.
* With a rather small file &ndash; as here only with&nbsp; $N = 150$&nbsp; bit &ndash; it is not possible to say whether the number of bits&nbsp; $N_{\rm Bit}$&nbsp; will be less than, equal to or greater than&nbsp; $150 &middot; 1.6 = 240$&nbsp;.  
+
* With a rather small file &ndash; as here with only&nbsp; $N = 150$&nbsp; bits &ndash; it is not possible to say whether &nbsp; $N_{\rm bits}$&nbsp; will be less than, equal to or greater than&nbsp; $150 &middot; 1.6 = 240$.  
*&nbsp; $N_{\rm Bit} > 300$&nbsp; is also quite possible.&nbsp; In that case, it is better to do without the "Lempel&ndash;Ziv&ndash;compression".
+
*Also quite possible is&nbsp; $N_{\rm bits} > 300$.&nbsp; In that case, it is better to do without the&nbsp; "Lempel&ndash;Ziv compression".
 
{{ML-Fuß}}
 
{{ML-Fuß}}
  

Latest revision as of 14:11, 10 August 2021

LZW dictionary for Encoding/Decoding

The compression algorithm  "LZW"  – named after the inventors  Abraham LempelJacob Ziv  and  Terry Welch  – works like  "LZ78"  with a global dictionary.   In this case, all possible characters – in the example  ABCD  – are preassigned at the beginning and are successively expanded during encoding.

During decoding, exactly the same dictionary is created, only the same entry with index  $I$  is made one step later than during encoding.  To designate the decoding steps, we use the variable  $i$.

Here are some more notes on LZW encoding and decoding:

  • During encoding, at each time  $i$,  a search is made in the dictionary for the longest possible character string that matches the input string currently present.
  • The dictionary index  $I_i$  found is always transmitted in binary form.  At the same time,  $W(I_i) + Z$  is entered into the dictionary under the next free index.
  • Here,  $W(I_i)$  denotes a character or a character string, and  $Z$  is the first character of the upcoming input string (i.e. also a character), which is no longer considered in  $W(I_i)$ .
  • With  $M = 4$  possible characters, the first index  $I_1$  is transmitted with two bits, the indices  $I_2$, ... , $I_5$  with three bits, the next eight indices with four bits, then 16 indices with five bits, etc. The reason for this bit-saving measure can be found in the solution to this task.


After coding in the manner described here over  $16$  coding steps, the following binary sequence of length  $N_{\rm bit} = 61$:

$$\boldsymbol{\rm 00 \ 000 \ 001 \ 010 \ 100 \ 0001 \ 1000 \ 0111 \ 0001 \ 0001 \ 0011 \ 1011 \ 1011 \ 01101 \ 00011 \ 01001}\hspace{0.05cm}.$$

The task of the decoder  (more precisely:  your task)  is now,

  • to reconstruct from this binary sequence the indices  $I_1$, ... , $I_{16}$  taking into account the different number of bits  (description of the 16 decoding results by decimal numbers),
  • then read out the corresponding characters or character sequences from the dictionary according to the indices and finally – with one step delay – generate the new dictionary entry.





Hints:

LZ77 - the basic form of Lempel-Ziv algorithms,
Lempel-Ziv coding with variable index bit length,
Decoding of the LZW algorithm.


Questions

1

Which indices are available for the first four steps  $(i=1$, ... , $i=4)$  for decoding.  Enter all  $I_i = I(i)$  as decimal numbers.

$I_1 \ = \ $

$I_2 \ = \ $

$I_3 \ = \ $

$I_4 \ = \ $

2

Continue the subdivision of the decoder input string to the end.  Which indices result from the decoding steps  $i=5$, ... , $i=8$?

$I_5 \ = \ $

$I_6 \ = \ $

$I_7 \ = \ $

$I_8 \ = \ $

3

What is the output string after  $i = 16$  decoding steps?

AABCAABAABCAABCABBDBBDCDBA,
AABCAABAABCABBDCABCABBDDBA,
AABCAABAABCABDBBABCCDBAABDCDBA.

4

The next index is:   $I_{17} = 10010$ (binary) $= 18$ (decimal).  Which of the following statements are true?

The decoder output to step  $i = 17$  is  "DB".
The decoder output to step  $i = 17$  is  "BAD".
New dictionary entry  "DB"  in line  $I = 19$.
New dictionary entry  "BAD"  in line  $I = 19$.

5

Let the decoder input string consist of  $N_{\rm bit} = 300$  binary characters (bits).  Which of the following statements is then certainly true?

$58$  LZW phrases with variable bit length were transmitted.
With a uniform index bit length, fewer bits would be required.
A file with  $N =150$  quaternary characters was transmitted.


Solution

(1)  For the LZW coding of the first character  $(i = 1)$,  only the four indices  $I = 0$, ... , $I = 3$  are possible for which a dictionary entry has already been made at step  $i = 0$  (preassignment).  Therefore, it is sufficient here to transmit the index with two bits.

  • In contrast, three bits are already required for step  $i = 2$.  If the input sequence had started with  AAA , the LZW coding  $I_2 = I(i = 2)$  would have resulted in the decimal value  $4$.  This can no longer be represented with two bits and must be coded with three bits, just like  $I_3$,  $I_4$  and  $I_5$.
  • The given input string:$$\boldsymbol{\rm 00 000 001 010 100 0001}\hspace{0.05cm}...$$
must therefore be divided by the decoder as follows:
$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001|}\hspace{0.05cm}...$$
  • The decoder results of the first four steps are therefore:
  • $I_1 =$ 00 (binary)      = 0 (decimal)   ⇒   A,
  • $I_2 =$ 000 (binary)    = 0 (decimal)   ⇒   A,
  • $I_3 =$ 001 (binary)    = 1 (decimal)   ⇒   B,
  • $I_4 =$ 010 (binary)    = 2 (decimal)   ⇒   C.


(2)  If one takes into account that

  • for  $i = 1$  two bits are used,
  • for  $2 ≤ i ≤ 5$  three bits are used,
  • for  $6 ≤ i ≤ 19$  four bits are used,
  • for  $14 ≤ i ≤ 29$  five bits are used,


we get from the  "continuous decoder input string"

$$\boldsymbol{\rm 00 000 001 010 100 0001 1000 0111 0001 0001 0011 1011 1011 01101 00011 01001}$$

to the  "divided decoder input string"  $($designated  $I_1$, ... ,  $I_{16})$:

$$\boldsymbol{\rm 00 |000 |001 |010 |100 |0001 |1000 |0111 | 0001 |0001 |0011 |1011 |1011 |01101 |00011 |01001} \hspace{0.1cm}.$$

Thus the results sought for the decoding steps  $i = 5$, ... ,  $i = 8$:

  • $I_5 =$ 100 (binary)      = 4 (decimal)   ⇒   AA,
  • $I_6 =$ 0001 (binary)    = 1 (decimal)   ⇒   B,
  • $I_7 =$ 1000 (binary)    = 8 (decimal)   ⇒   AAB,
  • $I_8 =$ 0111 (binary)    = 7 (decimal)   ⇒   CA.


(3) Statement 2 is correct.

  • The following decoding results are obtained (in abbreviated form):
  $I_{9} = 1$   ⇒   B, $\hspace{0.5cm}I_{10} = 1$   ⇒  B,$\hspace{0.5cm}I_{11} = 3$   ⇒   D,$\hspace{0.5cm}I_{12} = 11$   ⇒   CAB,
  $I_{13} = 11$   ⇒   CAB,$\hspace{0.5cm}I_{14} = 13$   ⇒   BD,$\hspace{0.5cm}I_{15} = 3$   ⇒   D,$\hspace{0.5cm}I_{16} = 9$   ⇒   BA.


(4)  Correct are thestatements 1 and 4.  Reason:

  • The new incoming index is  $18$  (decimal) and in the dictionary the entry  "DB"  is found under the index  $I = 18$.
  • For the decoding step  $i = 17$ , the line  $I = 19$  is entered into the dictionary.
  • The entry  "BAD"  is composed of the decoding result from step  $i = 16$  $($BA$)$  and the first character  $($D$)$  of the new result  "DB".


(5)  Only statement 1 is correct here:

  • Two bits are sufficient for the first phrase.
  • For the phrases  $I_2$, ... , $I_5$  three bits are needed.
  • The phrases  $I_6$, ... , $I_{13}$  require four bits.
  • The phrases  $I_{14}$, ... , $I_{29}$  require five bits.
  • The phrases  $I_{30}$, ... , $I_{58}$  require six bits.
  • This gives the total number of bits:
$$N_{\rm bits} = 1 \cdot 2 + 4 \cdot 3 + 8 \cdot 4 + 16 \cdot 5 + 29 \cdot 6 = 300 \hspace{0.05cm}.$$
  • With a uniform bit length (six bits for each index),  $58 · 6 = 348$  bits (i.e. more) would be required   ⇒   statement 2 is wrong in principle.


Now to the third statement, which is also not true:

  • With the uncoded transmission of  $N = 150$  characters from the symbol set  $\{$ ABCD $\}$  one would need exactly  $300$  bits.  With LZW one certainly needs more bits if the source is redundancy-free   ⇒   characters equally likely and statistically independent.
  • With a redundant source with (e.g.)  $H = 1.6$  bit/source symbol, one can limit the number of bits to  $N_{\rm bits} = N \cdot H$,  provided it is a very large file  $(N \to \infty)$.
  • With a rather small file – as here with only  $N = 150$  bits – it is not possible to say whether   $N_{\rm bits}$  will be less than, equal to or greater than  $150 · 1.6 = 240$.
  • Also quite possible is  $N_{\rm bits} > 300$.  In that case, it is better to do without the  "Lempel–Ziv compression".