Difference between revisions of "Aufgaben:Exercise 2.4Z: LZW Coding and Decoding again"
(25 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
− | {{quiz-Header|Buchseite= | + | {{quiz-Header|Buchseite=Information_Theory/Compression_According_to_Lempel,_Ziv_and_Welch |
}} | }} | ||
− | [[File: | + | [[File:EN_Inf_Z2.4_vers2.png|right|frame|Two snapshots of the LZW dictionary ]] |
− | + | The graph shows snapshots of the dictionary created during LZW coding of the input symbol sequence. | |
+ | *The upper graph is for input symbol sequence <b>ABABABBAA</b>. | ||
+ | *The lower dictionary is created during LZW coding of the sequence <b>ABABABABA</b>. | ||
− | |||
− | + | In both cases, it is assumed that no characters other than <b>A</b> and <b>B</b> can occur at later times. | |
− | + | In <u>LZW decoding</u>, the same dictionaries are created, but then the dictionary entries occur one step later. In subtask '''(3)''' the question is asked for which coding step or for which decoding step the two snapshots shown are valid. | |
− | |||
− | |||
− | |||
+ | In <u>LZW coding</u>, an index $I$ is selected for each coding step $i$ and transmitted (binary). The character pair <b>AB</b> is represented by the index $I = 2$ in the two dictionaries. Here we consider the index $I$ as a decimal number and leave the binary representation out of consideration for this task. | ||
− | |||
− | |||
− | |||
− | |||
− | |||
+ | With <u>LZW decoding</u>, a character or character sequence is generated in the same way from each index $I$ with the help of the dictionary, for example $I = 1$ leads to the character <b>B</b> and $I = 2$ to the character pair <b>AB</b>. | ||
− | + | If a dictionary entry with the desired index $I$ is actually found, the decoding runs smoothly. However, this is not always the case: | |
− | + | * If a new index $I$ is entered during encoding at step $i$ and this $I$ is at the same time the encoding result of the step, this index is not yet occupied in the dictionary at decoding step $i$ . The reason for this is that with the decoder the entries are made one step later than with the coder. | |
− | + | * In the case of a "binary" input sequence $($all characters are <b>A</b> or <b>B</b>$)$, a special rule must always be applied in LZW decoding if the entry with index $I = i$ was made in coding step $i$ . | |
− | * | ||
− | * | ||
− | === | + | This special rule shall be illustrated by an example: |
+ | * For step $i$ there is no entry in the decoder dictionary matching index $I$ . | ||
+ | * We assume that in the previous step $(i- 1)$ the decoding result was <b>ABBABA</b> . | ||
+ | * Then we add the first character of the sequence to this string. Here: <b>ABBABAA</b>. | ||
+ | * Then one enters the sequence <b>ABBABAA</b> into the dictionary under index $I$ . | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | Hints: | ||
+ | *The exercise belongs to the chapter [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch|Compression according to Lempel, Ziv and Welch]]. | ||
+ | *In particular, reference is made to the pages | ||
+ | :: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#LZ77_-_the_basic_form_of_the_Lempel-Ziv_algorithms|LZ77 - the basic form of Lempel-Ziv algorithms]], | ||
+ | :: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Lempel-Ziv_coding_with_variable_index_bit_length|Lempel-Ziv coding with variable index bit length]], | ||
+ | :: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Decoding_of_the_LZW_algorithm|Decoding of the LZW algorithm]]. | ||
+ | |||
+ | *Also, when solving this task, note that the LZW algorithm does not assume an empty dictionary. | ||
+ | *Rather, the indices $I = 0$ to $I = M- 1$ contain all $M$ permissible characters of the alphabet. | ||
+ | |||
+ | ===Questions=== | ||
<quiz display=simple> | <quiz display=simple> | ||
− | { | + | {Encode the input sequence <b>ABABABBAA</b> according to the <u>above diagram</u>. Which indices $I_i$ result for the steps $i=1$, ... , $i=5$? |
|type="{}"} | |type="{}"} | ||
− | $ | + | $I_1 \ = \ $ { 0. } |
− | $ | + | $I_2 \ = \ $ { 1 } |
− | $ | + | $I_3 \ = \ $ { 2 } |
− | $ | + | $I_4 \ = \ $ { 2 } |
− | $ | + | $I_5 \ = \ $ { 3 } |
− | { | + | {Now encode the input sequence <b>ABABABABA</b> according to the <u>diagram below</u>. Specify the indices $I_i$ for the steps $i=4$ and $i=5$ . |
|type="{}"} | |type="{}"} | ||
− | $ | + | $I_4 \ = \ $ { 4 } |
− | $ | + | $I_5 \ = \ $ { 3 } |
− | { | + | {For which step $i$ does the snapshot of the dictionary shown on the exercise description page apply with respect to |
|type="{}"} | |type="{}"} | ||
− | $\text{ | + | $\text{Encoding:} \ \ i \ = \ $ { 4 } |
− | $\text{ | + | $\text{Decoding:} \ \ i \ = \ $ { 5 } |
− | { | + | {When does one have to resort to the decoding special case rule? |
|type="[]"} | |type="[]"} | ||
− | - | + | - When decoding <b>ABABABBAA</b> in step $i = 4$. |
− | + | + | + When decoding <b>ABABABABA</b> in step $i = 4$. |
− | - | + | - When decoding <b>ABABABABA</b> in step $i = 5$. |
Line 66: | Line 82: | ||
</quiz> | </quiz> | ||
− | === | + | ===Solution=== |
{{ML-Kopf}} | {{ML-Kopf}} | ||
− | + | '''(1)''' We use $W(I)$ to denote a field (array), that describes the dictionary and whose elements contain characters or strings. | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | *The encoding of <b>ABABABBAA</b> proceeds as follows: | |
+ | :: $i = 1$: <b>A</b> → $\underline{I=0}$, $W(I = 2) =$ <b>AB</b>, | ||
+ | :: $i = 2$: <b>B</b> → $\underline{I=1}$, $W(I = 3) =$ <b>BA</b>, | ||
+ | :: $i = 3$: <b>AB</b> → $\underline{I=2}$, $W(I = 4) =$ <b>ABA</b>, | ||
+ | :: $i = 4$: <b>AB</b> → $\underline{I=2}$, $W(I = 5) =$ <b>ABB</b>, | ||
+ | :: $i = 5$: <b>BA</b> → $\underline{I=3}$, $W(I = 6) =$ <b>BAA</b>. | ||
− | + | *Note that the last character $($<b>A</b>$)$ of the input string <b>ABABABBAA</b> at time $i = 5$ is already considered in the dictionary entry but has not yet been encoded. | |
− | |||
− | |||
− | + | '''(2)''' For the steps $i = 1$ to $i = 3$ nothing changes compared to subtask '''(1)'''. | |
+ | *Afterwards, the following applies: | ||
+ | :: $i = 4$: <b>ABA</b> → $\underline{I=4}$, $W(I = 5) =$ <b>ABAB</b>, | ||
+ | :: $i = 5$: <b>BA</b> → $\underline{I=3}$, encoding completed, no new dictionary entry possible. | ||
− | |||
− | |||
− | :< | + | '''(3)''' The comparison with above results shows: |
+ | *The dictionary of the <u>encoder</u> has the entries shown exactly after $\underline{i=4}$ encoding steps. | ||
+ | *With the <u>decoder</u>, on the other hand, there is a time delay of one step: $\underline{i=5}$. | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | '''(4)''' <u>Solution suggestion 2</u> is correct: | |
+ | *The special case regulation of the decoding is necessary (in the present example) if the index $I =i$ is output in the coding step $i$ . | ||
+ | *During decoding, it then does not find the required assignment "Index → String" since the generated dictionary at time $i$ only contains entries with $I < i$ . | ||
+ | *For the sequence <b>ABABABBAA</b> ⇒ $I < i$ always applies according to subtask '''(1)''' . | ||
+ | *In contrast, the following indices would result for the sequence <b>ABABABABA</b> : | ||
+ | :$$i = 1\hspace{-0.15cm}: \hspace{0.15cm} I = 0\hspace{0.05cm}, | ||
+ | \hspace{0.5cm}i = 2\hspace{-0.15cm}: \hspace{0.15cm}I = 1\hspace{0.05cm}, | ||
+ | \hspace{0.5cm}i = 3\hspace{-0.15cm}: \hspace{0.15cm}I = 2\hspace{0.05cm}, | ||
+ | \hspace{0.5cm}\underline{i = 4\hspace{-0.15cm}: \hspace{0.15cm}I = 4}\hspace{0.05cm}, | ||
+ | \hspace{0.5cm}i = 5\hspace{-0.15cm}: \hspace{0.15cm}I = 3\hspace{0.05cm}. $$ | ||
− | + | Here is a summary of the entire decoding process of <b>ABABABABA</b>: | |
− | + | *The pre-assignment of the dictionary includes $(I=0$: <b>A</b>$)$ and $(I=1$: <b>B</b>$)$. | |
− | : | + | *Then with the dictionary array $W(I)$: |
+ | :: $i = 1$: Decoding $I=0$ → <b>A</b>, | ||
+ | :: $i = 2$: Decoding $I=1$ → <b>B</b>, $W(I = 2) =$ <b>AB</b>, | ||
+ | :: $i = 3$: Decoding $I=2$ → <b>AB</b>, $W(I = 3) =$ <b>BA</b>, | ||
+ | :: $i = 4$: An entry with index $I = 4$ is not present ⇒ '''Special case rule''': <br> One takes the last decoding result $($here <b>AB</b>$)$ and adds the first character of this sequence at the end ⇒ <b>ABA</b>. <br> Then <b>ABA</b> is stored in the dictionary under index $I = 4$ . | ||
+ | :: $i = 5$: Decoding $I=3$ → <b>BA</b>. '''End of the decoder input sequence'''. | ||
{{ML-Fuß}} | {{ML-Fuß}} | ||
− | [[Category: | + | [[Category:Information Theory: Exercises|^2.2 Lempel-Ziv-Welch Compression^]] |
Latest revision as of 10:34, 31 August 2021
The graph shows snapshots of the dictionary created during LZW coding of the input symbol sequence.
- The upper graph is for input symbol sequence ABABABBAA.
- The lower dictionary is created during LZW coding of the sequence ABABABABA.
In both cases, it is assumed that no characters other than A and B can occur at later times.
In LZW decoding, the same dictionaries are created, but then the dictionary entries occur one step later. In subtask (3) the question is asked for which coding step or for which decoding step the two snapshots shown are valid.
In LZW coding, an index $I$ is selected for each coding step $i$ and transmitted (binary). The character pair AB is represented by the index $I = 2$ in the two dictionaries. Here we consider the index $I$ as a decimal number and leave the binary representation out of consideration for this task.
With LZW decoding, a character or character sequence is generated in the same way from each index $I$ with the help of the dictionary, for example $I = 1$ leads to the character B and $I = 2$ to the character pair AB.
If a dictionary entry with the desired index $I$ is actually found, the decoding runs smoothly. However, this is not always the case:
- If a new index $I$ is entered during encoding at step $i$ and this $I$ is at the same time the encoding result of the step, this index is not yet occupied in the dictionary at decoding step $i$ . The reason for this is that with the decoder the entries are made one step later than with the coder.
- In the case of a "binary" input sequence $($all characters are A or B$)$, a special rule must always be applied in LZW decoding if the entry with index $I = i$ was made in coding step $i$ .
This special rule shall be illustrated by an example:
- For step $i$ there is no entry in the decoder dictionary matching index $I$ .
- We assume that in the previous step $(i- 1)$ the decoding result was ABBABA .
- Then we add the first character of the sequence to this string. Here: ABBABAA.
- Then one enters the sequence ABBABAA into the dictionary under index $I$ .
Hints:
- The exercise belongs to the chapter Compression according to Lempel, Ziv and Welch.
- In particular, reference is made to the pages
- Also, when solving this task, note that the LZW algorithm does not assume an empty dictionary.
- Rather, the indices $I = 0$ to $I = M- 1$ contain all $M$ permissible characters of the alphabet.
Questions
Solution
- The encoding of ABABABBAA proceeds as follows:
- $i = 1$: A → $\underline{I=0}$, $W(I = 2) =$ AB,
- $i = 2$: B → $\underline{I=1}$, $W(I = 3) =$ BA,
- $i = 3$: AB → $\underline{I=2}$, $W(I = 4) =$ ABA,
- $i = 4$: AB → $\underline{I=2}$, $W(I = 5) =$ ABB,
- $i = 5$: BA → $\underline{I=3}$, $W(I = 6) =$ BAA.
- Note that the last character $($A$)$ of the input string ABABABBAA at time $i = 5$ is already considered in the dictionary entry but has not yet been encoded.
(2) For the steps $i = 1$ to $i = 3$ nothing changes compared to subtask (1).
- Afterwards, the following applies:
- $i = 4$: ABA → $\underline{I=4}$, $W(I = 5) =$ ABAB,
- $i = 5$: BA → $\underline{I=3}$, encoding completed, no new dictionary entry possible.
(3) The comparison with above results shows:
- The dictionary of the encoder has the entries shown exactly after $\underline{i=4}$ encoding steps.
- With the decoder, on the other hand, there is a time delay of one step: $\underline{i=5}$.
(4) Solution suggestion 2 is correct:
- The special case regulation of the decoding is necessary (in the present example) if the index $I =i$ is output in the coding step $i$ .
- During decoding, it then does not find the required assignment "Index → String" since the generated dictionary at time $i$ only contains entries with $I < i$ .
- For the sequence ABABABBAA ⇒ $I < i$ always applies according to subtask (1) .
- In contrast, the following indices would result for the sequence ABABABABA :
- $$i = 1\hspace{-0.15cm}: \hspace{0.15cm} I = 0\hspace{0.05cm}, \hspace{0.5cm}i = 2\hspace{-0.15cm}: \hspace{0.15cm}I = 1\hspace{0.05cm}, \hspace{0.5cm}i = 3\hspace{-0.15cm}: \hspace{0.15cm}I = 2\hspace{0.05cm}, \hspace{0.5cm}\underline{i = 4\hspace{-0.15cm}: \hspace{0.15cm}I = 4}\hspace{0.05cm}, \hspace{0.5cm}i = 5\hspace{-0.15cm}: \hspace{0.15cm}I = 3\hspace{0.05cm}. $$
Here is a summary of the entire decoding process of ABABABABA:
- The pre-assignment of the dictionary includes $(I=0$: A$)$ and $(I=1$: B$)$.
- Then with the dictionary array $W(I)$:
- $i = 1$: Decoding $I=0$ → A,
- $i = 2$: Decoding $I=1$ → B, $W(I = 2) =$ AB,
- $i = 3$: Decoding $I=2$ → AB, $W(I = 3) =$ BA,
- $i = 4$: An entry with index $I = 4$ is not present ⇒ Special case rule:
One takes the last decoding result $($here AB$)$ and adds the first character of this sequence at the end ⇒ ABA.
Then ABA is stored in the dictionary under index $I = 4$ . - $i = 5$: Decoding $I=3$ → BA. End of the decoder input sequence.