Difference between revisions of "Exercise 2.12: Run–Length Coding and Run–Length Limited Coding"
Line 3: | Line 3: | ||
}} | }} | ||
− | [[File:EN_Inf_A_2_13.png|right|frame| | + | [[File:EN_Inf_A_2_13.png|right|frame|Table on Run-Length Coding]] |
− | + | We consider a binary source with the symbol set $\rm A$ and $\rm B$, where $\rm B$ however, occurs only very rarely. | |
− | * | + | * Without source coding, exactly one bit would be needed per source symbol, and accordingly, for a source symbol sequence of length $N$ would also apply to the code bit sequence $N_\text{Bit} = N$ . |
− | * | + | * Entropy coding makes little sense here without further measures (example: combining several symbols into a tuple) wbecause of the unfavourable symbol probabilities. |
− | * | + | * The remedy is [[Information_Theory/Weitere_Quellencodierverfahren#Run.E2.80.93Length_coding|Run-Length Coding]] $(\rm RLC)$, which is described in the theory section under the link mentioned. For example, for the symbol sequence $\rm ABAABAAAABBAAB\text{...}$ the corresponding output of ''Run–Length Coding'': $ 2; \ 3; \ 5; \ 1; \ 3; \text{...}$ |
− | * | + | * Of course, the lengths $L_1 = 2$, $L_2 = 3$, ... of the individual substrings, each separated by $\rm B$ , must be represented in binary before transmission. If one uses $D = 3$ (bit) for all $L_i$ one obtains the RLC binary sequence |
:$$010\hspace{0.05cm}\text{'}\hspace{0.05cm}011\hspace{0.05cm}\text{'}\hspace{0.05cm}101\hspace{0.05cm}\text{'}\hspace{0.05cm}001\hspace{0.05cm}\text{'}\hspace{0.05cm}011\hspace{0.05cm}\text{'}\hspace{0.05cm}\text{...}$$ | :$$010\hspace{0.05cm}\text{'}\hspace{0.05cm}011\hspace{0.05cm}\text{'}\hspace{0.05cm}101\hspace{0.05cm}\text{'}\hspace{0.05cm}001\hspace{0.05cm}\text{'}\hspace{0.05cm}011\hspace{0.05cm}\text{'}\hspace{0.05cm}\text{...}$$ | ||
− | + | The graph shows the RLC result to be analysed. Columns 2 and 3 show the substring lengths $L_i$ in binary and decimal, respectively, and column 4 shows them in cumulative form (values from column 3 added up). | |
− | * | + | *One problem of ''Run-Length Coding'' $\rm (RLC)$ is the unlimited range of values of the quantities $L_i$. With $D = 3$ , no value $L_i > 7$ can be represented and with $D = 2$ , the restriction is $1 \le L_i \le 3$. |
− | * | + | *The problem is circumvented with <i>Run–Length Limited Coding</i> $\rm (RLLC)$. If a value is $L_i \ge 2^D$, one replaces $L_i$ with a special character <b>S</b> and the difference $L_i - 2^D +1$. With the RLLC decoder, this special character <b>S</b> is expanded again. |
Line 25: | Line 25: | ||
− | + | Hints: | |
− | *Die Aufgabe gehört zum Kapitel [[Information_Theory/Weitere_Quellencodierverfahren| | + | *Die Aufgabe gehört zum Kapitel [[Information_Theory/Weitere_Quellencodierverfahren|Other source coding methods]]. |
− | * | + | *In particular, reference is made to the page [[Information_Theory/Weitere_Quellencodierverfahren#Run.E2.80.93Length_coding|Run-Length Coding]]. |
{{GraueBox| | {{GraueBox| | ||
− | TEXT=$\text{RLLC | + | TEXT=$\text{RLLC Example}$: We again assume the above sequence and the parameter $D = 2$ : |
− | * | + | * Source symbol sequence: $\rm ABAABAAAABBAAB$... |
− | * RLC | + | * RLC decimal sequence: '''2; 3; 5; 1; 3;''' ... |
− | * RLLC | + | * RLLC decimal sequence: '''2; 3; '''<font color="#cc0000"><span style="font-weight: bold;"> S;</span></font>''' 2; 1; 3;''' ... |
− | * RLLC | + | * RLLC binary sequence: '''01′11′''' <font color="#cc0000"><span style="font-weight: bold;">00′</span></font>'''10′01′11′'''... |
− | + | You can see: | |
− | * | + | *The special character <b>S</b> is binary-coded here as <b>00</b> . This is only an example – it does not have to be like this. |
− | * | + | *Since with $D = 2$ for all real RLC values $1 \le L_i \le 3$ , the decoder recognises the special character '''00'''. |
− | * | + | *It replaces this again with $2^D -1$ (three in the example) $\rm A$–symbols.}} |
− | === | + | ===Questions=== |
<quiz display=simple> | <quiz display=simple> | ||
− | { | + | {How many bits would be needed <u>without source coding</u> , i.e. with the assignment $\rm A$ → <b>0</b> and $\rm B$ → <b>1</b>? |
|type="{}"} | |type="{}"} | ||
$N_\text{Bit} \ = \ $ { 1250 } | $N_\text{Bit} \ = \ $ { 1250 } | ||
− | { | + | {What is the relative frequency of symbol $\rm B$? |
|type="{}"} | |type="{}"} | ||
$h_{\rm B}\ = \ $ { 2 3% } $\ \%$ | $h_{\rm B}\ = \ $ { 2 3% } $\ \%$ | ||
− | { | + | {How many bits are needed for <u>Run–Length Coding</U> $\rm (RLC)$ according to the given table with eight bits per codeword $(D = 8)$? |
|type="{}"} | |type="{}"} | ||
$N_\text{Bit} \ = \ $ { 200 } | $N_\text{Bit} \ = \ $ { 200 } | ||
− | { | + | {Is <u>Run–Length Coding</u> with seven bit code words $(D = 7)$ possible here? |
|type="()"} | |type="()"} | ||
− | - | + | - Yes. |
− | + | + | + No. |
− | { | + | {How many bits are needed for <u>Run–Length Limited Coding</U> $\rm (RLLC)$ with seven bits per code word $(D = 7)$? |
|type="{}"} | |type="{}"} | ||
$N_\text{Bit} \ = \ $ { 182 } | $N_\text{Bit} \ = \ $ { 182 } | ||
− | { | + | {How many bits are needed for <u>Run–Length Limited Coding</u> $\rm (RLLC)$ with six bits per codeword $(D = 6)$? |
|type="{}"} | |type="{}"} | ||
$N_\text{Bit} \ = \ ${ 204 } | $N_\text{Bit} \ = \ ${ 204 } | ||
Line 83: | Line 83: | ||
</quiz> | </quiz> | ||
− | === | + | ===Solutions=== |
{{ML-Kopf}} | {{ML-Kopf}} | ||
− | '''(1)''' | + | '''(1)''' The binary sequence consists of $N = 1250$ binary symbols (can be read from the last column in the table). This means that the same number of bits is needed without coding: |
:$$N_\text{Bit}\hspace{0.15cm}\underline{= 1250}.$$ | :$$N_\text{Bit}\hspace{0.15cm}\underline{= 1250}.$$ | ||
− | '''(2)''' | + | '''(2)''' The entire symbol sequence of length $N = 1250$ contains $N_{\rm B} = 25$ symbols ${\rm B}$ and thus $N_{\rm A} = 1225$ symbols ${\rm A}$. |
− | * | + | *The number $N_{\rm B}$ der Symbole ${\rm B}$ ist dabei gleich der Zeilenzahl in der vorne angegebenen Tabelle. |
*Damit gilt für die relative Häufigkeit von ${\rm B}$: | *Damit gilt für die relative Häufigkeit von ${\rm B}$: | ||
:$$h_{\rm B} = \frac{N_{\rm B}}{N} = \frac{25}{1250} \hspace{0.15cm}\underline{= 0.02} = 2\%\hspace{0.05cm}. $$ | :$$h_{\rm B} = \frac{N_{\rm B}}{N} = \frac{25}{1250} \hspace{0.15cm}\underline{= 0.02} = 2\%\hspace{0.05cm}. $$ |
Revision as of 20:17, 8 August 2021
We consider a binary source with the symbol set $\rm A$ and $\rm B$, where $\rm B$ however, occurs only very rarely.
- Without source coding, exactly one bit would be needed per source symbol, and accordingly, for a source symbol sequence of length $N$ would also apply to the code bit sequence $N_\text{Bit} = N$ .
- Entropy coding makes little sense here without further measures (example: combining several symbols into a tuple) wbecause of the unfavourable symbol probabilities.
- The remedy is Run-Length Coding $(\rm RLC)$, which is described in the theory section under the link mentioned. For example, for the symbol sequence $\rm ABAABAAAABBAAB\text{...}$ the corresponding output of Run–Length Coding: $ 2; \ 3; \ 5; \ 1; \ 3; \text{...}$
- Of course, the lengths $L_1 = 2$, $L_2 = 3$, ... of the individual substrings, each separated by $\rm B$ , must be represented in binary before transmission. If one uses $D = 3$ (bit) for all $L_i$ one obtains the RLC binary sequence
- $$010\hspace{0.05cm}\text{'}\hspace{0.05cm}011\hspace{0.05cm}\text{'}\hspace{0.05cm}101\hspace{0.05cm}\text{'}\hspace{0.05cm}001\hspace{0.05cm}\text{'}\hspace{0.05cm}011\hspace{0.05cm}\text{'}\hspace{0.05cm}\text{...}$$
The graph shows the RLC result to be analysed. Columns 2 and 3 show the substring lengths $L_i$ in binary and decimal, respectively, and column 4 shows them in cumulative form (values from column 3 added up).
- One problem of Run-Length Coding $\rm (RLC)$ is the unlimited range of values of the quantities $L_i$. With $D = 3$ , no value $L_i > 7$ can be represented and with $D = 2$ , the restriction is $1 \le L_i \le 3$.
- The problem is circumvented with Run–Length Limited Coding $\rm (RLLC)$. If a value is $L_i \ge 2^D$, one replaces $L_i$ with a special character S and the difference $L_i - 2^D +1$. With the RLLC decoder, this special character S is expanded again.
Hints:
- Die Aufgabe gehört zum Kapitel Other source coding methods.
- In particular, reference is made to the page Run-Length Coding.
$\text{RLLC Example}$: We again assume the above sequence and the parameter $D = 2$ :
- Source symbol sequence: $\rm ABAABAAAABBAAB$...
- RLC decimal sequence: 2; 3; 5; 1; 3; ...
- RLLC decimal sequence: 2; 3; S; 2; 1; 3; ...
- RLLC binary sequence: 01′11′ 00′10′01′11′...
You can see:
- The special character S is binary-coded here as 00 . This is only an example – it does not have to be like this.
- Since with $D = 2$ for all real RLC values $1 \le L_i \le 3$ , the decoder recognises the special character 00.
- It replaces this again with $2^D -1$ (three in the example) $\rm A$–symbols.
Questions
Solutions
- $$N_\text{Bit}\hspace{0.15cm}\underline{= 1250}.$$
(2) The entire symbol sequence of length $N = 1250$ contains $N_{\rm B} = 25$ symbols ${\rm B}$ and thus $N_{\rm A} = 1225$ symbols ${\rm A}$.
- The number $N_{\rm B}$ der Symbole ${\rm B}$ ist dabei gleich der Zeilenzahl in der vorne angegebenen Tabelle.
- Damit gilt für die relative Häufigkeit von ${\rm B}$:
- $$h_{\rm B} = \frac{N_{\rm B}}{N} = \frac{25}{1250} \hspace{0.15cm}\underline{= 0.02} = 2\%\hspace{0.05cm}. $$
(3) Wir betrachten nun Run–Length Coding $\rm (RLC)$, wobei jeder Abstand zwischen zwei ${\rm B}$–Symbolen mit acht Bit dargestellt wird $(D = 8)$.
- Damit ergibt sich mit $N_{\rm B} = 25$:
- $$N_{\rm Bit} = N_{\rm B} \cdot 8 \hspace{0.15cm}\underline{= 200} \hspace{0.05cm}.$$
(4) $\rm RLC$ mit $D = 7$ erlaubt für $L_i$ nur Werte zwischen $1$ und $2^7-1 =127$.
- Der Eintrag "226" in Zeile 19 ist aber größer ⇒ NEIN.
(5) Auch bei Run–Length Limited Coding $\rm (RLLC)$ sind für die "echten" Abstände $L_i$ mit $D = 7$ nur Werte bis $127$ zulässig.
- Der Eintrag "226" in Zeile 19 wird bei $\rm RLLC$ ersetzt durch
- Zeile 19a: S = 0000000 ⇒ Sonderzeichen, steht für "+ 127",
- Zeile 19b: 1100011 ⇒ Dezimal 99.
- Damit erhält man insgesamt $26$ Worte zu je sieben Bit:
- $$N_{\rm Bit} = 26 \cdot 7 \hspace{0.15cm}\underline{= 182} \hspace{0.05cm}.$$
(6) Nun müssen bei $\rm RLLC$ gegenüber $\rm RLC$ (siehe Tabelle) folgende Änderungen vorgenommen werden:
- Zeile 1: $122 = 1 · 63 + 59$ (ein Wort mehr),
- Zeile 6: $70 = 1 · 63 + 7$ (ein Wort mehr),
- Zeile 7: $80 = 1 · 63 + 17$ (ein Wort mehr),
- Zeile 12: $79 = 1 · 63 + 18$ (ein Wort mehr),
- Zeile 13: $93 = 1 · 63 + 30$ (ein Wort mehr),
- Zeile 19: $226 = 3 · 63 + 37$ (drei Worte mehr),
- Zeile 25: $97 = 1 · 63 + 34$ (ein Wort mehr).
Damit erhält man insgesamt $34$ Worte zu je sechs Bit:
- $$N_{\rm Bit} = 34 \cdot 6 \hspace{0.15cm}\underline{= 204} \hspace{0.05cm},$$
also ein schlechteres Ergebnis als mit sieben Bit gemäß Teilaufgabe (5).