Difference between revisions of "Aufgaben:Exercise 2.11Z: Arithmetic Coding once again"

From LNTwww
 
(27 intermediate revisions by 5 users not shown)
Line 1: Line 1:
  
{{quiz-Header|Buchseite=Informationstheorie und Quellencodierung/Weitere Quellencodierverfahren
+
{{quiz-Header|Buchseite=Information_Theory/Further_Source_Coding_Methods
 
}}
 
}}
  
[[File:P_ID2473__Inf_A_2_12.png|right|]]
+
[[File:EN_Inf_Z_2_11_v2.png|right|frame|Preset interval ranges]]
Wir betrachten hier die arithmetische Codierung (AC). Alle notwendigen Informationen zu dieser Art von Entropiecodierung finden Sie in der Aufgabe A2.11.
+
Here we consider arithmetic coding  $(\rm AC)$.  All necessary information about this type of entropy coding can be found in  [[Aufgaben:Exercise_2.11:_Arithmetic_Coding|Exercise 2.11]].
  
Auch nebenstehende Grafik ist das Ergebnis der letzten Aufgabe. Die für die aktuelle Aufgabe wichtigen Zahlenwerte für die Codierschritte 3 und 7 sind farblich hervorgehoben:
+
The graph is also the result of Exercise 2.11.  The numerical values for coding steps 3 and 7 that are important for the current task are highlighted in colour:
 +
* The interval for  $N= 3$  $($symbol sequence  $\rm XXY)$  starts at  $B_3 = 0.343$  and goes up to  $E_3 = 0.392$.
 +
* The interval limits for  $N= 7$  $($symbol sequence  $\rm XXYXXXZ)$  are  $B_7 = 0.3564456$  and  $E_7 =0.359807$.
  
:* Das Intervall für <i>N</i> = 3 (Symbolfolge <b>XXY</b>) beginnt bei <i>B</i><sub>3</sub> = 0.343 und reicht bis zum Endwert <i>E</i><sub>3</sub> = 0.392.
 
  
:* Das Intervall für <i>N</i> = 7 (Symbolfolge <b>XXYXXXZ</b>) beginnt bei <i>B</i><sub>7</sub> = 0.3564456 und endet bei <i>E</i><sub>7</sub>&nbsp;=&nbsp;0.359807.
+
This task is only about assigning binary sequences to the selected intervals. Procedure:
  
In dieser Aufgabe geht es nur um die Zuweisung von Binärfolgen zu den ausgewählten Intervallen. Man geht wie folgt vor:
+
* The interval&nbsp; $I$&nbsp; is determined by the beginning&nbsp; $B$, the end&nbsp; $E$,&nbsp; the interval width&nbsp; ${\it \Delta} = E-B$ &nbsp;as well as the interval centre &nbsp;$M = (B+E)/2$.
 +
* The interval&nbsp; $I$&nbsp; is characterised by the binary representation&nbsp; (with limited resolution)&nbsp; of any real number value&nbsp; $r \in I$.&nbsp; For example, one chooses &nbsp;$r \approx M$.
 +
* The required number of bits results from the interval width according to the following equation&nbsp; (the open square brackets mean&nbsp; "round up"):
 +
:$$N_{\rm bits} =  \left\lceil{\rm log_2} \hspace{0.15cm} 1/{\it \Delta} \right\rceil+1\hspace{0.05cm}. $$
  
:* Das Intervall <i>I</i> wird bestimmt durch den Beginn <i>B</i>, das Ende <i>E</i>, die Intervallbreite <i>&Delta;</i> = <i>E</i> &ndash; <i>B</i> sowie die Intervallmitte <i>M</i> = (<i>B</i> + <i>E</i>)/2.
+
For example, for&nbsp; $N_{\rm bits} = 5$&nbsp; the binary code&nbsp; <b>01001</b>&nbsp; stands for the following real-valued number&nbsp; $r$:
 +
:$$r = 0 \cdot 2^{-1}+1 \cdot 2^{-2}+0 \cdot 2^{-3}+0 \cdot 2^{-4}+1 \cdot 2^{-5} = 0.28125
 +
\hspace{0.05cm}. $$
  
:* Das Intervall <i>I</i> wird gekennzeichnet durch die Binärdarstellung (mit begrenzter Auflösung) eines beliebigen reellen Zahlenwertes <i>r</i> &#8712; <i>I</i>. Beispielsweise wählt man <i>r</i> &asymp; <i>M</i>.
 
  
:* Die erforderliche Bitanzahl ergibt sich aus der Intervallbreite nach folgender Gleichung (die nach unten offenen Klammern bedeuten &bdquo;nach oben runden&rdquo;):
 
:$$N_{\rm Bit} =  \left\lceil{\rm ld} \hspace{0.15cm} 1/{\it \Delta} \right\rceil+1\hspace{0.05cm}. $$
 
  
Beispielsweise steht für <i>N</i><sub>Bit</sub> = 5 der Binärcode <b>01001</b> für die folgende reellwertige Zahl <i>r</i>:
 
:$$r = 0 \cdot 2^{-1}+1 \cdot 2^{-2}+0 \cdot 2^{-3}+0 \cdot 2^{-4}+1 \cdot 2^{-5} = 0.28125
 
\hspace{0.05cm}. $$
 
<b>Hinweis:</b> Die Aufgabe gehört zum Themengebiet von Kapitel 2.4. Allerdings wird in <i>LNTwww</i> die arithmetische Codierung nur sehr knapp behandelt. Eine Übersicht finden Sie auch in WIKIPEDIA und eine ausführlichere Abhandlung in [BCK02].
 
  
  
===Fragebogen===
+
 
 +
 
 +
<u>Hints:</u>
 +
*The exercise belongs to the chapter&nbsp; [[Information_Theory/Further_Source_Coding_Methods|Further source coding methods]].
 +
*In particular, reference is made to the page&nbsp; [[Information_Theory/Further_Source_Coding_Methods#Arithmetic_coding|Arithmetic Coding]].
 +
*Further information on the topic can also be found in this&nbsp; [https://en.wikipedia.org/wiki/Arithmetic_coding Wikipedia article].
 +
 +
 
 +
 
 +
===Questions===
  
 
<quiz display=simple>
 
<quiz display=simple>
{Wie viele Bit werden zur Darstellung der Folge <b>XXY</b> benutzt?
+
{How many bits are used to represent the source symbol sequence&nbsp; $\rm XXY$ &nbsp; &rArr; &nbsp; $N = 3$&nbsp;?
 
|type="{}"}
 
|type="{}"}
$N = 3:\ N_\text{Bit}$ = { 6 3% }
+
$N_\text{bits} \ = \ $ { 6 }
  
  
{Welcher arithmetischer Code (AC) gilt für diesen Fall?
+
{Which arithmetic code&nbsp; $\rm (AC)$&nbsp; applies to this case?
|type="[]"}
+
|type="()"}
- AC = <b>01011</b>,
+
- $\rm AC = $&nbsp;  <b>01011</b>,
+ AC = <b>010111</b>,
+
+ $\rm AC = $&nbsp;  <b>010111</b>,
- AC = <b>110111</b>.
+
- $\rm AC = $&nbsp; <b>110111</b>.
  
  
{Wie viele Bit werden zur Darstellung von <b>XXYXXXZ</b> benutzt?
+
{How many bits are used to represent the source symbol sequence&nbsp; $\rm XXYXXXZ$ &nbsp; &rArr; &nbsp; $N = 7$&nbsp;?
 
|type="{}"}
 
|type="{}"}
$N = 7:\ N_\text{Bit}$ = { 11 3% }
+
$N_\text{bits} \ = \ $ { 11 }
  
  
{Ist <b>01011100001</b> ein gültiger Code für die Symbolfolge <b>XXYXXXZ</b>?
+
{Is&nbsp; <b>01011100001</b>&nbsp; a valid code for the source symbol sequence&nbsp; $\rm XXYXXXZ$?
|type="[]"}
+
|type="()"}
- Ja.
+
- Yes.
+ Nein.
+
+ No.
  
  
{Welche Aussagen gelten für die arithmetische Codierung allgemein?
+
{Which statements apply to arithmetic coding in general?
 
|type="[]"}
 
|type="[]"}
+ Es handelt sich um eine gemeinsame Codierung ganzer Folgen.
+
+ It is a common coding of entire sequences.
+ Eine 32 Bit&ndash;Rechnerarchitektur begrenzt die Folgenlänge <i>N</i>.
+
+ A computer with 32-bit architecture limits the sequence length&nbsp; $N$.
+ Dieses Problem lässt sich durch Integer&ndash;Realisierung umgehen.
+
+ This problem can be circumvented by integer realisation.
+ Eine Integer&ndash;Realisierung erhöht die Codiergeschwindigkeit.
+
+ Integer realisation increases the coding speed.
  
  
Line 64: Line 72:
 
</quiz>
 
</quiz>
  
===Musterlösung===
+
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
<b>1.</b>&nbsp;&nbsp;Das ausgewählte Intervall beginnt bei <i>B</i><sub>3</sub> = 0.343 und endet bei <i>E</i><sub>3</sub> = 0.392. Die Intervallbreite ist somit <i>&Delta;</i><sub>3</sub> = 0.049 und damit gilt mit dem <i>Logarithmus dualis</i> &bdquo;log<sub>2</sub>&rdquo; &nbsp;&#8594;&nbsp; &bdquo;ld&rdquo;:
+
'''(1)'''&nbsp; The selected interval begins at&nbsp; $B_3 = 0.343$&nbsp; and ends at&nbsp; $E_3 = 0.392$.  
:$$N_{\rm Bit} = {\rm ld} \hspace{0.15cm} \left\lceil \frac{1}{0.049}\right\rceil+1\hspace{0.15cm}\underline{= 6}
+
*The interval width is therefore&nbsp; ${\it \Delta}_3 = 0.049$&nbsp; and thus the following applies with the logarithm to base 2:
 +
:$$N_{\rm bits} = {\rm log_2} \hspace{0.15cm} \left\lceil \frac{1}{0.049}\right\rceil+1\hspace{0.15cm}\underline{= 6}
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
  
<b>2.</b>&nbsp;&nbsp;Das ausgewählte Intervall ergibt sich zu <i>I</i> = [0.343, 0.392). Die Mitte liegt bei <i>M</i><sub>3</sub> = 0.3675. Zur Bestimmung des arithmetischen Codes versuchen wir, die Intervallmitte durch eine Binärdarstellung möglichst gut zu erreichen. Da uns gerade kein entsprechendes Tool zur Lösung dieser Aufgabe zur Verfügung steht, gehen wir von folgenden Nebenrechnungen aus:
 
  
: <i>H</i><sub>4</sub> = 2<sup>&ndash;2</sup> + 2<sup>&ndash;4</sup> = 0.3125 &nbsp;&#8658;&nbsp; gehört nicht zum Intervall <i>I</i>,
+
'''(2)'''&nbsp; The selected interval results in&nbsp; $I = \big[0.343, \ 0.392\big)$.
 +
*The middle lies at&nbsp; $M_3 = 0.3675$.
 +
*To determine the arithmetic code, we try to reach the centre of the interval as well as possible by using a binary representation.
 +
*Since we do not have a corresponding tool for solving this task at the moment, we assume the following secondary calculations:
  
: <i>H</i><sub>5</sub> = <i>H</i><sub>4</sub> + 2<sup>&ndash;5</sup> = 0.34375 &#8712; <i>I</i>  &nbsp;&#8658;&nbsp;  Binärdarstellung: 0.01011  &nbsp;&#8658;&nbsp;  Code: <b>01011</b>.
 
  
: <i>H</i><sub>6</sub> = <i>H</i><sub>5</sub> + 2<sup>&ndash;6</sup> = 0.359375 &#8712; <i>I</i>&nbsp;&#8658;&nbsp;  Binärdarstellung: 0.010111  &nbsp;&#8658;&nbsp;  Code: <b>010111</b>.
 
  
: <i>H</i><sub>7</sub> = <i>H</i><sub>6</sub> + 2<sup>&ndash;7</sup> = 0.3671875 &#8712; <i>I</i> &nbsp;&#8658;&nbsp; Binärdarstellung: 0.0101111  &nbsp;&#8658;&nbsp;  Code: <b>0101111</b>.
+
:&nbsp;$H_4 = 2^{-2} + 2^{-4} = 0.3125$ &nbsp; &#8658; &nbsp; does not belong to interval&nbsp; $I$.
  
: <i>H</i><sub>12</sub> = <i>H</i><sub>7</sub> + 2<sup>&ndash;12</sup> = 0.3674316406 &#8712; <i>I</i>&nbsp;&#8658;&nbsp;  binär: 0.010111100001 &nbsp;&#8658;&nbsp;  Code: <b>010111100001</b>.
+
:&nbsp;$H_5 = H_4 +2^{-5} = 0.34375  \in I&nbsp; &#8658; &nbsp;  binary representation: &nbsp; '''0.01011''' &nbsp;&#8658;&nbsp;  Code: &nbsp; <b>01011</b>.
  
Der entsprechende 6 Bit&ndash;Code lautet somit AC = <b>010111</b> &nbsp;&nbsp;&#8658;&nbsp;&nbsp; Richtig ist der <u>Vorschlag 2</u>.
+
:&nbsp;$H_6 = H_5 +2^{-6} = 0.359375  \in I$  &nbsp; &#8658; &nbspbinary representation: &nbsp; '''0.010111'''  &nbsp;&#8658;&nbsp; Code: &nbsp; <b>010111</b>.
  
<b>3.</b>&nbsp;&nbsp;Hier ergibt sich mit dem Beginn <i>B</i><sub>7</sub> = 0.3564456 und dem Ende <i>E</i><sub>7</sub> = 0.359807 die Intervallbreite <i>&Delta;</i><sub>7</sub> = 0.0033614 und damit
+
:&nbsp;$H_7 = H_6 +2^{-7} = 0.3671875  \in I$  &nbsp; &#8658; &nbsp;  binary representation: &nbsp; '''0.0101111'''  &nbsp;&#8658;&nbsp;  Code: &nbsp; <b>0101111</b>.
:$$N_{\rm Bit} = \left\lceil  {\rm ld} \hspace{0.15cm} \frac{1}{0.0033614} \right\rceil + 1\hspace{0.15cm} =
+
 
\left\lceil  {\rm ld} \hspace{0.15cm} 297.5 \right\rceil + 1\hspace{0.15cm} \underline{= 11}
+
:&nbsp;$H_{12} = H_7 +2^{-12} = 0.3674316406  \in I$  &nbsp; &#8658; &nbsp;  binary representation: &nbsp; '''0.010111100001'''  &nbsp;&#8658;&nbsp; Code: &nbsp; <b>010111100001</b>.
 +
 
 +
The corresponding 6&ndash;bit code is therefore&nbsp; $\rm AC =$&nbsp;  <b>010111</b> &nbsp;&nbsp;&#8658;&nbsp;&nbsp; Correct is&nbsp; <u>solution suggestion 2</u>.
 +
 
 +
 
 +
 
 +
'''(3)'''&nbsp; Here, with the beginning&nbsp; $B_7 = 0.3564456$&nbsp; and the end&nbsp; $E_7  = 0.359807$&nbsp; the interval width is&nbsp; ${\it \Delta}_7 = 0.0033614$&nbsp; and thus
 +
:$$N_{\rm bits} = \left\lceil  {\rm log_2} \hspace{0.15cm} \frac{1}{0.0033614} \right\rceil + 1\hspace{0.15cm} =
 +
\left\lceil  {\rm log_2} \hspace{0.15cm} 297.5 \right\rceil + 1\hspace{0.15cm} \underline{= 11}
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
  
<b>4.</b>&nbsp;&nbsp;Die Binärdarstellung des Codes <b>01011100001</b> ergibt
+
 
 +
 
 +
'''(4)'''&nbsp; The binary representation of the code&nbsp; <b>01011100001</b>&nbsp; results in
 
:$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-6}+ 2^{-11} = 0.3598632813 > E_7
 
:$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-6}+ 2^{-11} = 0.3598632813 > E_7
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
Richtig ist also <u>NEIN</u>. Wegen
+
*The correct answer is therefore <u>NO</u>.&nbsp; The valid arithmetic code is&nbsp; $\rm AC =$&nbsp; <b>01011011101</b>, because of
:$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-7}+ 2^{-8}+ 2^{-9}+ 2^{-11} =0.3579101563$$
+
:$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-7}+ 2^{-8}+ 2^{-9}+ 2^{-11} =0.3579101563 \hspace{0.3cm}
:$$\Rightarrow \hspace{0.3cm} B_7 \le 0.3579101563 <  E_7$$
+
\Rightarrow \hspace{0.3cm} B_7 \le 0.3579101563 <  E_7.$$
ist der gültige arithmetische Code gleich <b>01011011101</b>.
+
 
  
<b>e)</b>&nbsp;&nbsp;<u>Alle Aussagen</u> sind richtig. Sie können sich davon beispielsweise in [BCK02] überzeugen.
+
'''(5)'''&nbsp; <u>All statements</u> are correct. See also &nbsp; [https://en.wikipedia.org/wiki/Arithmetic_coding '''this Wikipedia article'''].
 
{{ML-Fuß}}
 
{{ML-Fuß}}
  
  
  
[[Category:Aufgaben zu Informationstheorie|^2.4 Weitere Quellencodierverfahren^]]
+
[[Category:Information Theory: Exercises|^2.4 Further Source Coding Methods^]]

Latest revision as of 16:26, 12 August 2021

Preset interval ranges

Here we consider arithmetic coding  $(\rm AC)$.  All necessary information about this type of entropy coding can be found in  Exercise 2.11.

The graph is also the result of Exercise 2.11.  The numerical values for coding steps 3 and 7 that are important for the current task are highlighted in colour:

  • The interval for  $N= 3$  $($symbol sequence  $\rm XXY)$  starts at  $B_3 = 0.343$  and goes up to  $E_3 = 0.392$.
  • The interval limits for  $N= 7$  $($symbol sequence  $\rm XXYXXXZ)$  are  $B_7 = 0.3564456$  and  $E_7 =0.359807$.


This task is only about assigning binary sequences to the selected intervals. Procedure:

  • The interval  $I$  is determined by the beginning  $B$, the end  $E$,  the interval width  ${\it \Delta} = E-B$  as well as the interval centre  $M = (B+E)/2$.
  • The interval  $I$  is characterised by the binary representation  (with limited resolution)  of any real number value  $r \in I$.  For example, one chooses  $r \approx M$.
  • The required number of bits results from the interval width according to the following equation  (the open square brackets mean  "round up"):
$$N_{\rm bits} = \left\lceil{\rm log_2} \hspace{0.15cm} 1/{\it \Delta} \right\rceil+1\hspace{0.05cm}. $$

For example, for  $N_{\rm bits} = 5$  the binary code  01001  stands for the following real-valued number  $r$:

$$r = 0 \cdot 2^{-1}+1 \cdot 2^{-2}+0 \cdot 2^{-3}+0 \cdot 2^{-4}+1 \cdot 2^{-5} = 0.28125 \hspace{0.05cm}. $$




Hints:


Questions

1

How many bits are used to represent the source symbol sequence  $\rm XXY$   ⇒   $N = 3$ ?

$N_\text{bits} \ = \ $

2

Which arithmetic code  $\rm (AC)$  applies to this case?

$\rm AC = $  01011,
$\rm AC = $  010111,
$\rm AC = $  110111.

3

How many bits are used to represent the source symbol sequence  $\rm XXYXXXZ$   ⇒   $N = 7$ ?

$N_\text{bits} \ = \ $

4

Is  01011100001  a valid code for the source symbol sequence  $\rm XXYXXXZ$?

Yes.
No.

5

Which statements apply to arithmetic coding in general?

It is a common coding of entire sequences.
A computer with 32-bit architecture limits the sequence length  $N$.
This problem can be circumvented by integer realisation.
Integer realisation increases the coding speed.


Solution

(1)  The selected interval begins at  $B_3 = 0.343$  and ends at  $E_3 = 0.392$.

  • The interval width is therefore  ${\it \Delta}_3 = 0.049$  and thus the following applies with the logarithm to base 2:
$$N_{\rm bits} = {\rm log_2} \hspace{0.15cm} \left\lceil \frac{1}{0.049}\right\rceil+1\hspace{0.15cm}\underline{= 6} \hspace{0.05cm}.$$


(2)  The selected interval results in  $I = \big[0.343, \ 0.392\big)$.

  • The middle lies at  $M_3 = 0.3675$.
  • To determine the arithmetic code, we try to reach the centre of the interval as well as possible by using a binary representation.
  • Since we do not have a corresponding tool for solving this task at the moment, we assume the following secondary calculations:


 $H_4 = 2^{-2} + 2^{-4} = 0.3125$   ⇒   does not belong to interval  $I$.
 $H_5 = H_4 +2^{-5} = 0.34375 \in I$   ⇒   binary representation:   0.01011  ⇒  Code:   01011.
 $H_6 = H_5 +2^{-6} = 0.359375 \in I$   ⇒   binary representation:   0.010111  ⇒  Code:   010111.
 $H_7 = H_6 +2^{-7} = 0.3671875 \in I$   ⇒   binary representation:   0.0101111  ⇒  Code:   0101111.
 $H_{12} = H_7 +2^{-12} = 0.3674316406 \in I$   ⇒   binary representation:   0.010111100001  ⇒  Code:   010111100001.

The corresponding 6–bit code is therefore  $\rm AC =$  010111   ⇒   Correct is  solution suggestion 2.


(3)  Here, with the beginning  $B_7 = 0.3564456$  and the end  $E_7 = 0.359807$  the interval width is  ${\it \Delta}_7 = 0.0033614$  and thus

$$N_{\rm bits} = \left\lceil {\rm log_2} \hspace{0.15cm} \frac{1}{0.0033614} \right\rceil + 1\hspace{0.15cm} = \left\lceil {\rm log_2} \hspace{0.15cm} 297.5 \right\rceil + 1\hspace{0.15cm} \underline{= 11} \hspace{0.05cm}.$$


(4)  The binary representation of the code  01011100001  results in

$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-6}+ 2^{-11} = 0.3598632813 > E_7 \hspace{0.05cm}.$$
  • The correct answer is therefore NO.  The valid arithmetic code is  $\rm AC =$  01011011101, because of
$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-7}+ 2^{-8}+ 2^{-9}+ 2^{-11} =0.3579101563 \hspace{0.3cm} \Rightarrow \hspace{0.3cm} B_7 \le 0.3579101563 < E_7.$$


(5)  All statements are correct. See also   this Wikipedia article.