Difference between revisions of "Aufgaben:Exercise 2.11Z: Arithmetic Coding once again"

From LNTwww
m (Text replacement - "„" to """)
 
(9 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  
{{quiz-Header|Buchseite=Informationstheorie/Weitere Quellencodierverfahren
+
{{quiz-Header|Buchseite=Information_Theory/Further_Source_Coding_Methods
 
}}
 
}}
  
[[File:P_ID2473__Inf_A_2_12.png|right|frame|Vorgegebene Intervallbereiche]]
+
[[File:EN_Inf_Z_2_11_v2.png|right|frame|Preset interval ranges]]
Wir betrachten hier die arithmetische Codierung  $(\rm AC)$.  Alle notwendigen Informationen zu dieser Art von Entropiecodierung finden Sie in der  [[Aufgaben:2.11_Arithmetische_Codierung|Aufgabe 2.11]].
+
Here we consider arithmetic coding  $(\rm AC)$.  All necessary information about this type of entropy coding can be found in  [[Aufgaben:Exercise_2.11:_Arithmetic_Coding|Exercise 2.11]].
  
Auch die Grafik ist das Ergebnis von Aufgabe 2.11.  Die für die aktuelle Aufgabe wichtigen Zahlenwerte für die Codierschritte 3 und 7 sind farblich hervorgehoben:
+
The graph is also the result of Exercise 2.11.  The numerical values for coding steps 3 and 7 that are important for the current task are highlighted in colour:
* Das Intervall für  $N= 3$  $($Symbolfolge  $\rm XXY)$  beginnt bei  $B_3 = 0.343$  und reicht bis  $E_3 = 0.392$.
+
* The interval for  $N= 3$  $($symbol sequence  $\rm XXY)$  starts at  $B_3 = 0.343$  and goes up to  $E_3 = 0.392$.
* Das Intervallgrenzen für  $N= 7$  $($Symbolfolge  $\rm XXYXXXZ)$  sind  $B_7 = 0.3564456$  und  $E_7 =0.359807$.
+
* The interval limits for  $N= 7$  $($symbol sequence  $\rm XXYXXXZ)$  are  $B_7 = 0.3564456$  and  $E_7 =0.359807$.
  
  
In dieser Aufgabe geht es nur um die Zuweisung von Binärfolgen zu den ausgewählten Intervallen. Vorgehensweise:
+
This task is only about assigning binary sequences to the selected intervals. Procedure:
  
* Das Intervall  $I$  wird bestimmt durch den Beginn  $B$, das Ende  $E$,  die Intervallbreite  ${\it \Delta} = E-B$  sowie die Intervallmitte  $M = (B+E)/2$.
+
* The interval  $I$  is determined by the beginning  $B$, the end  $E$,  the interval width  ${\it \Delta} = E-B$  as well as the interval centre  $M = (B+E)/2$.
* Das Intervall  $I$  wird gekennzeichnet durch die Binärdarstellung (mit begrenzter Auflösung) eines beliebigen reellen Zahlenwertes  $r \in I$.  Beispielsweise wählt man  $r \approx M$.
+
* The interval  $I$  is characterised by the binary representation  (with limited resolution)  of any real number value  $r \in I$.  For example, one chooses  $r \approx M$.
* Die erforderliche Bitanzahl ergibt sich aus der Intervallbreite nach folgender Gleichung (die nach unten offenen eckigen Klammern bedeuten "nach oben runden"):
+
* The required number of bits results from the interval width according to the following equation  (the open square brackets mean  "round up"):
:$$N_{\rm Bit} =  \left\lceil{\rm log_2} \hspace{0.15cm} 1/{\it \Delta} \right\rceil+1\hspace{0.05cm}. $$
+
:$$N_{\rm bits} =  \left\lceil{\rm log_2} \hspace{0.15cm} 1/{\it \Delta} \right\rceil+1\hspace{0.05cm}. $$
  
Beispielsweise steht für&nbsp; $N_{\rm Bit} = 5$&nbsp; der Binärcode&nbsp; <b>01001</b>&nbsp; für die folgende reellwertige Zahl&nbsp; $r$:
+
For example, for&nbsp; $N_{\rm bits} = 5$&nbsp; the binary code&nbsp; <b>01001</b>&nbsp; stands for the following real-valued number&nbsp; $r$:
 
:$$r = 0 \cdot 2^{-1}+1 \cdot 2^{-2}+0 \cdot 2^{-3}+0 \cdot 2^{-4}+1 \cdot 2^{-5} = 0.28125
 
:$$r = 0 \cdot 2^{-1}+1 \cdot 2^{-2}+0 \cdot 2^{-3}+0 \cdot 2^{-4}+1 \cdot 2^{-5} = 0.28125
 
\hspace{0.05cm}. $$
 
\hspace{0.05cm}. $$
Line 28: Line 28:
  
  
''Hinweise:''
+
<u>Hints:</u>
*Die Aufgabe gehört zum  Kapitel&nbsp; [[Information_Theory/Weitere_Quellencodierverfahren|Weitere Quellencodierverfahren]].
+
*The exercise belongs to the chapter&nbsp; [[Information_Theory/Further_Source_Coding_Methods|Further source coding methods]].
*Insbesondere wird  Bezug genommen auf die Seite&nbsp; [[Information_Theory/Weitere_Quellencodierverfahren#Arithmetische_Codierung|Arithmetische Codierung]].
+
*In particular, reference is made to the page&nbsp; [[Information_Theory/Further_Source_Coding_Methods#Arithmetic_coding|Arithmetic Coding]].
*Weitere Informationen zum Thema finden Sie auch in diesem&nbsp; [https://de.wikipedia.org/wiki/Arithmetisches_Kodieren WIKIPEDIA-Artikel].
+
*Further information on the topic can also be found in this&nbsp; [https://en.wikipedia.org/wiki/Arithmetic_coding Wikipedia article].
 
   
 
   
  
  
===Fragebogen===
+
===Questions===
  
 
<quiz display=simple>
 
<quiz display=simple>
{Wie viele Bit werden zur Darstellung der Quellensymbolfolge&nbsp; $\rm XXY$ &nbsp; &rArr; &nbsp; $N = 3$&nbsp; benutzt?
+
{How many bits are used to represent the source symbol sequence&nbsp; $\rm XXY$ &nbsp; &rArr; &nbsp; $N = 3$&nbsp;?
 
|type="{}"}
 
|type="{}"}
$N_\text{Bit} \ = \ $ { 6 }
+
$N_\text{bits} \ = \ $ { 6 }
  
  
{Welcher arithmetischer Code&nbsp; $\rm (AC)$&nbsp; gilt für diesen Fall?
+
{Which arithmetic code&nbsp; $\rm (AC)$&nbsp; applies to this case?
 
|type="()"}
 
|type="()"}
 
- $\rm AC = $&nbsp;  <b>01011</b>,
 
- $\rm AC = $&nbsp;  <b>01011</b>,
Line 50: Line 50:
  
  
{Wie viele Bit werden zur Darstellung der Quellensymbolfolge&nbsp; $\rm XXYXXXZ$ &nbsp; &rArr; &nbsp; $N = 7$&nbsp; benutzt?
+
{How many bits are used to represent the source symbol sequence&nbsp; $\rm XXYXXXZ$ &nbsp; &rArr; &nbsp; $N = 7$&nbsp;?
 
|type="{}"}
 
|type="{}"}
$N_\text{Bit} \ = \ $ { 11 }
+
$N_\text{bits} \ = \ $ { 11 }
  
  
{Ist&nbsp; <b>01011100001</b>&nbsp; ein gültiger Code für die Quellensymbolfolge&nbsp;  $\rm XXYXXXZ$?
+
{Is&nbsp; <b>01011100001</b>&nbsp; a valid code for the source symbol sequence&nbsp;  $\rm XXYXXXZ$?
 
|type="()"}
 
|type="()"}
- Ja.
+
- Yes.
+ Nein.
+
+ No.
  
  
{Welche Aussagen gelten für die arithmetische Codierung allgemein?
+
{Which statements apply to arithmetic coding in general?
 
|type="[]"}
 
|type="[]"}
+ Es handelt sich um eine gemeinsame Codierung ganzer Folgen.
+
+ It is a common coding of entire sequences.
+ Eine 32 Bit&ndash;Rechnerarchitektur begrenzt die Folgenlänge&nbsp; $N$.
+
+ A computer with 32-bit architecture limits the sequence length&nbsp; $N$.
+ Dieses Problem lässt sich durch Integer&ndash;Realisierung umgehen.
+
+ This problem can be circumvented by integer realisation.
+ Eine Integer&ndash;Realisierung erhöht die Codiergeschwindigkeit.
+
+ Integer realisation increases the coding speed.
  
  
Line 72: Line 72:
 
</quiz>
 
</quiz>
  
===Musterlösung===
+
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
'''(1)'''&nbsp; Das ausgewählte Intervall beginnt bei&nbsp; $B_3 = 0.343$&nbsp; und endet bei&nbsp; $E_3 = 0.392$.  
+
'''(1)'''&nbsp; The selected interval begins at&nbsp; $B_3 = 0.343$&nbsp; and ends at&nbsp; $E_3 = 0.392$.  
*Die Intervallbreite ist somit&nbsp; ${\it \Delta}_3 = 0.049$&nbsp; und damit gilt mit dem <i>Logarithmus dualis</i>:
+
*The interval width is therefore&nbsp; ${\it \Delta}_3 = 0.049$&nbsp; and thus the following applies with the logarithm to base 2:
:$$N_{\rm Bit} = {\rm log_2} \hspace{0.15cm} \left\lceil \frac{1}{0.049}\right\rceil+1\hspace{0.15cm}\underline{= 6}
+
:$$N_{\rm bits} = {\rm log_2} \hspace{0.15cm} \left\lceil \frac{1}{0.049}\right\rceil+1\hspace{0.15cm}\underline{= 6}
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
  
  
'''(2)'''&nbsp; Das ausgewählte Intervall ergibt sich zu&nbsp; $I = \big[0.343, \ 0.392\big)$.  
+
'''(2)'''&nbsp; The selected interval results in&nbsp; $I = \big[0.343, \ 0.392\big)$.  
*Die Mitte liegt bei&nbsp; $M_3 = 0.3675$.  
+
*The middle lies at&nbsp; $M_3 = 0.3675$.  
*Zur Bestimmung des arithmetischen Codes versuchen wir, die Intervallmitte durch eine Binärdarstellung möglichst gut zu erreichen.
+
*To determine the arithmetic code, we try to reach the centre of the interval as well as possible by using a binary representation.
*Da uns gerade kein entsprechendes Tool zur Lösung dieser Aufgabe zur Verfügung steht, gehen wir von folgenden Nebenrechnungen aus:
+
*Since we do not have a corresponding tool for solving this task at the moment, we assume the following secondary calculations:
  
  
:&nbsp;$H_4 = 2^{-2} + 2^{-2} = 0.3125$ &nbsp; &#8658; &nbsp; gehört nicht zum Intervall&nbsp; $I$.
 
  
:&nbsp;$H_5 = H_4 +2^{-5} = 0.34375  \in I$ &nbsp; &#8658; &nbsp; Binärdarstellung: &nbsp; '''0.01011'''  &nbsp;&#8658;&nbsp;  Code: &nbsp; <b>01011</b>.
+
:&nbsp;$H_4 = 2^{-2} + 2^{-4} = 0.3125$ &nbsp; &#8658; &nbsp; does not belong to interval&nbsp; $I$.
  
:&nbsp;$H_6 = H_5 +2^{-6} = 0.359375 \in I$  &nbsp; &#8658; &nbsp;  Binärdarstellung: &nbsp; '''0.010111'''  &nbsp;&#8658;&nbsp;  Code: &nbsp; <b>010111</b>.
+
:&nbsp;$H_5 = H_4 +2^{-5} = 0.34375 \in I$  &nbsp; &#8658; &nbsp;  binary representation: &nbsp; '''0.01011'''  &nbsp;&#8658;&nbsp;  Code: &nbsp; <b>01011</b>.
  
:&nbsp;$H_7 = H_6 +2^{-7} = 0.3671875 \in I$  &nbsp; &#8658; &nbsp;  Binärdarstellung: &nbsp; '''0.0101111'''  &nbsp;&#8658;&nbsp;  Code: &nbsp; <b>0101111</b>.
+
:&nbsp;$H_6 = H_5 +2^{-6} = 0.359375 \in I$  &nbsp; &#8658; &nbsp;  binary representation: &nbsp; '''0.010111'''  &nbsp;&#8658;&nbsp;  Code: &nbsp; <b>010111</b>.
  
:&nbsp;$H_{12} = H_7 +2^{-12} = 0.3674316406 \in I$  &nbsp; &#8658; &nbsp;  Binärdarstellung: &nbsp; '''0.010111100001'''  &nbsp;&#8658;&nbsp;  Code: &nbsp; <b>010111100001</b>.
+
:&nbsp;$H_7 = H_6 +2^{-7} = 0.3671875 \in I$  &nbsp; &#8658; &nbsp;  binary representation: &nbsp; '''0.0101111'''  &nbsp;&#8658;&nbsp;  Code: &nbsp; <b>0101111</b>.
  
Der entsprechende 6 Bit&ndash;Code lautet somit&nbsp; $\rm AC =$&nbsp;  <b>010111</b> &nbsp;&nbsp;&#8658;&nbsp;&nbsp; Richtig ist der <u>Lösungsvorschlag 2</u>.
+
:&nbsp;$H_{12} = H_7 +2^{-12} = 0.3674316406  \in I$ &nbsp; &#8658; &nbsp;  binary representation: &nbsp; '''0.010111100001'''  &nbsp;&#8658;&nbsp; Code: &nbsp; <b>010111100001</b>.
  
 +
The corresponding 6&ndash;bit code is therefore&nbsp; $\rm AC =$&nbsp;  <b>010111</b> &nbsp;&nbsp;&#8658;&nbsp;&nbsp; Correct is&nbsp; <u>solution suggestion 2</u>.
  
  
'''(3)'''&nbsp; Hier ergibt sich mit dem Beginn&nbsp; $B_7 = 0.3564456$&nbsp; und dem Ende&nbsp; $E_7  = 0.359807$&nbsp; die Intervallbreite&nbsp; ${\it \Delta}_7 = 0.0033614$&nbsp; und damit
+
 
:$$N_{\rm Bit} = \left\lceil  {\rm log_2} \hspace{0.15cm} \frac{1}{0.0033614} \right\rceil + 1\hspace{0.15cm} =
+
'''(3)'''&nbsp; Here, with the beginning&nbsp; $B_7 = 0.3564456$&nbsp; and the end&nbsp; $E_7  = 0.359807$&nbsp; the interval width is&nbsp; ${\it \Delta}_7 = 0.0033614$&nbsp; and thus
 +
:$$N_{\rm bits} = \left\lceil  {\rm log_2} \hspace{0.15cm} \frac{1}{0.0033614} \right\rceil + 1\hspace{0.15cm} =
 
\left\lceil  {\rm log_2} \hspace{0.15cm} 297.5 \right\rceil + 1\hspace{0.15cm} \underline{= 11}
 
\left\lceil  {\rm log_2} \hspace{0.15cm} 297.5 \right\rceil + 1\hspace{0.15cm} \underline{= 11}
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
Line 107: Line 108:
  
  
'''(4)'''&nbsp; Die Binärdarstellung des Codes&nbsp; <b>01011100001</b>&nbsp; ergibt
+
'''(4)'''&nbsp; The binary representation of the code&nbsp; <b>01011100001</b>&nbsp; results in
 
:$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-6}+ 2^{-11} = 0.3598632813 > E_7
 
:$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-6}+ 2^{-11} = 0.3598632813 > E_7
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
*Richtig ist also <u>NEIN</u>.&nbsp; Der gültige arithmetische Code ist&nbsp; $\rm AC =$&nbsp; <b>01011011101</b>, wegen
+
*The correct answer is therefore <u>NO</u>.&nbsp; The valid arithmetic code is&nbsp; $\rm AC =$&nbsp; <b>01011011101</b>, because of
 
:$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-7}+ 2^{-8}+ 2^{-9}+ 2^{-11} =0.3579101563 \hspace{0.3cm}
 
:$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-7}+ 2^{-8}+ 2^{-9}+ 2^{-11} =0.3579101563 \hspace{0.3cm}
 
\Rightarrow \hspace{0.3cm} B_7 \le 0.3579101563 <  E_7.$$
 
\Rightarrow \hspace{0.3cm} B_7 \le 0.3579101563 <  E_7.$$
  
  
'''(5)'''&nbsp; <u>Alle Aussagen</u> sind richtig. Siehe auch:
+
'''(5)'''&nbsp; <u>All statements</u> are correct. See also &nbsp; [https://en.wikipedia.org/wiki/Arithmetic_coding '''this Wikipedia article'''].
:Bodden, E.; Clasen, M.; Kneis, J.: Algebraische Kodierung. Proseminar, Lehrstuhl für Informatik IV, RWTH Aachen, 2002.
 
 
{{ML-Fuß}}
 
{{ML-Fuß}}
  
  
  
[[Category:Information Theory: Exercises|^2.4 Weitere Quellencodierverfahren^]]
+
[[Category:Information Theory: Exercises|^2.4 Further Source Coding Methods^]]

Latest revision as of 17:26, 12 August 2021

Preset interval ranges

Here we consider arithmetic coding  $(\rm AC)$.  All necessary information about this type of entropy coding can be found in  Exercise 2.11.

The graph is also the result of Exercise 2.11.  The numerical values for coding steps 3 and 7 that are important for the current task are highlighted in colour:

  • The interval for  $N= 3$  $($symbol sequence  $\rm XXY)$  starts at  $B_3 = 0.343$  and goes up to  $E_3 = 0.392$.
  • The interval limits for  $N= 7$  $($symbol sequence  $\rm XXYXXXZ)$  are  $B_7 = 0.3564456$  and  $E_7 =0.359807$.


This task is only about assigning binary sequences to the selected intervals. Procedure:

  • The interval  $I$  is determined by the beginning  $B$, the end  $E$,  the interval width  ${\it \Delta} = E-B$  as well as the interval centre  $M = (B+E)/2$.
  • The interval  $I$  is characterised by the binary representation  (with limited resolution)  of any real number value  $r \in I$.  For example, one chooses  $r \approx M$.
  • The required number of bits results from the interval width according to the following equation  (the open square brackets mean  "round up"):
$$N_{\rm bits} = \left\lceil{\rm log_2} \hspace{0.15cm} 1/{\it \Delta} \right\rceil+1\hspace{0.05cm}. $$

For example, for  $N_{\rm bits} = 5$  the binary code  01001  stands for the following real-valued number  $r$:

$$r = 0 \cdot 2^{-1}+1 \cdot 2^{-2}+0 \cdot 2^{-3}+0 \cdot 2^{-4}+1 \cdot 2^{-5} = 0.28125 \hspace{0.05cm}. $$




Hints:


Questions

1

How many bits are used to represent the source symbol sequence  $\rm XXY$   ⇒   $N = 3$ ?

$N_\text{bits} \ = \ $

2

Which arithmetic code  $\rm (AC)$  applies to this case?

$\rm AC = $  01011,
$\rm AC = $  010111,
$\rm AC = $  110111.

3

How many bits are used to represent the source symbol sequence  $\rm XXYXXXZ$   ⇒   $N = 7$ ?

$N_\text{bits} \ = \ $

4

Is  01011100001  a valid code for the source symbol sequence  $\rm XXYXXXZ$?

Yes.
No.

5

Which statements apply to arithmetic coding in general?

It is a common coding of entire sequences.
A computer with 32-bit architecture limits the sequence length  $N$.
This problem can be circumvented by integer realisation.
Integer realisation increases the coding speed.


Solution

(1)  The selected interval begins at  $B_3 = 0.343$  and ends at  $E_3 = 0.392$.

  • The interval width is therefore  ${\it \Delta}_3 = 0.049$  and thus the following applies with the logarithm to base 2:
$$N_{\rm bits} = {\rm log_2} \hspace{0.15cm} \left\lceil \frac{1}{0.049}\right\rceil+1\hspace{0.15cm}\underline{= 6} \hspace{0.05cm}.$$


(2)  The selected interval results in  $I = \big[0.343, \ 0.392\big)$.

  • The middle lies at  $M_3 = 0.3675$.
  • To determine the arithmetic code, we try to reach the centre of the interval as well as possible by using a binary representation.
  • Since we do not have a corresponding tool for solving this task at the moment, we assume the following secondary calculations:


 $H_4 = 2^{-2} + 2^{-4} = 0.3125$   ⇒   does not belong to interval  $I$.
 $H_5 = H_4 +2^{-5} = 0.34375 \in I$   ⇒   binary representation:   0.01011  ⇒  Code:   01011.
 $H_6 = H_5 +2^{-6} = 0.359375 \in I$   ⇒   binary representation:   0.010111  ⇒  Code:   010111.
 $H_7 = H_6 +2^{-7} = 0.3671875 \in I$   ⇒   binary representation:   0.0101111  ⇒  Code:   0101111.
 $H_{12} = H_7 +2^{-12} = 0.3674316406 \in I$   ⇒   binary representation:   0.010111100001  ⇒  Code:   010111100001.

The corresponding 6–bit code is therefore  $\rm AC =$  010111   ⇒   Correct is  solution suggestion 2.


(3)  Here, with the beginning  $B_7 = 0.3564456$  and the end  $E_7 = 0.359807$  the interval width is  ${\it \Delta}_7 = 0.0033614$  and thus

$$N_{\rm bits} = \left\lceil {\rm log_2} \hspace{0.15cm} \frac{1}{0.0033614} \right\rceil + 1\hspace{0.15cm} = \left\lceil {\rm log_2} \hspace{0.15cm} 297.5 \right\rceil + 1\hspace{0.15cm} \underline{= 11} \hspace{0.05cm}.$$


(4)  The binary representation of the code  01011100001  results in

$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-6}+ 2^{-11} = 0.3598632813 > E_7 \hspace{0.05cm}.$$
  • The correct answer is therefore NO.  The valid arithmetic code is  $\rm AC =$  01011011101, because of
$$2^{-2}+ 2^{-4}+ 2^{-5}+ 2^{-7}+ 2^{-8}+ 2^{-9}+ 2^{-11} =0.3579101563 \hspace{0.3cm} \Rightarrow \hspace{0.3cm} B_7 \le 0.3579101563 < E_7.$$


(5)  All statements are correct. See also   this Wikipedia article.