Difference between revisions of "Aufgaben:Exercise 2.5: Residual Redundancy with LZW Coding"

From LNTwww
 
(30 intermediate revisions by 4 users not shown)
Line 1: Line 1:
  
{{quiz-Header|Buchseite=Informationstheorie/Komprimierung nach Lempel, Ziv und Welch
+
{{quiz-Header|Buchseite=Information_Theory/Compression_According_to_Lempel,_Ziv_and_Welch
 
}}
 
}}
  
[[File:P_ID2446__Inf_A_2_5_neu.png|right|Restredundanz <i>r</i>(<i>N</i>) und Näherung <i>r</i> ′(<i>N</i>) dreier Quellen]]
+
[[File:P_ID2446__Inf_A_2_5_neu.png|right|frame|Residual redundancy&nbsp; $r(N)$&nbsp; and approximation &nbsp; $r\hspace{0.05cm}'(N)$&nbsp; of three sources]]
Wir gehen hier von einer binären Eingangsfolge der Länge <i>N</i> aus und  betrachten drei verschiedene binäre Nachrichtenquellen:
+
We assume here a binary input sequence of length&nbsp; $N$&nbsp; and consider three different binary sources:
* '''BQ1''': &nbsp; Symbolwahrscheinlichkeiten   <i>p</i><sub>A</sub> = 0.89, <i>p</i><sub>B</sub>&nbsp;=&nbsp;0.11, also unterschiedlich<br> &#8658;&nbsp; Entropie <i>H</i> = 0.5 bit/Quellensymbol &nbsp; &#8658; &nbsp; Quelle ist redundant.
+
* $\rm BQ1$: &nbsp; Symbol probabilities&nbsp;   $p_{\rm A} = 0.89$&nbsp; and&nbsp; $p_{\rm B} = 0.11$, i.e. different<br> &nbsp;  &#8658; &nbsp; entropy &nbsp; $H = 0.5\text{ bit/source&nbsp;symbol}$ &nbsp; &#8658; &nbsp; the source is redundant.
* '''BQ2''':  &nbsp; <i>p</i><sub>A</sub> = <i>p</i><sub>B</sub> = 0.5 (gleichwahrscheinlich) &nbsp; &#8658; &nbsp; Entropie <i>H</i> = 1 bit/Quellensymbol <br>&#8658;&nbsp; Quelle ist redundanzfrei.
+
* $\rm BQ2$:  &nbsp; $p_{\rm A} = p_{\rm B} = 0.5$&nbsp; (equally probable)<br> &nbsp; &#8658; &nbsp; entropy &nbsp; $H = 1\text{ bit/source symbol}$ &nbsp; &#8658; &nbsp; the source is redundancy-free.
* '''BQ3''': &nbsp;  Hier gibt es keine konkreten Angaben zur Statistik. In der Teilaufgabe (6) sollen Sie die Entropie <i>H</i> dieser Quelle  abschätzen.
+
* $\rm BQ3$: &nbsp;  There is no concrete information on the statistics here.&nbsp; <br>In subtask&nbsp; '''(6)'''&nbsp; you are to estimate the entropy&nbsp; $H$&nbsp; of this source.
  
Für diese drei Quellen wurden per Simulation die jeweilige <i>Restredundanz</i> <i>r</i>(<i>N</i>) ermittelt, die nach der [[Informationstheorie/Komprimierung_nach_Lempel,_Ziv_und_Welch#Der_Lempel.E2.80.93Ziv.E2.80.93Welch.E2.80.93Algorithmus|Lempel&ndash;Ziv&ndash;Welch&ndash;Codierung]] in der Binärfolge verbleibt. Die Ergebnisse sind in der jeweils ersten Spalte obiger Tabelle für die Quellen
 
'''BQ1''' (gelbe Hinterlegung), '''BQ2''' (grüne Hinterlegung) und '''BQ3''' (blaue Hinterlegung) eingetragen, wobei wir uns bei der Simulation auf Folgenlängen <i>N</i> &#8804; 50000 beschränkt haben.
 
  
Die <i>relative Redundanz der Ausgangsfolge</i> &ndash; vereinfachend <i>Restredundanz</i> genannt &ndash; kann aus
+
For these three sources, the respective&nbsp; "residual redundancy"&nbsp; $r(N)$&nbsp; was determined by simulation, which remains in the binary sequence after&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#The_Lempel-Ziv-Welch_algorithm |Lempel&ndash;Ziv&ndash;Welch coding]].&nbsp;
  
:* der Länge <i>N</i> der Eingangsfolge,
+
The results are shown in the first column of the above table for the sources
 +
*$\rm BQ1$&nbsp; (yellow background),  
 +
*$\rm BQ2$&nbsp; (green background) and
 +
*$\rm BQ3$&nbsp; (blue background)
  
:* der Länge <i>L</i>(<i>N</i>) der Ausgangsfolge und
 
  
:* der Quellenentropie <i>H</i>
+
whereby we have restricted ourselves to sequence lengths&nbsp; $N &#8804; 50000$&nbsp; in the simulation.
  
:in folgender Weise berechnet werden:
+
The&nbsp; "relative redundancy of the output sequence" &ndash; simplified called&nbsp; "residual redundancy" &ndash;  can be calculated from
 +
* the length&nbsp;  $N$&nbsp;  of the input sequence,
 +
* the length&nbsp;  $L(N)$&nbsp;  of the output sequence and
 +
*the entropy&nbsp;  $H$
 +
 
 +
 
 +
in the following way:
 
:$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 -  \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$
 
:$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 -  \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$
:Hierbei ist berücksichtigt, dass bei perfekter Quellencodierung die Länge der Ausgangsfolge bis auf den Wert  <i>L</i><sub>min</sub> = <i>N</i> &middot; <i>H</i> herabgesenkt werden könnte. Bei nichtperfekter Quellencodierung  gibt <i>L</i>(<i>N</i>) &ndash; <i>N</i> &middot; <i>H</i> die verbleibende Redundanz (mit der Pseudo&ndash;Einheit &bdquo;bit&rdquo;) an. Nach Division durch <i>L</i>(<i>N</i>) erhält man die relative Redundanz <i>r</i>(<i>N</i>) mit dem Wertebereich zwischen 0 und 1; <i>r</i>(<i>N</i>) sollte möglichst klein sein.
 
  
:Eine zweite Kenngröße zur Effizienzmessung der LZW&ndash;Codierung ist der <i>Komprimierungsfaktor</i>
+
This takes into account that with perfect source coding the length of the output sequence could be lowered to the value&nbsp;  $L_{\rm min} = N &middot; H$&nbsp;.
 +
*With non-perfect source coding,&nbsp;  $L(n) - N &middot; H$&nbsp;  gives the remaining redundancy&nbsp; (with the pseudo&ndash;unit&nbsp; "bit").
 +
*After dividing by&nbsp;  $L(n)$,&nbsp;  one obtains the relative redundancy&nbsp;  $r(n)$&nbsp;  with the value range between zero and one;&nbsp; $r(n)$&nbsp; should be as small as possible.
 +
 
 +
 
 +
A second parameter for measuring the efficiency of LZW coding is the&nbsp;  "compression factor"&nbsp;  $K(N)$&nbsp; as the quotient of the lengths of the output and input sequences, which should also be very small:
 +
 
 
:$$K(N) = {L(N) }/{N} \hspace{0.05cm},$$
 
:$$K(N) = {L(N) }/{N} \hspace{0.05cm},$$
:der ebenfalls klein sein sollte. <i>Hinweis</i>: <i>K</i>(<i>N</i>) und <i>r</i>(<i>N</i>) hängen deterministisch zusammen.
 
  
:Im Theorieteil wurde gezeigt, dass die Restredundanz <i>r</i>(<i>N</i>) durch die Funktion
+
In the&nbsp;  [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Quantitative_statements_on_asymptotic_optimality|theory section]]&nbsp;  it was shown that the residual redundancy&nbsp;  $r(n)$&nbsp;  is well approximated by the function
:$$r'(N) ={A}/{{\rm lg}\hspace{0.1cm}(N)}
+
:$$r\hspace{0.05cm}'(N) =\frac {A}{{\rm lg}\hspace{0.1cm}(N)}
 
\hspace{0.5cm}{\rm mit}\hspace{0.5cm} A = 4 \cdot {r(N = 10000)}  
 
\hspace{0.5cm}{\rm mit}\hspace{0.5cm} A = 4 \cdot {r(N = 10000)}  
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
:oft gut angenähert wird. Die Näherung <i>r</i>&prime;(<i>N</i>) ist für BQ1 in der zweiten Spalte obiger Tabelle angegeben. In den Teilaufgaben (d) und (e) sollen Sie die Approximation für die Quellen BQ2 und BQ3 vornehmen.
 
  
  
''Hinweise:''
+
*This approximation&nbsp; $r\hspace{0.05cm}'(N)$&nbsp;  is given for&nbsp; $\rm BQ1$&nbsp; in the second column of the table above.
*Die Aufgabe gehört zum Kapitel [[Informationstheorie/Komprimierung_nach_Lempel,_Ziv_und_Welch|Komprimierung nach Lempel, Ziv und Welch]].
+
* In subtasks&nbsp;  '''(4)'''&nbsp;  and&nbsp;  '''(5)'''&nbsp;  you are to make the approximation for sources&nbsp; $\rm BQ2$&nbsp; and&nbsp; $\rm BQ3$&nbsp;.
*Insbesondere wird Bezug genommen auf die Seiten [[Informationstheorie/Komprimierung_nach_Lempel,_Ziv_und_Welch#Der_Lempel.E2.80.93Ziv.E2.80.93Welch.E2.80.93Algorithmus|LZ77 &ndash; die Grundform der Lempel-Ziv-Algorithmen]], [[Informationstheorie/Komprimierung_nach_Lempel,_Ziv_und_Welch#Lempel.E2.80.93Ziv.E2.80.93Codierung_mit_variabler_Indexbitl.C3.A4nge|Lempel-Ziv-Codierung mit variabler Indexbitlänge]] sowie [[Informationstheorie/Komprimierung_nach_Lempel,_Ziv_und_Welch#Decodierung_des_LZW.E2.80.93Algorithmus|Decodierung des LZW-Agorithmus]].
+
 
*Sollte die Eingabe des Zahlenwertes &bdquo;0&rdquo; erforderlich sein, so geben Sie bitte &bdquo;0.&rdquo; ein.
+
 
*Die [[Aufgaben:2.3_Zur_LZ78-Komprimierung|Aufgabe 2.3]] sowie die [[Aufgaben:2.3Z_Zur_LZ77-Codierung|Zusatzaufgabe 2.3Z]] behandeln andere LZ-Verfahren in ähnlicher Weise.
+
 
 +
 
 +
 
 +
 
  
  
:<b>Hinweis:</b> Die Aufgabe bezieht sich auf Seite 7 und Seite 8 von Kapitel 2.2.
+
Hints:
 +
*The task belongs to the chapter&nbsp;  [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch|Compression according to Lempel, Ziv and Welch]].
 +
*In particular, reference is made to the pages
 +
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Remaining_redundancy_as_a_measure_for_the_efficiency_of_coding_methods|Remaining redundancy as a measure for the efficiency of coding methods]],
 +
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Efficiency_of_Lempel-Ziv_coding|Efficiency of Lempel-Ziv coding]] and
 +
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Quantitative_statements_on_asymptotic_optimality|Quantitative statements on asymptotic optimality]].
 +
*The descriptive variables&nbsp;  $K(N)$&nbsp;  and&nbsp;  $r(N)$&nbsp;  are deterministically related.
 +
  
  
===Fragebogen===
+
===Questions===
  
 
<quiz display=simple>
 
<quiz display=simple>
{Mit welchem Parameter <i>A</i> wurde die Näherung <i>r'</i>(<i>N</i>) der Restredundanz für die Binärquelle BQ1 erstellt?
+
{With which parameter&nbsp; $A$&nbsp; was the approximation&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; of the residual redundancy for the binary source&nbsp; $\rm BQ1$&nbsp; created?
 
|type="{}"}
 
|type="{}"}
$BQ1:\ A$ = { 1.06 3% }
+
$A \ = \ $ { 1.06 3% }
  
  
{Wie groß muss <i>N</i> = <i>N</i><sub>b</sub> mindestens sein, damit die Restredundanz die Bedingung <i>r</i>(<i>N</i>) &#8804; 5% erfüllt? <i>Hinweis:</i> Ersetzen Sie <i>r</i>(<i>N</i>) durch <i>r'</i><sub> </sub>(<i>N</i>).
+
{What is the minimum size of&nbsp; $N = N_2$&nbsp; for&nbsp;  $\rm BQ1$&nbsp;  so that the residual redundancy satisfies the condition&nbsp; $r(N) &asymp; r\hspace{0.05cm}'(N) \le 5\%$&nbsp;?
 
|type="{}"}
 
|type="{}"}
$BQ1:\ N_b$ = { 1.58 3% } $\cdot 10^{21}$
+
$N_{2} \ =  \ $ { 1.58 3% } $\ \cdot 10^{21}$
  
  
{Wie groß muss <i>N</i> = <i>N</i><sub>c</sub> mindestens sein, damit der Komprimierungsfaktor &nbsp;&#8658;&nbsp;&nbsp; <i>K</i>(<i>N</i>) = <i>L</i>(<i>N</i>)/<i>N</i> nicht größer ist als 0.6?
+
{What is the minimum size of&nbsp; $N = N_3$&nbsp; at&nbsp; $\rm BQ1$&nbsp; so that the compression factor&nbsp; $K(N)= L(N)/N$&nbsp; is not greater than&nbsp; $0.6$?
 
|type="{}"}
 
|type="{}"}
$BQ1:\ N_c$ = { 2.29 3% } $\cdot 10^{6}$
+
$N_{3} \ =  \ $ { 2.29 3% } $\ \cdot 10^{6}$
  
  
{Bestimmen Sie nun die Redundanznäherung <i>r'</i>(<i>N</i>) für die redundanzfreie Binärquelle BQ2, insbesondere:
+
{Now determine the redundancy approximation&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; for the redundancy-free binary source $\rm BQ2$, in particular:
 
|type="{}"}
 
|type="{}"}
$BQ2:\ r'(N = 50000)$ = { 0.162 3% }
+
$r'(N = 50000)\ =  \ $ { 0.162 3% }
$r'(N = 10^6)$ = { 0.127 3% }
+
$r'(N = 10^6)\ =  \ $ { 0.127 3% }
$r'(N = 10^12)$ = { 0.063 3% }
+
$r'(N = 10^{12})\ =  \ $ { 0.063 3% }
  
  
{Welche Werte liefert die Redundanznäherung <i>r</i>&prime;(<i>N</i>) für die nicht näher spezifizierte Binärquelle BQ3? Insbesondere:
+
{What values does the redundancy approximation&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; yield for the unspecified binary source&nbsp; $\rm BQ3$? In particular:
 
|type="{}"}
 
|type="{}"}
$BQ2:\ r'(N = 50000)$ = { 0.289 3% }
+
$r'(N = 50000)\ =  \ $ { 0.289 3% }
$r'(N = 10^6)$ = { 0.227 3% }
+
$r'(N = 10^6)\ =  \ $ { 0.227 3% }
$r'(N = 10^12)$ = { 0.113 3% }
+
$r'(N = 10^{12})\ =  \ $ { 0.113 3% }
  
  
{Welche Quellenentropie <i>H</i> könnte BQ3 nach diesem Ergebnis besitzen? <i>Hinweis:</i> Es ist genau eine Antwort richtig.
+
{According to this result, what source entropy&nbsp; $H$&nbsp; could&nbsp; $\rm BQ3$&nbsp; ?&nbsp; <u>Hint:</u>&nbsp; Exactly one answer is correct.
|type="[]"}
+
|type="()"}
- <i>H</i> = 1.00 bit/Quellensymbol,
+
- $H = 1.00 \ \rm bit/source\hspace{0.15cm}symbol$,
- <i>H</i> = 0.75 bit/Quellensymbol,
+
- $H = 0.75 \ \rm bit/source\hspace{0.15cm} symbol$,
- <i>H</i> = 0.50 bit/Quellensymbol,
+
- $H = 0.50 \ \rm bit/source\hspace{0.15cm} symbol$,
+ <i>H</i> = 0.25 bit/Quellensymbol.
+
+ $H = 0.25 \ \rm bit/source\hspace{0.15cm} symbol$.
  
  
Line 88: Line 107:
 
</quiz>
 
</quiz>
  
===Musterlösung===
+
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
:<b>1.</b>&nbsp;&nbsp;Die Näherung <i>r'</i><sup> </sup>(<i>N</i>) stimmt definitionsgemäß für die Eingangsfolgenlänge <i>N</i> = 10000 mit der per Simulation ermittelten Restredundanz <i>r</i>(<i>N</i>) = 0.265 exakt überein. Damit ist
+
'''(1)'''&nbsp; The approximation&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; agrees exactly by definition for the sequence length&nbsp; $N = 10000$&nbsp; with the residual redundancy&nbsp; $r(N) = 0.265$&nbsp; determined by simulation.
 +
*Thus
 
:$$A = 4 \cdot  r(N = 10000) =4 \cdot {0.265} \hspace{0.15cm}\underline{= 1.06}
 
:$$A = 4 \cdot  r(N = 10000) =4 \cdot {0.265} \hspace{0.15cm}\underline{= 1.06}
 
\hspace{0.05cm}. $$
 
\hspace{0.05cm}. $$
  
:<b>2.</b>&nbsp;&nbsp;Aus der Beziehung <i>A</i>/lg (<i>N</i>) &#8804; 0.05 &nbsp;&nbsp;&nbsp;&#8658;&nbsp;&nbsp;&nbsp; <i>A</i>/lg (<i>N</i><sub>b</sub>) = 0.05 folgt:
+
 
:$${{\rm lg}\hspace{0.1cm}N_{\rm b}} = \frac{A}{0.05} = 21.2 \hspace{0.3cm}\Rightarrow\hspace{0.3cm}
+
 
N_{\rm b} = 10^{21.2} \hspace{0.15cm}\underline{= 1.58 \cdot 10^{21}}
+
'''(2)'''&nbsp; From the relationship&nbsp; ${A}/{\rm lg}\hspace{0.1cm}(N) &#8804; 0.05$ &nbsp;&nbsp;&nbsp;&#8658;&nbsp;&nbsp;&nbsp; ${A}/{\rm lg}\hspace{0.1cm}(N) = 0.05$&nbsp; it follows:
 +
:$${{\rm lg}\hspace{0.1cm}N_{\rm 2}} = \frac{A}{0.05} = 21.2 \hspace{0.3cm}\Rightarrow\hspace{0.3cm}
 +
N_{\rm 2} = 10^{21.2} \hspace{0.15cm}\underline{= 1.58 \cdot 10^{21}}
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
  
:<b>3.</b>&nbsp;&nbsp;Allgemein gilt:
+
 
:$$r(N) = 1 - \frac{H}{K(N)}  
+
 
\hspace{0.05cm}. $$
+
'''(3)'''&nbsp; In general, &nbsp;$r(N) = 1 - {H}/{K(N)} \hspace{0.05cm}.$&nbsp;
:BQ1 hat die Entropie <i>H</i> = 0.5 bit/Symbol. Daraus folgt wegen <i>r</i>(<i>N</i>) &asymp; <i>r'</i><sup> </sup>(<i>N</i>) für <i>K</i>(<i>N</i><sub>c</sub>) = 0.6:
+
*$\rm BQ1$&nbsp; has entropy&nbsp; $H = 0.5$ bit/source symbol.&nbsp;
:$$r(N_{\rm c}) = 1 - \frac{0.5}{0.6} = 0.167 \hspace{0.1cm}\Rightarrow\hspace{0.1cm}  
+
*It follows that because &nbsp;$r(N) &asymp; r\hspace{0.05cm}'(N)$&nbsp; for&nbsp; $K(N_3) = 0.6$:
{\rm lg}\hspace{0.1cm}N_{\rm c} = \frac{A}{0.167} = 6.36
+
:$$r(N_{\rm 3}) = 1 - \frac{0.5}{0.6} = 0.167 \hspace{0.1cm}\Rightarrow\hspace{0.1cm}  
 +
{\rm lg}\hspace{0.1cm}N_{\rm 3} = \frac{A}{0.167} = 6.36
 
\hspace{0.1cm}\Rightarrow\hspace{0.1cm}
 
\hspace{0.1cm}\Rightarrow\hspace{0.1cm}
N_{\rm c} = 10^{6.36} \hspace{0.15cm}\underline{= 2.29 \cdot 10^{6}}
+
N_{\rm 3} = 10^{6.36} \hspace{0.15cm}\underline{= 2.29 \cdot 10^{6}}
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
[[File:P_ID2447__Inf_A_2_5d.png|right|]]
 
  
:<b>4.</b>&nbsp;&nbsp;Für <i>N</i> = 10000 gilt <i>r</i>&prime;(<i>N</i>) = <i>r</i>(<i>N</i>) = 0.190:
+
 
 +
 
 +
[[File:EN_Inf_A_2_5d_v2.png|right|frame|Results for&nbsp;  $\rm BQ2$]]
 +
'''(4)'''&nbsp; For&nbsp; $N = 10000$ &nbsp; &rArr; &nbsp; $r(N) &asymp; r\hspace{0.05cm}'(N) = 0.19$:
 
:$$\frac{A}{{\rm lg}\hspace{0.1cm}10000} = 0.19 \hspace{0.3cm}\Rightarrow\hspace{0.3cm}
 
:$$\frac{A}{{\rm lg}\hspace{0.1cm}10000} = 0.19 \hspace{0.3cm}\Rightarrow\hspace{0.3cm}
 
A = 0.19 \cdot 4 = 0.76 \hspace{0.05cm}. $$
 
A = 0.19 \cdot 4 = 0.76 \hspace{0.05cm}. $$
:Die Ergebnisse sind in nebenstehender Tabelle zusammengefasst. Man erkennt die sehr gute Übereinstimmung zwischen <i>r</i>(<i>N</i>) und <i>r</i>&prime;(<i>N</i>). Die gesuchten Zahlenwerte sind in der Tabelle rot markiert.
+
*The results are summarised in the table opposite.
 +
*One can see the very good agreement between&nbsp; &nbsp;$r(N)$&nbsp; and  &nbsp;$r\hspace{0.05cm}'(N)$.
 +
*The numerical values sought are marked in red in the table:
 +
$$r'(N = 50000)\hspace{0.15cm}\underline{ = 0.162},\hspace{0.3cm}r'(N = 10^{6})\hspace{0.15cm}\underline{ = 0.127},\hspace{0.3cm}
 +
r'(N = 10^{12})\hspace{0.15cm}\underline{ = 0.063}.$$
 +
 
 +
*For the compression factor &nbsp; &ndash; &nbsp; the apostrophe indicates that the approximation &nbsp;$r\hspace{0.05cm}'(N)$&nbsp; was assumed:
 +
:$$K\hspace{0.05cm}'(N) = \frac{1}{1 - r\hspace{0.05cm}'(N)}\hspace{0.05cm}.$$
 +
*Thus, for the length of the LZW output string:
 +
:$$L\hspace{0.05cm}'(N) = K\hspace{0.05cm}'(N) \cdot N  = \frac{N}{1 - r\hspace{0.05cm}'(N)}\hspace{0.05cm}.$$
 +
 
  
:Für den Komprimierungsfaktor gilt (der Apostroph weist darauf hin, dass von der Näherung <i>r</i>&prime;(<i>N</i>) ausgegangen wurde):
 
:$$K'(N) = \frac{1}{1 - r'(N)}\hspace{0.05cm}.$$
 
:Damit gilt für die Länge des LZW&ndash;Ausgabestrings:
 
:$$L'(N) = K'(N) \cdot N \hspace{0.05cm}.$$
 
  
:<br><br><br>
+
[[File:EN_Inf_A_2_5e_v2.png|right|frame|Results for&nbsp; $\rm BQ3$]]
:<b>5.</b>&nbsp;&nbsp;Nach ähnlicher Vorgehensweise wie in der Teilaufgabe d) erhält man für die Binärquelle BQ3 den Anpassungsparameter <i>A</i> = 1.36 und daraus die Ergebnisse gemäß der blau hinterlegten Tabelle.
+
'''(5)'''&nbsp; Following a similar procedure as in subtask&nbsp; '''(4)'''&nbsp; we obtain the fitting parameter&nbsp; $A = 1.36$&nbsp; for the binary source&nbsp; $\rm BQ3$&nbsp; and from this the results according to the table with a blue background.
  
:<u>Hinweis:</u> Die letzte Spalte dieser Tabelle ist nur bei Kenntnis der Teilaufgabe (f) verständlich. Dort wird gezeigt, dass die  Quelle BQ3 die Entropie  <i>H</i> = 0.25 bit/Quellensymbol besitzt.
+
<u>Hint:</u> &nbsp; The last column of this table is only understandable with knowledge of subtask&nbsp; '''(6)'''.&nbsp; There it is shown that the source&nbsp; $\rm BQ3$&nbsp; has the entropy&nbsp; $H = 0.25$&nbsp; bit/source symbol.
  
:In diesem Fall gilt für den Komprimierungsfaktor:
+
*In this case, the following applies to the compression factor:
:$$K'(N) = \frac{H}{1 - r'(N)} = \frac{0.25}{1 - r'(N)} \hspace{0.05cm}.$$
+
:$$K\hspace{0.05cm}'(N) = \frac{H}{1 - r\hspace{0.05cm}'(N)} = \frac{0.25}{1 - r'(N)} \hspace{0.05cm}.$$
[[File:P_ID2448__Inf_A_2_5e.png|right|]]
+
*Thus, for the values of residual redundancy we are looking for, we obtain:
:Für <i>N</i> = 10<sup>12</sup> weicht also der Komprimierungsfaktor (0.282) noch deutlich von der Entropie (0.25) ab, die  für <i>N</i> &#8594; &#8734; erreicht werden kann (Quellencodierungstheorem).<br><br><br>
+
:$$r\hspace{0.05cm}'(N = 50000)\hspace{0.15cm}\underline{ = 0.289},\hspace{0.3cm}r\hspace{0.05cm}'(N = 10^{6})\hspace{0.15cm}\underline{ = 0.227},\hspace{0.3cm}
 +
r\hspace{0.05cm}'(N = 10^{12})\hspace{0.15cm}\underline{ = 0.113}.$$
 +
*Thus, for&nbsp; $N = 10^{12}$&nbsp; , the compression factor&nbsp; $(0.282)$&nbsp; still deviates significantly from the entropy&nbsp; $(0.25)$&nbsp; which can only be achieved for&nbsp; $N \to \infty$&nbsp; ("Source Coding Theorem").
  
:<b>6.</b>&nbsp;&nbsp;Die einzelnen Näherungen <i>r</i>&prime;(<i>N</i>) unterscheiden sich nur durch den Parameter <i>A</i>. Dabei haben wir festgestellt:
 
  
:* Quelle BQ2 (<i>H</i> = 1.00): <i>A</i> = 0.76,
 
  
:* Quelle BQ1 (<i>H</i> = 0.50): <i>A</i> = 1.06,
 
  
:* Quelle BQ3 (<i>H</i> unbekannt): <i>A</i> = 1.36.
+
'''(6)'''&nbsp; The individual approximations&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; differ only by the parameter&nbsp; $A$.&nbsp; Here we found:
 +
# Source&nbsp; $\rm BQ1$&nbsp; with&nbsp; $H = 0.50$ &nbsp; &rArr; &nbsp; $A = 1.06$ &nbsp; &rArr; &nbsp; according to the specification sheet,
 +
# Source&nbsp; $\rm BQ2$&nbsp; with&nbsp; $H = 1.00$ &nbsp; &rArr; &nbsp; $A = 0.76$ &nbsp; &rArr; &nbsp; see subtask&nbsp; '''(4)''',
 +
# Source&nbsp; $\rm BQ3$&nbsp; $(H$ unknown$)$: $A = 4 &middot; 0.34 =1.36$ &nbsp; &rArr; &nbsp; corresponding to the table on the right.
  
:Je kleiner die Entropie <i>H</i> ist, um so größer ist offensichtlich der Anpassungsfaktor <i>A</i>. Da genau eine Lösung möglich ist, muss <i>H</i> = 0.25 bit/Quellensymbol richtig sein &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <u>Antwort 4</u>.
+
*Obviously, the smaller the entropy&nbsp; $H$&nbsp; the larger the adjustment factor&nbsp; $A$&nbsp; (and vice versa).  
 +
*Since exactly one solution is possible, &nbsp; $H = 0.25$&nbsp; bit/source&nbsp; symbol must be correct &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <u>answer 4</u>.
  
:Tatsächlich wurden bei der Simulation für die Quelle Q3 die Wahrscheinlichkeiten <i>p</i><sub>A</sub> = 0.96, <i>p</i><sub>B</sub> = 0.04 &#8658; <i>H</i> &asymp; 0.25 verwendet.<br><br>
+
*In fact, the probabilities&nbsp;$p_{\rm A} = 0.96$&nbsp; and &nbsp;$p_{\rm B} = 0.04$&nbsp; &nbsp; &#8658; &nbsp; $H &asymp; 0.25$ were used in the simulation for the source&nbsp; $\rm BQ3$.
 
{{ML-Fuß}}
 
{{ML-Fuß}}
  
  
  
[[Category:Aufgaben zu Informationstheorie|^2.2 Verfahren von Lempel, Ziv, Welch^]]
+
[[Category:Information Theory: Exercises|^2.2 Lempel-Ziv-Welch Compression^]]

Latest revision as of 14:12, 10 August 2021

Residual redundancy  $r(N)$  and approximation   $r\hspace{0.05cm}'(N)$  of three sources

We assume here a binary input sequence of length  $N$  and consider three different binary sources:

  • $\rm BQ1$:   Symbol probabilities  $p_{\rm A} = 0.89$  and  $p_{\rm B} = 0.11$, i.e. different
      ⇒   entropy   $H = 0.5\text{ bit/source symbol}$   ⇒   the source is redundant.
  • $\rm BQ2$:   $p_{\rm A} = p_{\rm B} = 0.5$  (equally probable)
      ⇒   entropy   $H = 1\text{ bit/source symbol}$   ⇒   the source is redundancy-free.
  • $\rm BQ3$:   There is no concrete information on the statistics here. 
    In subtask  (6)  you are to estimate the entropy  $H$  of this source.


For these three sources, the respective  "residual redundancy"  $r(N)$  was determined by simulation, which remains in the binary sequence after  Lempel–Ziv–Welch coding

The results are shown in the first column of the above table for the sources

  • $\rm BQ1$  (yellow background),
  • $\rm BQ2$  (green background) and
  • $\rm BQ3$  (blue background)


whereby we have restricted ourselves to sequence lengths  $N ≤ 50000$  in the simulation.

The  "relative redundancy of the output sequence" – simplified called  "residual redundancy" – can be calculated from

  • the length  $N$  of the input sequence,
  • the length  $L(N)$  of the output sequence and
  • the entropy  $H$


in the following way:

$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 - \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$

This takes into account that with perfect source coding the length of the output sequence could be lowered to the value  $L_{\rm min} = N · H$ .

  • With non-perfect source coding,  $L(n) - N · H$  gives the remaining redundancy  (with the pseudo–unit  "bit").
  • After dividing by  $L(n)$,  one obtains the relative redundancy  $r(n)$  with the value range between zero and one;  $r(n)$  should be as small as possible.


A second parameter for measuring the efficiency of LZW coding is the  "compression factor"  $K(N)$  as the quotient of the lengths of the output and input sequences, which should also be very small:

$$K(N) = {L(N) }/{N} \hspace{0.05cm},$$

In the  theory section  it was shown that the residual redundancy  $r(n)$  is well approximated by the function

$$r\hspace{0.05cm}'(N) =\frac {A}{{\rm lg}\hspace{0.1cm}(N)} \hspace{0.5cm}{\rm mit}\hspace{0.5cm} A = 4 \cdot {r(N = 10000)} \hspace{0.05cm}.$$


  • This approximation  $r\hspace{0.05cm}'(N)$  is given for  $\rm BQ1$  in the second column of the table above.
  • In subtasks  (4)  and  (5)  you are to make the approximation for sources  $\rm BQ2$  and  $\rm BQ3$ .





Hints:

Remaining redundancy as a measure for the efficiency of coding methods,
Efficiency of Lempel-Ziv coding and
Quantitative statements on asymptotic optimality.
  • The descriptive variables  $K(N)$  and  $r(N)$  are deterministically related.


Questions

1

With which parameter  $A$  was the approximation  $r\hspace{0.05cm}'(N)$  of the residual redundancy for the binary source  $\rm BQ1$  created?

$A \ = \ $

2

What is the minimum size of  $N = N_2$  for  $\rm BQ1$  so that the residual redundancy satisfies the condition  $r(N) ≈ r\hspace{0.05cm}'(N) \le 5\%$ ?

$N_{2} \ = \ $

$\ \cdot 10^{21}$

3

What is the minimum size of  $N = N_3$  at  $\rm BQ1$  so that the compression factor  $K(N)= L(N)/N$  is not greater than  $0.6$?

$N_{3} \ = \ $

$\ \cdot 10^{6}$

4

Now determine the redundancy approximation  $r\hspace{0.05cm}'(N)$  for the redundancy-free binary source $\rm BQ2$, in particular:

$r'(N = 50000)\ = \ $

$r'(N = 10^6)\ = \ $

$r'(N = 10^{12})\ = \ $

5

What values does the redundancy approximation  $r\hspace{0.05cm}'(N)$  yield for the unspecified binary source  $\rm BQ3$? In particular:

$r'(N = 50000)\ = \ $

$r'(N = 10^6)\ = \ $

$r'(N = 10^{12})\ = \ $

6

According to this result, what source entropy  $H$  could  $\rm BQ3$  ?  Hint:  Exactly one answer is correct.

$H = 1.00 \ \rm bit/source\hspace{0.15cm}symbol$,
$H = 0.75 \ \rm bit/source\hspace{0.15cm} symbol$,
$H = 0.50 \ \rm bit/source\hspace{0.15cm} symbol$,
$H = 0.25 \ \rm bit/source\hspace{0.15cm} symbol$.


Solution

(1)  The approximation  $r\hspace{0.05cm}'(N)$  agrees exactly by definition for the sequence length  $N = 10000$  with the residual redundancy  $r(N) = 0.265$  determined by simulation.

  • Thus
$$A = 4 \cdot r(N = 10000) =4 \cdot {0.265} \hspace{0.15cm}\underline{= 1.06} \hspace{0.05cm}. $$


(2)  From the relationship  ${A}/{\rm lg}\hspace{0.1cm}(N) ≤ 0.05$    ⇒    ${A}/{\rm lg}\hspace{0.1cm}(N) = 0.05$  it follows:

$${{\rm lg}\hspace{0.1cm}N_{\rm 2}} = \frac{A}{0.05} = 21.2 \hspace{0.3cm}\Rightarrow\hspace{0.3cm} N_{\rm 2} = 10^{21.2} \hspace{0.15cm}\underline{= 1.58 \cdot 10^{21}} \hspace{0.05cm}.$$


(3)  In general,  $r(N) = 1 - {H}/{K(N)} \hspace{0.05cm}.$ 

  • $\rm BQ1$  has entropy  $H = 0.5$ bit/source symbol. 
  • It follows that because  $r(N) ≈ r\hspace{0.05cm}'(N)$  for  $K(N_3) = 0.6$:
$$r(N_{\rm 3}) = 1 - \frac{0.5}{0.6} = 0.167 \hspace{0.1cm}\Rightarrow\hspace{0.1cm} {\rm lg}\hspace{0.1cm}N_{\rm 3} = \frac{A}{0.167} = 6.36 \hspace{0.1cm}\Rightarrow\hspace{0.1cm} N_{\rm 3} = 10^{6.36} \hspace{0.15cm}\underline{= 2.29 \cdot 10^{6}} \hspace{0.05cm}.$$


Results for  $\rm BQ2$

(4)  For  $N = 10000$   ⇒   $r(N) ≈ r\hspace{0.05cm}'(N) = 0.19$:

$$\frac{A}{{\rm lg}\hspace{0.1cm}10000} = 0.19 \hspace{0.3cm}\Rightarrow\hspace{0.3cm} A = 0.19 \cdot 4 = 0.76 \hspace{0.05cm}. $$
  • The results are summarised in the table opposite.
  • One can see the very good agreement between   $r(N)$  and  $r\hspace{0.05cm}'(N)$.
  • The numerical values sought are marked in red in the table:

$$r'(N = 50000)\hspace{0.15cm}\underline{ = 0.162},\hspace{0.3cm}r'(N = 10^{6})\hspace{0.15cm}\underline{ = 0.127},\hspace{0.3cm} r'(N = 10^{12})\hspace{0.15cm}\underline{ = 0.063}.$$

  • For the compression factor   –   the apostrophe indicates that the approximation  $r\hspace{0.05cm}'(N)$  was assumed:
$$K\hspace{0.05cm}'(N) = \frac{1}{1 - r\hspace{0.05cm}'(N)}\hspace{0.05cm}.$$
  • Thus, for the length of the LZW output string:
$$L\hspace{0.05cm}'(N) = K\hspace{0.05cm}'(N) \cdot N = \frac{N}{1 - r\hspace{0.05cm}'(N)}\hspace{0.05cm}.$$


Results for  $\rm BQ3$

(5)  Following a similar procedure as in subtask  (4)  we obtain the fitting parameter  $A = 1.36$  for the binary source  $\rm BQ3$  and from this the results according to the table with a blue background.

Hint:   The last column of this table is only understandable with knowledge of subtask  (6).  There it is shown that the source  $\rm BQ3$  has the entropy  $H = 0.25$  bit/source symbol.

  • In this case, the following applies to the compression factor:
$$K\hspace{0.05cm}'(N) = \frac{H}{1 - r\hspace{0.05cm}'(N)} = \frac{0.25}{1 - r'(N)} \hspace{0.05cm}.$$
  • Thus, for the values of residual redundancy we are looking for, we obtain:
$$r\hspace{0.05cm}'(N = 50000)\hspace{0.15cm}\underline{ = 0.289},\hspace{0.3cm}r\hspace{0.05cm}'(N = 10^{6})\hspace{0.15cm}\underline{ = 0.227},\hspace{0.3cm} r\hspace{0.05cm}'(N = 10^{12})\hspace{0.15cm}\underline{ = 0.113}.$$
  • Thus, for  $N = 10^{12}$  , the compression factor  $(0.282)$  still deviates significantly from the entropy  $(0.25)$  which can only be achieved for  $N \to \infty$  ("Source Coding Theorem").



(6)  The individual approximations  $r\hspace{0.05cm}'(N)$  differ only by the parameter  $A$.  Here we found:

  1. Source  $\rm BQ1$  with  $H = 0.50$   ⇒   $A = 1.06$   ⇒   according to the specification sheet,
  2. Source  $\rm BQ2$  with  $H = 1.00$   ⇒   $A = 0.76$   ⇒   see subtask  (4),
  3. Source  $\rm BQ3$  $(H$ unknown$)$: $A = 4 · 0.34 =1.36$   ⇒   corresponding to the table on the right.
  • Obviously, the smaller the entropy  $H$  the larger the adjustment factor  $A$  (and vice versa).
  • Since exactly one solution is possible,   $H = 0.25$  bit/source  symbol must be correct   ⇒   answer 4.
  • In fact, the probabilities $p_{\rm A} = 0.96$  and  $p_{\rm B} = 0.04$    ⇒   $H ≈ 0.25$ were used in the simulation for the source  $\rm BQ3$.