Difference between revisions of "Aufgaben:Exercise 2.5: Residual Redundancy with LZW Coding"

From LNTwww
m (Text replacement - "”" to """)
 
(11 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  
{{quiz-Header|Buchseite=Informationstheorie/Komprimierung nach Lempel, Ziv und Welch
+
{{quiz-Header|Buchseite=Information_Theory/Compression_According_to_Lempel,_Ziv_and_Welch
 
}}
 
}}
  
[[File:P_ID2446__Inf_A_2_5_neu.png|right|frame|Restredundanz  $r(N)$  & Näherung  $r\hspace{0.05cm}'(N)$  dreier Binärquellen]]
+
[[File:P_ID2446__Inf_A_2_5_neu.png|right|frame|Residual redundancy  $r(N)$  and approximation   $r\hspace{0.05cm}'(N)$  of three sources]]
Wir gehen hier von einer binären Eingangsfolge der Länge  $N$  aus und  betrachten drei verschiedene binäre Nachrichtenquellen:
+
We assume here a binary input sequence of length  $N$  and consider three different binary sources:
* $\rm BQ1$: &nbsp; Symbolwahrscheinlichkeiten&nbsp;  $p_{\rm A} = 0.89$&nbsp; und&nbsp; $p_{\rm B} = 0.11$, also unterschiedlich<br> &nbsp;  &#8658; &nbsp; Entropie&nbsp; $H = 0.5\text{ bit/Quellensymbol}$ &nbsp; &#8658; &nbsp; die Quelle ist redundant.
+
* $\rm BQ1$: &nbsp; Symbol probabilities&nbsp;  $p_{\rm A} = 0.89$&nbsp; and&nbsp; $p_{\rm B} = 0.11$, i.e. different<br> &nbsp;  &#8658; &nbsp; entropy &nbsp; $H = 0.5\text{ bit/source&nbsp;symbol}$ &nbsp; &#8658; &nbsp; the source is redundant.
* $\rm BQ2$:  &nbsp; $p_{\rm A} = p_{\rm B} = 0.5$&nbsp; (gleichwahrscheinlich)<br> &nbsp; &#8658; &nbsp; Entropie&nbsp; $H = 1\text{ bit/Quellensymbol}$ &nbsp; &#8658; &nbsp; die Quelle ist redundanzfrei.
+
* $\rm BQ2$:  &nbsp; $p_{\rm A} = p_{\rm B} = 0.5$&nbsp; (equally probable)<br> &nbsp; &#8658; &nbsp; entropy &nbsp; $H = 1\text{ bit/source symbol}$ &nbsp; &#8658; &nbsp; the source is redundancy-free.
* $\rm BQ3$: &nbsp;  Hier gibt es keine konkreten Angaben zur Statistik.&nbsp; <br>In der Teilaufgabe&nbsp; '''(6)'''&nbsp; sollen Sie die Entropie&nbsp; $H$&nbsp; dieser Quelle  abschätzen.
+
* $\rm BQ3$: &nbsp;  There is no concrete information on the statistics here.&nbsp; <br>In subtask&nbsp; '''(6)'''&nbsp; you are to estimate the entropy&nbsp; $H$&nbsp; of this source.
  
  
Für diese drei Quellen wurden per Simulation die jeweilige <i>Restredundanz</i>&nbsp; $r(N)$&nbsp; ermittelt, die nach der&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Der_Lempel.E2.80.93Ziv.E2.80.93Welch.E2.80.93Algorithmus|Lempel&ndash;Ziv&ndash;Welch&ndash;Codierung]]&nbsp; in der Binärfolge verbleibt.&nbsp;  
+
For these three sources, the respective&nbsp; "residual redundancy"&nbsp; $r(N)$&nbsp; was determined by simulation, which remains in the binary sequence after&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#The_Lempel-Ziv-Welch_algorithm |Lempel&ndash;Ziv&ndash;Welch coding]].&nbsp;  
  
Die Ergebnisse sind in der jeweils ersten Spalte obiger Tabelle für die Quellen
+
The results are shown in the first column of the above table for the sources
*$\rm BQ1$&nbsp; (gelbe Hinterlegung),  
+
*$\rm BQ1$&nbsp; (yellow background),  
*$\rm BQ2$&nbsp; (grüne Hinterlegung) und
+
*$\rm BQ2$&nbsp; (green background) and
*$\rm BQ3$&nbsp; (blaue Hinterlegung)  
+
*$\rm BQ3$&nbsp; (blue background)  
  
  
eingetragen, wobei wir uns bei der Simulation auf Folgenlängen&nbsp; $N &#8804; 50000$&nbsp; beschränkt haben.
+
whereby we have restricted ourselves to sequence lengths&nbsp; $N &#8804; 50000$&nbsp; in the simulation.
  
Die <i>relative Redundanz der Ausgangsfolge</i> &ndash; vereinfachend <i>Restredundanz</i> genannt &ndash;  kann aus
+
The&nbsp; "relative redundancy of the output sequence" &ndash; simplified called&nbsp; "residual redundancy" &ndash;  can be calculated from
* der Länge&nbsp;  $N$&nbsp;  der Eingangsfolge,
+
* the length&nbsp;  $N$&nbsp;  of the input sequence,
* der Länge&nbsp;  $L(N)$&nbsp;  der Ausgangsfolge und
+
* the length&nbsp;  $L(N)$&nbsp;  of the output sequence and
*der Entropie&nbsp;  $H$
+
*the entropy&nbsp;  $H$
  
  
in folgender Weise berechnet werden:
+
in the following way:
 
:$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 -  \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$
 
:$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 -  \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$
  
Hierbei ist berücksichtigt, dass bei perfekter Quellencodierung die Länge der Ausgangsfolge bis auf den Wert&nbsp;  $L_{\rm min} = N &middot; H$&nbsp; herabgesenkt werden könnte.  
+
This takes into account that with perfect source coding the length of the output sequence could be lowered to the value&nbsp;  $L_{\rm min} = N &middot; H$&nbsp;.
*Bei nichtperfekter Quellencodierung  gibt&nbsp;  $L(n) - N &middot; H$&nbsp;  die verbleibende Redundanz (mit der Pseudo&ndash;Einheit "bit") an.  
+
*With non-perfect source coding,&nbsp;  $L(n) - N &middot; H$&nbsp;  gives the remaining redundancy&nbsp; (with the pseudo&ndash;unit&nbsp; "bit").  
*Nach Division durch&nbsp;  $L(n)$&nbsp;  erhält man die relative Redundanz&nbsp;  $r(n)$&nbsp;  mit dem Wertebereich zwischen Null und Eins; $r(n)$ sollte möglichst klein sein.
+
*After dividing by&nbsp;  $L(n)$,&nbsp;  one obtains the relative redundancy&nbsp;  $r(n)$&nbsp;  with the value range between zero and one;&nbsp; $r(n)$&nbsp; should be as small as possible.
  
  
Eine zweite Kenngröße zur Effizienzmessung der LZW&ndash;Codierung ist der&nbsp;  <i>Komprimierungsfaktor</i>&nbsp;  $K(N)$, der als der Quotient der Längen von Ausgangs&ndash; und Eingangsfolge ebenfalls klein sein sollte:
+
A second parameter for measuring the efficiency of LZW coding is the&nbsp;  "compression factor"&nbsp;  $K(N)$&nbsp; as the quotient of the lengths of the output and input sequences, which should also be very small:
 +
 
 
:$$K(N) = {L(N) }/{N} \hspace{0.05cm},$$
 
:$$K(N) = {L(N) }/{N} \hspace{0.05cm},$$
  
Im&nbsp;  [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Quantitative_Aussagen_zur_asymptotischen_Optimalit.C3.A4t|Theorieteil]]&nbsp;  wurde gezeigt, dass die Restredundanz&nbsp;  $r(n)$&nbsp;  oft durch die Funktion
+
In the&nbsp;  [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Quantitative_statements_on_asymptotic_optimality|theory section]]&nbsp;  it was shown that the residual redundancy&nbsp;  $r(n)$&nbsp;  is well approximated by the function
 
:$$r\hspace{0.05cm}'(N) =\frac {A}{{\rm lg}\hspace{0.1cm}(N)}
 
:$$r\hspace{0.05cm}'(N) =\frac {A}{{\rm lg}\hspace{0.1cm}(N)}
 
\hspace{0.5cm}{\rm mit}\hspace{0.5cm} A = 4 \cdot {r(N = 10000)}  
 
\hspace{0.5cm}{\rm mit}\hspace{0.5cm} A = 4 \cdot {r(N = 10000)}  
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
gut angenähert wird.
 
  
*Diese Näherung&nbsp;  $r\hspace{0.05cm}'(N)$&nbsp;  ist für&nbsp;  $\rm BQ1$&nbsp;  in der zweiten Spalte obiger Tabelle angegeben.
 
* In den Teilaufgaben&nbsp;  '''(4)'''&nbsp;  und&nbsp;  '''(5)'''&nbsp;  sollen Sie die Approximation für die Quellen&nbsp;  $\rm BQ2$&nbsp;  und&nbsp;  $\rm BQ3$&nbsp;  vornehmen.
 
  
 +
*This approximation&nbsp;  $r\hspace{0.05cm}'(N)$&nbsp;  is given for&nbsp;  $\rm BQ1$&nbsp;  in the second column of the table above.
 +
* In subtasks&nbsp;  '''(4)'''&nbsp;  and&nbsp;  '''(5)'''&nbsp;  you are to make the approximation for sources&nbsp;  $\rm BQ2$&nbsp;  and&nbsp;  $\rm BQ3$&nbsp;.
  
  
Line 53: Line 53:
  
  
''Hinweise:''
+
 
*Die Aufgabe gehört zum  Kapitel&nbsp;  [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch|Komprimierung nach Lempel, Ziv und Welch]].
+
Hints:
*Insbesondere wird  Bezug genommen auf die Seiten
+
*The task belongs to the chapter&nbsp;  [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch|Compression according to Lempel, Ziv and Welch]].
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Restredundanz_als_Ma.C3.9F_f.C3.BCr_die_Effizienz_von_Codierverfahren|Restredrundanz als Maß für die Effizienz von Codierverfahren]],
+
*In particular, reference is made to the pages
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Effizienz_der_Lempel.E2.80.93Ziv.E2.80.93Codierung|Effizienz der Lempel-Ziv-Codierung]] sowie
+
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Remaining_redundancy_as_a_measure_for_the_efficiency_of_coding_methods|Remaining redundancy as a measure for the efficiency of coding methods]],
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Quantitative_Aussagen_zur_asymptotischen_Optimalit.C3.A4t|Quantitative Aussagen zur asymptotischen Optimalität]].   
+
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Efficiency_of_Lempel-Ziv_coding|Efficiency of Lempel-Ziv coding]] and
*Die Beschreibungsgrößen&nbsp;  $K(N)$&nbsp;  und&nbsp;  $r(N)$&nbsp;  hängen deterministisch zusammen.
+
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Quantitative_statements_on_asymptotic_optimality|Quantitative statements on asymptotic optimality]].   
 +
*The descriptive variables&nbsp;  $K(N)$&nbsp;  and&nbsp;  $r(N)$&nbsp;  are deterministically related.
 
   
 
   
  
  
===Fragebogen===
+
===Questions===
  
 
<quiz display=simple>
 
<quiz display=simple>
{Mit welchem Parameter&nbsp; $A$&nbsp; wurde die Näherung&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; der Restredundanz für die Binärquelle&nbsp; $\rm BQ1$&nbsp; erstellt?
+
{With which parameter&nbsp; $A$&nbsp; was the approximation&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; of the residual redundancy for the binary source&nbsp; $\rm BQ1$&nbsp; created?
 
|type="{}"}
 
|type="{}"}
 
$A \ = \ $  { 1.06 3% }
 
$A \ = \ $  { 1.06 3% }
  
  
{Wie groß muss&nbsp; $N = N_2$&nbsp; bei&nbsp;  $\rm BQ1$&nbsp; mindestens sein, damit die Restredundanz die Bedingung&nbsp; $r(N) &asymp; r\hspace{0.05cm}'(N) \le 5\%$&nbsp;   erfüllt?  
+
{What is the minimum size of&nbsp; $N = N_2$&nbsp; for&nbsp;  $\rm BQ1$&nbsp; so that the residual redundancy satisfies the condition&nbsp; $r(N) &asymp; r\hspace{0.05cm}'(N) \le 5\%$&nbsp;?
 
|type="{}"}
 
|type="{}"}
 
$N_{2} \ =  \ $ { 1.58 3% } $\ \cdot 10^{21}$
 
$N_{2} \ =  \ $ { 1.58 3% } $\ \cdot 10^{21}$
  
  
{Wie groß muss&nbsp; $N = N_3$&nbsp;  bei&nbsp;  $\rm BQ1$&nbsp; mindestens sein, damit der Komprimierungsfaktor&nbsp; $K(N)= L(N)/N$&nbsp; nicht größer ist als&nbsp; $0.6$?
+
{What is the minimum size of&nbsp; $N = N_3$&nbsp;  at&nbsp;  $\rm BQ1$&nbsp; so that the compression factor&nbsp; $K(N)= L(N)/N$&nbsp; is not greater than&nbsp; $0.6$?
 
|type="{}"}
 
|type="{}"}
 
$N_{3} \ =  \ $ { 2.29 3% } $\ \cdot 10^{6}$
 
$N_{3} \ =  \ $ { 2.29 3% } $\ \cdot 10^{6}$
  
  
{Bestimmen Sie nun die Redundanznäherung&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; für die redundanzfreie Binärquelle $\rm BQ2$, insbesondere:
+
{Now determine the redundancy approximation&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; for the redundancy-free binary source $\rm BQ2$, in particular:
 
|type="{}"}
 
|type="{}"}
 
$r'(N = 50000)\ =  \ $ { 0.162 3% }
 
$r'(N = 50000)\ =  \ $ { 0.162 3% }
Line 88: Line 89:
  
  
{Welche Werte liefert die Redundanznäherung&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; für die nicht näher spezifizierte Binärquelle&nbsp; $\rm BQ3$? Insbesondere:
+
{What values does the redundancy approximation&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; yield for the unspecified binary source&nbsp; $\rm BQ3$? In particular:
 
|type="{}"}
 
|type="{}"}
 
$r'(N = 50000)\ =  \ $ { 0.289 3% }
 
$r'(N = 50000)\ =  \ $ { 0.289 3% }
Line 95: Line 96:
  
  
{Welche Quellenentropie&nbsp; $H$&nbsp; könnte&nbsp; $\rm BQ3$&nbsp; nach diesem Ergebnis besitzen?&nbsp; <i>Hinweis:</i> Es ist genau eine Antwort richtig.
+
{According to this result, what source entropy&nbsp; $H$&nbsp; could&nbsp; $\rm BQ3$&nbsp; ?&nbsp; <u>Hint:</u>&nbsp; Exactly one answer is correct.
 
|type="()"}
 
|type="()"}
- $H = 1.00 \ \rm bit/Quellensymbol$,
+
- $H = 1.00 \ \rm bit/source\hspace{0.15cm}symbol$,
- $H = 0.75 \ \rm bit/Quellensymbol$,
+
- $H = 0.75 \ \rm bit/source\hspace{0.15cm} symbol$,
- $H = 0.50 \ \rm bit/Quellensymbol$,
+
- $H = 0.50 \ \rm bit/source\hspace{0.15cm} symbol$,
+ $H = 0.25 \ \rm bit/Quellensymbol$.
+
+ $H = 0.25 \ \rm bit/source\hspace{0.15cm} symbol$.
  
  
Line 106: Line 107:
 
</quiz>
 
</quiz>
  
===Musterlösung===
+
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
'''(1)'''&nbsp; Die Näherung&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; stimmt per Definition für die Folgenlänge&nbsp;  $N = 10000$&nbsp; mit der per Simulation ermittelten Restredundanz&nbsp; $r(N) = 0.265$&nbsp; exakt überein.  
+
'''(1)'''&nbsp; The approximation&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; agrees exactly by definition for the sequence length&nbsp;  $N = 10000$&nbsp; with the residual redundancy&nbsp; $r(N) = 0.265$&nbsp; determined by simulation.
*Damit ist
+
*Thus
 
:$$A = 4 \cdot  r(N = 10000) =4 \cdot {0.265} \hspace{0.15cm}\underline{= 1.06}
 
:$$A = 4 \cdot  r(N = 10000) =4 \cdot {0.265} \hspace{0.15cm}\underline{= 1.06}
 
\hspace{0.05cm}. $$
 
\hspace{0.05cm}. $$
Line 115: Line 116:
  
  
'''(2)'''&nbsp; Aus der Beziehung&nbsp; ${A}/{\rm lg}\hspace{0.1cm}(N) &#8804; 0.05$ &nbsp;&nbsp;&nbsp;&#8658;&nbsp;&nbsp;&nbsp; ${A}/{\rm lg}\hspace{0.1cm}(N) = 0.05$&nbsp; folgt:
+
'''(2)'''&nbsp; From the relationship&nbsp; ${A}/{\rm lg}\hspace{0.1cm}(N) &#8804; 0.05$ &nbsp;&nbsp;&nbsp;&#8658;&nbsp;&nbsp;&nbsp; ${A}/{\rm lg}\hspace{0.1cm}(N) = 0.05$&nbsp; it follows:
 
:$${{\rm lg}\hspace{0.1cm}N_{\rm 2}} = \frac{A}{0.05} = 21.2 \hspace{0.3cm}\Rightarrow\hspace{0.3cm}
 
:$${{\rm lg}\hspace{0.1cm}N_{\rm 2}} = \frac{A}{0.05} = 21.2 \hspace{0.3cm}\Rightarrow\hspace{0.3cm}
 
N_{\rm 2} = 10^{21.2} \hspace{0.15cm}\underline{= 1.58 \cdot 10^{21}}
 
N_{\rm 2} = 10^{21.2} \hspace{0.15cm}\underline{= 1.58 \cdot 10^{21}}
Line 122: Line 123:
  
  
'''(3)'''&nbsp; Allgemein gilt &nbsp;$r(N) = 1 - {H}/{K(N)} \hspace{0.05cm}.$&nbsp;  
+
'''(3)'''&nbsp; In general, &nbsp;$r(N) = 1 - {H}/{K(N)} \hspace{0.05cm}.$&nbsp;  
*$\rm BQ1$&nbsp; hat die Entropie&nbsp; $H = 0.5$ bit/Symbol.&nbsp;  
+
*$\rm BQ1$&nbsp; has entropy&nbsp; $H = 0.5$ bit/source symbol.&nbsp;  
*Daraus folgt wegen &nbsp;$r(N) &asymp; r\hspace{0.05cm}'(N)$&nbsp; für&nbsp; $K(N_3) = 0.6$:
+
*It follows that because &nbsp;$r(N) &asymp; r\hspace{0.05cm}'(N)$&nbsp; for&nbsp; $K(N_3) = 0.6$:
:$$r(N_{\rm c}) = 1 - \frac{0.5}{0.6} = 0.167 \hspace{0.1cm}\Rightarrow\hspace{0.1cm}  
+
:$$r(N_{\rm 3}) = 1 - \frac{0.5}{0.6} = 0.167 \hspace{0.1cm}\Rightarrow\hspace{0.1cm}  
 
{\rm lg}\hspace{0.1cm}N_{\rm 3} = \frac{A}{0.167} = 6.36
 
{\rm lg}\hspace{0.1cm}N_{\rm 3} = \frac{A}{0.167} = 6.36
 
\hspace{0.1cm}\Rightarrow\hspace{0.1cm}
 
\hspace{0.1cm}\Rightarrow\hspace{0.1cm}
Line 133: Line 134:
  
  
[[File:P_ID2447__Inf_A_2_5d.png|right|frame|Ergebnisse für&nbsp;  $\rm BQ2$]]
+
[[File:EN_Inf_A_2_5d_v2.png|right|frame|Results for&nbsp;  $\rm BQ2$]]
'''(4)'''&nbsp; Für&nbsp; $N = 10000$&nbsp; gilt &nbsp;$r(N) &asymp; r\hspace{0.05cm}'(N) = 0.19$:
+
'''(4)'''&nbsp; For&nbsp; $N = 10000$ &nbsp; &rArr; &nbsp; $r(N) &asymp; r\hspace{0.05cm}'(N) = 0.19$:
 
:$$\frac{A}{{\rm lg}\hspace{0.1cm}10000} = 0.19 \hspace{0.3cm}\Rightarrow\hspace{0.3cm}
 
:$$\frac{A}{{\rm lg}\hspace{0.1cm}10000} = 0.19 \hspace{0.3cm}\Rightarrow\hspace{0.3cm}
 
A = 0.19 \cdot 4 = 0.76 \hspace{0.05cm}. $$
 
A = 0.19 \cdot 4 = 0.76 \hspace{0.05cm}. $$
*Die Ergebnisse sind in nebenstehender Tabelle zusammengefasst.  
+
*The results are summarised in the table opposite.
*Man erkennt die sehr gute Übereinstimmung zwischen&nbsp; &nbsp;$r(N)$&nbsp; und &nbsp;$r\hspace{0.05cm}'(N)$.  
+
*One can see the very good agreement between&nbsp; &nbsp;$r(N)$&nbsp; and &nbsp;$r\hspace{0.05cm}'(N)$.  
*Die gesuchten Zahlenwerte sind in der Tabelle rot markiert:
+
*The numerical values sought are marked in red in the table:
 
$$r'(N = 50000)\hspace{0.15cm}\underline{ = 0.162},\hspace{0.3cm}r'(N = 10^{6})\hspace{0.15cm}\underline{ = 0.127},\hspace{0.3cm}
 
$$r'(N = 50000)\hspace{0.15cm}\underline{ = 0.162},\hspace{0.3cm}r'(N = 10^{6})\hspace{0.15cm}\underline{ = 0.127},\hspace{0.3cm}
 
r'(N = 10^{12})\hspace{0.15cm}\underline{ = 0.063}.$$
 
r'(N = 10^{12})\hspace{0.15cm}\underline{ = 0.063}.$$
  
*Für den Komprimierungsfaktor gilt (der Apostroph weist darauf hin, dass von der Näherung &nbsp;$r\hspace{0.05cm}'(N)$&nbsp; ausgegangen wurde):
+
*For the compression factor &nbsp; &ndash; &nbsp; the apostrophe indicates that the approximation &nbsp;$r\hspace{0.05cm}'(N)$&nbsp; was assumed:
 
:$$K\hspace{0.05cm}'(N) = \frac{1}{1 - r\hspace{0.05cm}'(N)}\hspace{0.05cm}.$$
 
:$$K\hspace{0.05cm}'(N) = \frac{1}{1 - r\hspace{0.05cm}'(N)}\hspace{0.05cm}.$$
*Damit gilt für die Länge des LZW&ndash;Ausgabestrings:
+
*Thus, for the length of the LZW output string:
 
:$$L\hspace{0.05cm}'(N) = K\hspace{0.05cm}'(N) \cdot N  = \frac{N}{1 - r\hspace{0.05cm}'(N)}\hspace{0.05cm}.$$
 
:$$L\hspace{0.05cm}'(N) = K\hspace{0.05cm}'(N) \cdot N  = \frac{N}{1 - r\hspace{0.05cm}'(N)}\hspace{0.05cm}.$$
  
  
  
[[File:P_ID2448__Inf_A_2_5e.png|right|frame|Ergebnisse für&nbsp; $\rm BQ3$]]
+
[[File:EN_Inf_A_2_5e_v2.png|right|frame|Results for&nbsp; $\rm BQ3$]]
'''(5)'''&nbsp; Nach ähnlicher Vorgehensweise wie in der Teilaufgabe&nbsp; '''(4)'''&nbsp; erhält man für die Binärquelle&nbsp; $\rm BQ3$&nbsp; den Anpassungsparameter&nbsp; $A = 1.36$&nbsp; und daraus die Ergebnisse gemäß der blau hinterlegten Tabelle.
+
'''(5)'''&nbsp; Following a similar procedure as in subtask&nbsp; '''(4)'''&nbsp; we obtain the fitting parameter&nbsp; $A = 1.36$&nbsp; for the binary source&nbsp; $\rm BQ3$&nbsp; and from this the results according to the table with a blue background.
  
<u>Hinweis:</u> &nbsp; Die letzte Spalte dieser Tabelle ist nur bei Kenntnis der Teilaufgabe&nbsp; '''(6)'''&nbsp; verständlich. Dort wird gezeigt, dass die  Quelle&nbsp; $\rm BQ3$&nbsp; die Entropie&nbsp; $H = 0.25$ bit/Quellensymbol besitzt.
+
<u>Hint:</u> &nbsp; The last column of this table is only understandable with knowledge of subtask&nbsp; '''(6)'''.&nbsp; There it is shown that the source&nbsp; $\rm BQ3$&nbsp; has the entropy&nbsp; $H = 0.25$&nbsp; bit/source symbol.
  
*In diesem Fall gilt für den Komprimierungsfaktor:
+
*In this case, the following applies to the compression factor:
 
:$$K\hspace{0.05cm}'(N) = \frac{H}{1 - r\hspace{0.05cm}'(N)} = \frac{0.25}{1 - r'(N)} \hspace{0.05cm}.$$
 
:$$K\hspace{0.05cm}'(N) = \frac{H}{1 - r\hspace{0.05cm}'(N)} = \frac{0.25}{1 - r'(N)} \hspace{0.05cm}.$$
*Damit erhält man für die gesuchten Werte der Restredundanz:
+
*Thus, for the values of residual redundancy we are looking for, we obtain:
 
:$$r\hspace{0.05cm}'(N = 50000)\hspace{0.15cm}\underline{ = 0.289},\hspace{0.3cm}r\hspace{0.05cm}'(N = 10^{6})\hspace{0.15cm}\underline{ = 0.227},\hspace{0.3cm}
 
:$$r\hspace{0.05cm}'(N = 50000)\hspace{0.15cm}\underline{ = 0.289},\hspace{0.3cm}r\hspace{0.05cm}'(N = 10^{6})\hspace{0.15cm}\underline{ = 0.227},\hspace{0.3cm}
 
r\hspace{0.05cm}'(N = 10^{12})\hspace{0.15cm}\underline{ = 0.113}.$$
 
r\hspace{0.05cm}'(N = 10^{12})\hspace{0.15cm}\underline{ = 0.113}.$$
*Für&nbsp; $N = 10^{12}$&nbsp; weicht also der Komprimierungsfaktor&nbsp; $(0.282)$&nbsp; noch deutlich von der Entropie&nbsp; $(0.25)$&nbsp; ab, die erst  für&nbsp; $N \to \infty$&nbsp; erreicht werden kann (Quellencodierungstheorem).
+
*Thus, for&nbsp; $N = 10^{12}$&nbsp; , the compression factor&nbsp; $(0.282)$&nbsp; still deviates significantly from the entropy&nbsp; $(0.25)$&nbsp; which can only be achieved for&nbsp; $N \to \infty$&nbsp; ("Source Coding Theorem").
 
 
  
  
  
'''(6)'''&nbsp; Die einzelnen Näherungen&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; unterscheiden sich nur durch den Parameter&nbsp; $A$.&nbsp; Dabei haben wir festgestellt:
 
# Quelle&nbsp; $\rm BQ1$&nbsp; mit&nbsp; $H = 0.50$ &nbsp; &rArr; &nbsp; $A = 1.06$ &nbsp; &rArr; &nbsp; entsprechend dem Angabenblatt,
 
# Quelle&nbsp; $\rm BQ2$&nbsp; mit&nbsp; $H = 1.00$ &nbsp; &rArr; &nbsp; $A = 0.76$ &nbsp; &rArr; &nbsp; siehe Teilaufgabe&nbsp; '''(4)''',
 
# Quelle&nbsp; $\rm BQ3$&nbsp; $(H$ unbekannt$)$: $A = 4 &middot; 0.34 =1.36$ &nbsp; &rArr; &nbsp; entsprechend der letzten Spalte in der Tabelle.
 
  
 +
'''(6)'''&nbsp; The individual approximations&nbsp; $r\hspace{0.05cm}'(N)$&nbsp; differ only by the parameter&nbsp; $A$.&nbsp; Here we found:
 +
# Source&nbsp; $\rm BQ1$&nbsp; with&nbsp; $H = 0.50$ &nbsp; &rArr; &nbsp; $A = 1.06$ &nbsp; &rArr; &nbsp; according to the specification sheet,
 +
# Source&nbsp; $\rm BQ2$&nbsp; with&nbsp; $H = 1.00$ &nbsp; &rArr; &nbsp; $A = 0.76$ &nbsp; &rArr; &nbsp; see subtask&nbsp; '''(4)''',
 +
# Source&nbsp; $\rm BQ3$&nbsp; $(H$ unknown$)$: $A = 4 &middot; 0.34 =1.36$ &nbsp; &rArr; &nbsp; corresponding to the table on the right.
  
*Je kleiner die Entropie&nbsp; $H$&nbsp; ist, um so größer ist offensichtlich der Anpassungsfaktor&nbsp; $A$&nbsp; (und umgekehrt).  
+
*Obviously, the smaller the entropy&nbsp; $H$&nbsp; the larger the adjustment factor&nbsp; $A$&nbsp; (and vice versa).  
*Da genau eine Lösung möglich ist, muss&nbsp; $H = 0.25$&nbsp; bit/Quellensymbol richtig sein &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <u>Antwort 4</u>.
+
*Since exactly one solution is possible, &nbsp; $H = 0.25$&nbsp; bit/source&nbsp; symbol must be correct &nbsp;&nbsp;&#8658;&nbsp;&nbsp; <u>answer 4</u>.
  
*Tatsächlich wurden bei der Simulation für die Quelle&nbsp; $\rm BQ3$&nbsp; die Wahrscheinlichkeiten &nbsp;$p_{\rm A} = 0.96$&nbsp; und &nbsp;$p_{\rm B} = 0.04$&nbsp; &nbsp; &#8658; &nbsp; $H &asymp; 0.25$ verwendet.
+
*In fact, the probabilities&nbsp;$p_{\rm A} = 0.96$&nbsp; and &nbsp;$p_{\rm B} = 0.04$&nbsp; &nbsp; &#8658; &nbsp; $H &asymp; 0.25$ were used in the simulation for the source&nbsp; $\rm BQ3$.
 
{{ML-Fuß}}
 
{{ML-Fuß}}
  
  
  
[[Category:Information Theory: Exercises|^2.2 Verfahren von Lempel, Ziv, Welch^]]
+
[[Category:Information Theory: Exercises|^2.2 Lempel-Ziv-Welch Compression^]]

Latest revision as of 14:12, 10 August 2021

Residual redundancy  $r(N)$  and approximation   $r\hspace{0.05cm}'(N)$  of three sources

We assume here a binary input sequence of length  $N$  and consider three different binary sources:

  • $\rm BQ1$:   Symbol probabilities  $p_{\rm A} = 0.89$  and  $p_{\rm B} = 0.11$, i.e. different
      ⇒   entropy   $H = 0.5\text{ bit/source symbol}$   ⇒   the source is redundant.
  • $\rm BQ2$:   $p_{\rm A} = p_{\rm B} = 0.5$  (equally probable)
      ⇒   entropy   $H = 1\text{ bit/source symbol}$   ⇒   the source is redundancy-free.
  • $\rm BQ3$:   There is no concrete information on the statistics here. 
    In subtask  (6)  you are to estimate the entropy  $H$  of this source.


For these three sources, the respective  "residual redundancy"  $r(N)$  was determined by simulation, which remains in the binary sequence after  Lempel–Ziv–Welch coding

The results are shown in the first column of the above table for the sources

  • $\rm BQ1$  (yellow background),
  • $\rm BQ2$  (green background) and
  • $\rm BQ3$  (blue background)


whereby we have restricted ourselves to sequence lengths  $N ≤ 50000$  in the simulation.

The  "relative redundancy of the output sequence" – simplified called  "residual redundancy" – can be calculated from

  • the length  $N$  of the input sequence,
  • the length  $L(N)$  of the output sequence and
  • the entropy  $H$


in the following way:

$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 - \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$

This takes into account that with perfect source coding the length of the output sequence could be lowered to the value  $L_{\rm min} = N · H$ .

  • With non-perfect source coding,  $L(n) - N · H$  gives the remaining redundancy  (with the pseudo–unit  "bit").
  • After dividing by  $L(n)$,  one obtains the relative redundancy  $r(n)$  with the value range between zero and one;  $r(n)$  should be as small as possible.


A second parameter for measuring the efficiency of LZW coding is the  "compression factor"  $K(N)$  as the quotient of the lengths of the output and input sequences, which should also be very small:

$$K(N) = {L(N) }/{N} \hspace{0.05cm},$$

In the  theory section  it was shown that the residual redundancy  $r(n)$  is well approximated by the function

$$r\hspace{0.05cm}'(N) =\frac {A}{{\rm lg}\hspace{0.1cm}(N)} \hspace{0.5cm}{\rm mit}\hspace{0.5cm} A = 4 \cdot {r(N = 10000)} \hspace{0.05cm}.$$


  • This approximation  $r\hspace{0.05cm}'(N)$  is given for  $\rm BQ1$  in the second column of the table above.
  • In subtasks  (4)  and  (5)  you are to make the approximation for sources  $\rm BQ2$  and  $\rm BQ3$ .





Hints:

Remaining redundancy as a measure for the efficiency of coding methods,
Efficiency of Lempel-Ziv coding and
Quantitative statements on asymptotic optimality.
  • The descriptive variables  $K(N)$  and  $r(N)$  are deterministically related.


Questions

1

With which parameter  $A$  was the approximation  $r\hspace{0.05cm}'(N)$  of the residual redundancy for the binary source  $\rm BQ1$  created?

$A \ = \ $

2

What is the minimum size of  $N = N_2$  for  $\rm BQ1$  so that the residual redundancy satisfies the condition  $r(N) ≈ r\hspace{0.05cm}'(N) \le 5\%$ ?

$N_{2} \ = \ $

$\ \cdot 10^{21}$

3

What is the minimum size of  $N = N_3$  at  $\rm BQ1$  so that the compression factor  $K(N)= L(N)/N$  is not greater than  $0.6$?

$N_{3} \ = \ $

$\ \cdot 10^{6}$

4

Now determine the redundancy approximation  $r\hspace{0.05cm}'(N)$  for the redundancy-free binary source $\rm BQ2$, in particular:

$r'(N = 50000)\ = \ $

$r'(N = 10^6)\ = \ $

$r'(N = 10^{12})\ = \ $

5

What values does the redundancy approximation  $r\hspace{0.05cm}'(N)$  yield for the unspecified binary source  $\rm BQ3$? In particular:

$r'(N = 50000)\ = \ $

$r'(N = 10^6)\ = \ $

$r'(N = 10^{12})\ = \ $

6

According to this result, what source entropy  $H$  could  $\rm BQ3$  ?  Hint:  Exactly one answer is correct.

$H = 1.00 \ \rm bit/source\hspace{0.15cm}symbol$,
$H = 0.75 \ \rm bit/source\hspace{0.15cm} symbol$,
$H = 0.50 \ \rm bit/source\hspace{0.15cm} symbol$,
$H = 0.25 \ \rm bit/source\hspace{0.15cm} symbol$.


Solution

(1)  The approximation  $r\hspace{0.05cm}'(N)$  agrees exactly by definition for the sequence length  $N = 10000$  with the residual redundancy  $r(N) = 0.265$  determined by simulation.

  • Thus
$$A = 4 \cdot r(N = 10000) =4 \cdot {0.265} \hspace{0.15cm}\underline{= 1.06} \hspace{0.05cm}. $$


(2)  From the relationship  ${A}/{\rm lg}\hspace{0.1cm}(N) ≤ 0.05$    ⇒    ${A}/{\rm lg}\hspace{0.1cm}(N) = 0.05$  it follows:

$${{\rm lg}\hspace{0.1cm}N_{\rm 2}} = \frac{A}{0.05} = 21.2 \hspace{0.3cm}\Rightarrow\hspace{0.3cm} N_{\rm 2} = 10^{21.2} \hspace{0.15cm}\underline{= 1.58 \cdot 10^{21}} \hspace{0.05cm}.$$


(3)  In general,  $r(N) = 1 - {H}/{K(N)} \hspace{0.05cm}.$ 

  • $\rm BQ1$  has entropy  $H = 0.5$ bit/source symbol. 
  • It follows that because  $r(N) ≈ r\hspace{0.05cm}'(N)$  for  $K(N_3) = 0.6$:
$$r(N_{\rm 3}) = 1 - \frac{0.5}{0.6} = 0.167 \hspace{0.1cm}\Rightarrow\hspace{0.1cm} {\rm lg}\hspace{0.1cm}N_{\rm 3} = \frac{A}{0.167} = 6.36 \hspace{0.1cm}\Rightarrow\hspace{0.1cm} N_{\rm 3} = 10^{6.36} \hspace{0.15cm}\underline{= 2.29 \cdot 10^{6}} \hspace{0.05cm}.$$


Results for  $\rm BQ2$

(4)  For  $N = 10000$   ⇒   $r(N) ≈ r\hspace{0.05cm}'(N) = 0.19$:

$$\frac{A}{{\rm lg}\hspace{0.1cm}10000} = 0.19 \hspace{0.3cm}\Rightarrow\hspace{0.3cm} A = 0.19 \cdot 4 = 0.76 \hspace{0.05cm}. $$
  • The results are summarised in the table opposite.
  • One can see the very good agreement between   $r(N)$  and  $r\hspace{0.05cm}'(N)$.
  • The numerical values sought are marked in red in the table:

$$r'(N = 50000)\hspace{0.15cm}\underline{ = 0.162},\hspace{0.3cm}r'(N = 10^{6})\hspace{0.15cm}\underline{ = 0.127},\hspace{0.3cm} r'(N = 10^{12})\hspace{0.15cm}\underline{ = 0.063}.$$

  • For the compression factor   –   the apostrophe indicates that the approximation  $r\hspace{0.05cm}'(N)$  was assumed:
$$K\hspace{0.05cm}'(N) = \frac{1}{1 - r\hspace{0.05cm}'(N)}\hspace{0.05cm}.$$
  • Thus, for the length of the LZW output string:
$$L\hspace{0.05cm}'(N) = K\hspace{0.05cm}'(N) \cdot N = \frac{N}{1 - r\hspace{0.05cm}'(N)}\hspace{0.05cm}.$$


Results for  $\rm BQ3$

(5)  Following a similar procedure as in subtask  (4)  we obtain the fitting parameter  $A = 1.36$  for the binary source  $\rm BQ3$  and from this the results according to the table with a blue background.

Hint:   The last column of this table is only understandable with knowledge of subtask  (6).  There it is shown that the source  $\rm BQ3$  has the entropy  $H = 0.25$  bit/source symbol.

  • In this case, the following applies to the compression factor:
$$K\hspace{0.05cm}'(N) = \frac{H}{1 - r\hspace{0.05cm}'(N)} = \frac{0.25}{1 - r'(N)} \hspace{0.05cm}.$$
  • Thus, for the values of residual redundancy we are looking for, we obtain:
$$r\hspace{0.05cm}'(N = 50000)\hspace{0.15cm}\underline{ = 0.289},\hspace{0.3cm}r\hspace{0.05cm}'(N = 10^{6})\hspace{0.15cm}\underline{ = 0.227},\hspace{0.3cm} r\hspace{0.05cm}'(N = 10^{12})\hspace{0.15cm}\underline{ = 0.113}.$$
  • Thus, for  $N = 10^{12}$  , the compression factor  $(0.282)$  still deviates significantly from the entropy  $(0.25)$  which can only be achieved for  $N \to \infty$  ("Source Coding Theorem").



(6)  The individual approximations  $r\hspace{0.05cm}'(N)$  differ only by the parameter  $A$.  Here we found:

  1. Source  $\rm BQ1$  with  $H = 0.50$   ⇒   $A = 1.06$   ⇒   according to the specification sheet,
  2. Source  $\rm BQ2$  with  $H = 1.00$   ⇒   $A = 0.76$   ⇒   see subtask  (4),
  3. Source  $\rm BQ3$  $(H$ unknown$)$: $A = 4 · 0.34 =1.36$   ⇒   corresponding to the table on the right.
  • Obviously, the smaller the entropy  $H$  the larger the adjustment factor  $A$  (and vice versa).
  • Since exactly one solution is possible,   $H = 0.25$  bit/source  symbol must be correct   ⇒   answer 4.
  • In fact, the probabilities $p_{\rm A} = 0.96$  and  $p_{\rm B} = 0.04$    ⇒   $H ≈ 0.25$ were used in the simulation for the source  $\rm BQ3$.