Difference between revisions of "Aufgaben:Exercise 2.5Z: Compression Factor vs. Residual Redundancy"

Revision as of 13:38, 1 August 2021

Data length $L(N)$ after LZW coding for $\rm BQ1$ and $\rm BQ2$

As in exercise 2.5 we consider data compression using the Lempel–Ziv–Welch–algorithm. Here holds:

Let the input sequence have length $N$.
The length of the LZW coder output is $L$.

The graph shows for two different binary sources $\rm BQ1$ and $\rm BQ2$ the relationship between the sequence lengths $N$ and $L$, represented by the function $L(N)$.

The two message sources have the same statistical properties as in exercise 2.5:

$\rm BQ1$ is redundant due to unequal symbol probabilities $(p_{\rm A} = 0.89$ and $p_{\rm B} = 0.11)$ . There are no ties between the individual symbols. The entropy is $H = 0.5$ bit/source symbol.
$\rm BQ2$ is redundancy-free and thus has entropy $H = 1$ bit/source symbol.

Furthermore, you need two definitions for the solution of this task:

Let the compression factor be by definition

$$K(N) = \frac{L(N)}{N}\hspace{0.05cm}.$$

The relative redundancy of the LZW code sequence (called residual redundancy in the following) is

$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 - \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$

Hints:

The task belongs to the chapter Compression according to Lempel, Ziv and Welch.
In particular, reference is made to the pages

Residual redundancy as a measure of the efficiency of coding methods,

Efficiency of Lempel-Ziv coding and

Quantitative statements on asymptotic optimality.

Questions

$\rm BQ1$: $K(N = 10000) \ = \ $

$\rm BQ2$: $K(N = 10000) \ = \ $

$\rm BQ1$: $r(N = 10000) \ = \ $

$\ \%$

$\rm BQ2$: $r(N = 10000) \ = \ $

$\ \%$

	For both sources $K(N = 50000)$ is smaller than $K(N = 10000)$.
	For both sources $r(N = 50000)$ is smaller than $r(N = 10000)$.
	Only $\rm BQ1$ has more favourable values with $N = 50000$ günstigere Werte.

Musterlösung

Solution

(1) Der Komprimierungsfaktor ist definiert als der Quotient der Längen von LZW–Ausgangsfolge $(L)$ und Eingangsfolge $(N = 10000)$:

$${\rm BQ1:}\hspace{0.3cm} K \hspace{0.2cm} = \hspace{0.2cm} \frac{6800}{10000}\hspace{0.15cm}\underline{= 0.680}\hspace{0.05cm},$$

$$ {\rm BQ2:}\hspace{0.3cm} K \hspace{0.2cm} = \hspace{0.2cm} \frac{12330}{10000}\hspace{0.15cm}\underline{= 1.233}\hspace{0.05cm}. $$

Die LZW–Codierung macht natürlich nur bei der redundanten Binärquelle $\rm BQ1$ Sinn. Hier kann die Datenmenge um $32\%$ gesenkt werden.
Bei der redundanzfreien Binärquelle $\rm BQ2$ führt dagegen die LZW–Codierung zu einer um $23.3\%$ größeren Datenmenge.

(2) Aus der angegebenen Gleichung erhält man mit $N = 10000$:

$${\rm BQ1:}\hspace{0.3cm} H = 0.5\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} = \hspace{0.2cm}1 - \frac{0.5 \cdot N}{L } = 1 - \frac{5000}{6800 } \hspace{0.15cm}\underline{\approx 26.5\,\%}\hspace{0.05cm},$$

$$ {\rm BQ2:}\hspace{0.3cm} H = 1.0\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} = \hspace{0.2cm}1 - \frac{N}{L } = 1 - \frac{10000}{12330 } \hspace{0.15cm}\underline{\approx 19\,\%}\hspace{0.05cm}.$$

Die Restredundanz gibt die (relative) Redundanz der LZW–Ausgangsfolge an.
Obwohl die Quelle $\rm BQ1$ für die LZW–Codierung besser geeignet ist als die redundanzfreie Quelle $\rm BQ2$, ergibt sich bei $\rm BQ1$ wegen der Redundanz in der Eingangsfolge eine größere Restredundanz.
Eine kleinere Restredundanz $r(N)$ bei gegebenem $N$ sagt also nichts darüber aus, ob die LZW–Codierung für die vorliegende Quelle sinnvoll ist.
Hierzu muss der Komprimierungsfaktor $K(N)$ betrachtet werden. Allgemein gilt folgender Zusammenhang zwischen $r(N)$ und $K(N)$:

$$r(N) = 1 - \frac{H}{K(N)}\hspace{0.05cm},\hspace{0.5cm} K(N) = H \cdot (1- r(N)) \hspace{0.05cm}.$$

(3) Aus der Tabelle auf der Angabenseite kann man ablesen (bzw. daraus ableiten)

für die redundante Binärquelle $\rm BQ1$:

$$L(N = 50000) = 32100\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 0.642\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 22.2\,\% \hspace{0.05cm}},$$

für die redundanzfreie Binärquelle $\rm BQ2$:

$$L(N = 50000) = 59595\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 1.192\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 16.1\,\% \hspace{0.05cm}}.$$

Richtig sind somit die Aussagen 1 und 2:

Für beide Quellen ist der Komprimierungsfaktor $K(N)$ für $N = 50000$ kleiner als für $N = 10000$.
Gleiches gilt für die Restredundanz: $r(N = 50000) < r(N = 10000)$.
Sowohl hinsichtlich $K(N)$ als auch hinsichtlich $r(N)$ ergeben sich also bei größerem $N$ "günstigere" Werte, auch dann, wenn eigentlich wie bei der redundanzfreien Binärquelle $\rm BQ2$ die Anwendung von Lempel–Ziv zu einer Verschlechterung führt.

@@ Line 3: / Line 3: @@
 }}
-[[File:P_ID2449__Inf_Z_2_5_neu.png|right|frame|Datenlänge $L(N)$  nach LZW&ndash;Codierung für $\rm BQ1$ und $\rm BQ2$]]
+[[File:P_ID2449__Inf_Z_2_5_neu.png|right|frame|Data length $L(N)$ after LZW coding for $\rm BQ1$ and $\rm BQ2$]]
-Wir betrachten wie in der&nbsp; [[Aufgaben:2.5_Restredundanz_bei_LZW-Codierung|Aufgabe 2.5]]&nbsp; die Datenkomprimierung mit dem 1983 veröffentlichten&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Der_Lempel.E2.80.93Ziv.E2.80.93Welch.E2.80.93Algorithmus|Lempel&ndash;Ziv&ndash;Welch&ndash;Algorithmus]]. Dabei gilt:
+As in&nbsp; [[Aufgaben:2.5_Restredundanz_bei_LZW-Codierung|exercise 2.5]]&nbsp; we consider data compression using the&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#The_Lempel-Ziv-Welch_algorithm|Lempel&ndash;Ziv&ndash;Welch&ndash;algorithm]]. Here holds:
-* Die Eingangsfolge habe die Länge&nbsp; $N$.
+* Let the input sequence have length&nbsp; $N$.
-* Die Länge der LZW&ndash;Coderausgabe ist&nbsp; $L$.
+* The length of the LZW coder output is&nbsp; $L$.
-Die Grafik zeigt für zwei verschiedene Binärquellen&nbsp; $\rm BQ1$&nbsp; und&nbsp; $\rm BQ2$&nbsp; den Zusammenhang zwischen den Folgenlängen&nbsp; $N$&nbsp; und&nbsp; $L$, dargestellt durch den Funktionsverlauf&nbsp; $L(N)$.&nbsp;
+The graph shows for two different binary sources&nbsp; $\rm BQ1$&nbsp; and&nbsp; $\rm BQ2$&nbsp; the relationship between the sequence lengths&nbsp; $N$&nbsp; and&nbsp; $L$, represented by the function&nbsp; $L(N)$.&nbsp;
-Die beiden Nachrichtenquellen besitzen die gleichen statistischen Eigenschaften wie in der&nbsp; [[Aufgaben:Aufgabe_2.5:_Restredundanz_bei_LZW-Codierung|Aufgabe 2.5]]:
+The two message sources have the same statistical properties as in&nbsp; [[Aufgaben:Aufgabe_2.5:_Restredundanz_bei_LZW-Codierung|exercise 2.5]]:
-* $\rm BQ1$&nbsp; ist aufgrund von ungleichen Symbolwahrscheinlichkeiten&nbsp; $(p_{\rm A} = 0.89$&nbsp; und&nbsp; $p_{\rm B} = 0.11)$&nbsp; redundant.&nbsp; Es bestehen keine Bindungen zwischen den einzelnen Symbolen.&nbsp; Die Entropie ist&nbsp; $H = 0.5$ bit/Quellensymbol.
+* $\rm BQ1$&nbsp; is redundant due to unequal symbol probabilities&nbsp; $(p_{\rm A} = 0.89$&nbsp; and&nbsp; $p_{\rm B} = 0.11)$&nbsp;.&nbsp; There are no ties between the individual symbols.&nbsp; The entropy is&nbsp; $H = 0.5$ bit/source symbol.
-* $\rm BQ2$&nbsp;  ist redundanzfrei und weist somit die Entropie&nbsp; $H = 1$ bit/Quellensymbol auf.
+* $\rm BQ2$&nbsp; is redundancy-free and thus has entropy&nbsp; $H = 1$ bit/source symbol.
-Weiter benötigen Sie für die Lösung dieser Aufagbe noch zwei Definitionen:
+Furthermore, you need two definitions for the solution of this task:
-* Der <i>Komprimierungsfaktor</i> sei definitionsgemäß
+*Let the <i>compression factor</i> be by definition
 :$$K(N) = \frac{L(N)}{N}\hspace{0.05cm}.$$
-* Die relative Redundanz der LZW&ndash;Coderfolge&nbsp; (im Folgenden&nbsp; <i>Restredundanz</i>&nbsp; genannt)&nbsp; ist
+* The relative redundancy of the LZW code sequence&nbsp; (called&nbsp; <i>residual redundancy</i>&nbsp; in the following)&nbsp; is
 :$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 -  \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$
@@ Line 29: / Line 29: @@
-''Hinweise:''
+''Hints:''
-*Die Aufgabe gehört zum  Kapitel&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch|Komprimierung nach Lempel, Ziv und Welch]].
+*The task belongs to the chapter&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch|Compression according to Lempel, Ziv and Welch]].
-*Insbesondere wird  Bezug genommen auf die Seiten
+*In particular, reference is made to the pages
-:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Restredundanz_als_Ma.C3.9F_f.C3.BCr_die_Effizienz_von_Codierverfahren|Restredrundanz als Maß für die Effizienz von Codierverfahren]],
+:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Remaining_redundancy_as_a_measure_for_the_efficiency_of_coding_methods|Residual redundancy as a measure of the efficiency of coding methods]],
-:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Effizienz_der_Lempel.E2.80.93Ziv.E2.80.93Codierung|Effizienz der Lempel-Ziv-Codierung]] sowie
+:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Effizienz_der_Lempel.E2.80.93Ziv.E2.80.93Codierung|Efficiency of Lempel-Ziv coding]] and
-:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Quantitative_Aussagen_zur_asymptotischen_Optimalit.C3.A4t|Quantitative Aussagen zur asymptotischen Optimalität]].
+:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Quantitative_Aussagen_zur_asymptotischen_Optimalit.C3.A4t|Quantitative statements on asymptotic optimality]].
-===Fragebogen===
+===Questions===
 <quiz display=simple>
-{Welche Komprimierungfaktoren&nbsp; $K(N)$&nbsp; ergeben sich mit&nbsp; $N = 10000$?
+{Which compression factors&nbsp; $K(N)$&nbsp; result with&nbsp; $N = 10000$?
 |type="{}"}
 $\rm BQ1$: &nbsp; &nbsp; $K(N = 10000) \ = \ $ { 0.68 3% }
@@ Line 47: / Line 47: @@
-{Wie groß ist die Restredundanz&nbsp;  $r(N)$&nbsp; (in Prozent)?&nbsp; Es gelte wieder&nbsp; $N = 10000$.
+{What is the residual redundancy&nbsp;  $r(N)$&nbsp; (in percent)?&nbsp; Let&nbsp; $N = 10000$ apply again.
 |type="{}"}
 $\rm BQ1$: &nbsp; &nbsp; $r(N = 10000) \ = \  $ { 26.5 3% } $\ \%$
@@ Line 53: / Line 53: @@
-{Welche Aussagen liefert der Vergleich von&nbsp; $N = 10000$&nbsp; und&nbsp; $N = 50000$?
+{What conclusions can be drawn from the comparison of&nbsp; $N = 10000$&nbsp; And&nbsp; $N = 50000$?
 |type="[]"}
-+ Bei beiden Quellen ist&nbsp; $K(N = 50000)$&nbsp; kleiner als&nbsp; $K(N = 10000)$.
++ For both sources&nbsp; $K(N = 50000)$&nbsp; is smaller than&nbsp; $K(N = 10000)$.
-+ Bei beiden Quellen ist&nbsp; $r(N = 50000)$&nbsp; kleiner als&nbsp; $r(N = 10000)$.
++ For both sources&nbsp; $r(N = 50000)$&nbsp; is smaller than&nbsp; $r(N = 10000)$.
-- Nur bei&nbsp; $\rm BQ1$&nbsp; ergeben sich mit&nbsp; $N = 50000$&nbsp; günstigere Werte.
+- Only&nbsp; $\rm BQ1$&nbsp; has more favourable values with&nbsp; $N = 50000$&nbsp; günstigere Werte.