Exercise 2.5Z: Compression Factor vs. Residual Redundancy

{{quiz-Header|Buchseite=Information_Theory/Compression_According_to_Lempel,_Ziv_and_Welch
}}
  
[[File:P_ID2449__Inf_Z_2_5_neu.png|right|frame|Data length $L(N)$ after LZW coding for $\rm BQ1$ and $\rm BQ2$]]
As in&nbsp; [[Aufgaben:2.5_Restredundanz_bei_LZW-Codierung|Exercise 2.5]]&nbsp; we consider data compression using the&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#The_Lempel-Ziv-Welch_algorithm|Lempel&ndash;Ziv&ndash;Welch algorithm]].&nbsp; The following applies:
* Let the input sequence have the length&nbsp; $N$.
* The length of the LZW coder output is&nbsp; $L$.
  
The graph shows the relationship between the sequence lengths&nbsp; $N$&nbsp; and&nbsp; $L$&nbsp; for the two binary sources&nbsp; $\rm BQ1$&nbsp; and&nbsp; $\rm BQ2$,&nbsp; represented by the function&nbsp; $L(N)$.

The two sources have the same statistical properties as in&nbsp; [[Aufgaben:Aufgabe_2.5:_Restredundanz_bei_LZW-Codierung|Exercise 2.5]]:
* $\rm BQ1$&nbsp; is redundant due to unequal symbol probabilities&nbsp; $(p_{\rm A} = 0.89$&nbsp; and&nbsp; $p_{\rm B} = 0.11)$.&nbsp; There are no dependencies between the individual symbols.&nbsp; The entropy is&nbsp; $H = 0.5$&nbsp; bit/source symbol.
* $\rm BQ2$&nbsp; is redundancy-free and thus has the entropy&nbsp; $H = 1$&nbsp; bit/source symbol.
 
Furthermore, you need two definitions for the solution of this exercise:
*The&nbsp; "compression factor"&nbsp; is by definition
:$$K(N) = \frac{L(N)}{N}\hspace{0.05cm}.$$
*The relative redundancy of the LZW code sequence&nbsp; (called&nbsp; "residual redundancy"&nbsp; in the following)&nbsp; is
:$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 -  \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$
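A minimal Python sketch of these two definitions&nbsp; (the function and variable names are only illustrative, they are not part of LNTwww):
<pre>
# K(N) = L(N)/N:  ratio of the LZW output length L to the input length N
def compression_factor(L, N):
    return L / N

# r(N) = 1 - N*H/L(N):  relative redundancy of the LZW code sequence
def residual_redundancy(L, N, H):
    return 1.0 - N * H / L
</pre>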
  
  
Hints:
*The exercise belongs to the chapter&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch|Compression according to Lempel, Ziv and Welch]].
*In particular, reference is made to the pages
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Remaining_redundancy_as_a_measure_for_the_efficiency_of_coding_methods|Remaining redundancy as a measure of the efficiency of coding methods]],
:: [[Information_Theory/Compression_According_to_Lempel,_Ziv_and_Welch#Efficiency_of_Lempel-Ziv_coding|Efficiency of Lempel-Ziv coding]] and
:: [[Information_Theory/Compression_According_to_Lempel,_Ziv_and_Welch#Quantitative_statements_on_asymptotic_optimality|Quantitative statements on asymptotic optimality]].
===Questions===
  
 
<quiz display=simple>
 
{Which compression factors&nbsp; $K(N)$&nbsp; result for&nbsp; $N = 10000$?
|type="{}"}
$\rm BQ1$: &nbsp; &nbsp; $K(N = 10000) \ = \ $ { 0.68 3% }
$\rm BQ2$: &nbsp; &nbsp; $K(N = 10000) \ = \ $ { 1.233 3% }
  
  
{What is the residual redundancy&nbsp; $r(N)$&nbsp; (in percent)?&nbsp; Again, let&nbsp; $N = 10000$.
|type="{}"}
$\rm BQ1$: &nbsp; &nbsp; $r(N = 10000) \ = \ $ { 26.5 3% } $\ \%$
$\rm BQ2$: &nbsp; &nbsp; $r(N = 10000) \ = \ $ { 19.0 3% } $\ \%$
  
  
{What conclusions can be drawn from the comparison of&nbsp; $N = 10000$&nbsp; and&nbsp; $N = 50000$?
|type="[]"}
+ For both sources&nbsp; $K(N = 50000)$&nbsp; is smaller than&nbsp; $K(N = 10000)$.
+ For both sources&nbsp; $r(N = 50000)$&nbsp; is smaller than&nbsp; $r(N = 10000)$.
- Only&nbsp; $\rm BQ1$&nbsp; has more favourable values with&nbsp; $N = 50000$.
  
  
 
</quiz>
 
  
===Solution===
 
{{ML-Kopf}}
 
'''(1)'''&nbsp; The compression factor is defined as the quotient of the lengths of the LZW output sequence&nbsp; $(L)$&nbsp; and the input sequence&nbsp; $(N = 10000)$:
:$${\rm BQ1:}\hspace{0.3cm} K \hspace{0.2cm} =  \hspace{0.2cm} \frac{6800}{10000}\hspace{0.15cm}\underline{= 0.680}\hspace{0.05cm},$$
:$$ {\rm BQ2:}\hspace{0.3cm} K \hspace{0.2cm} =  \hspace{0.2cm} \frac{12330}{10000}\hspace{0.15cm}\underline{= 1.233}\hspace{0.05cm}. $$
*Of course, LZW coding only makes sense for the redundant binary source&nbsp; $\rm BQ1$.&nbsp; Here the amount of data can be reduced by&nbsp; $1-0.68=32\%$.
*With the redundancy-free binary source&nbsp; $\rm BQ2$,&nbsp; on the other hand, LZW coding leads to an amount of data that is&nbsp; $1.233-1=23.3\%$&nbsp; larger.
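These two numbers can be reproduced with a few lines of Python&nbsp; (a small sketch; the lengths are the ones read from the graph):
<pre>
# Compression factors for N = 10000 with the lengths read from the graph
N = 10000
L_BQ1, L_BQ2 = 6800, 12330

print(L_BQ1 / N)   # 0.68  -> data volume reduced by 32 %
print(L_BQ2 / N)   # 1.233 -> data volume increased by 23.3 %
</pre>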
 
  
'''(2)'''&nbsp; From the given equation we obtain with&nbsp; $N = 10000$:
:$${\rm BQ1:}\hspace{0.3cm} H = 0.5\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} =  \hspace{0.2cm}1 - \frac{0.5 \cdot N}{L } = 1 - \frac{5000}{6800 }  \hspace{0.15cm}\underline{\approx 26.5\,\%}\hspace{0.05cm},$$
:$$ {\rm BQ2:}\hspace{0.3cm} H = 1.0\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} =  \hspace{0.2cm}1 - \frac{N}{L } = 1 - \frac{10000}{12330 }  \hspace{0.15cm}\underline{\approx 19\,\%}\hspace{0.05cm}.$$
*The residual redundancy indicates the (relative) redundancy of the LZW output sequence.
*Although the source&nbsp; $\rm BQ1$&nbsp; is more suitable for LZW coding than the redundancy-free source&nbsp; $\rm BQ2$,&nbsp; $\rm BQ1$&nbsp; results in a larger residual redundancy because of the redundancy in the input sequence.
*A smaller residual redundancy&nbsp; $r(N)$&nbsp; for a given&nbsp; $N$&nbsp; therefore says nothing about whether LZW coding is useful for the source at hand.
*For this, the compression factor&nbsp; $K(N)$&nbsp; must be considered.&nbsp; In general, the following relationship between&nbsp; $r(N)$&nbsp; and&nbsp; $K(N)$&nbsp; applies:
:$$r(N) = 1 - \frac{H}{K(N)}\hspace{0.05cm},\hspace{0.5cm} K(N) = H \cdot (1- r(N))\hspace{0.05cm}.$$
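Numerically, the definition and the relation above give the same residual redundancy&nbsp; (a small check with the values from parts&nbsp; '''(1)'''&nbsp; and&nbsp; '''(2)'''):
<pre>
# r(N) from the definition and from r(N) = 1 - H/K(N), for N = 10000
N = 10000
for name, L, H in [("BQ1", 6800, 0.5), ("BQ2", 12330, 1.0)]:
    K = L / N                  # compression factor
    r_def = 1 - N * H / L      # definition of the residual redundancy
    r_rel = 1 - H / K          # relation to the compression factor
    print(name, round(K, 3), round(r_def, 3), round(r_rel, 3))
# BQ1: 0.68  0.265 0.265
# BQ2: 1.233 0.189 0.189
</pre>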
  
'''(3)'''&nbsp; From the table on the information page one can read&nbsp; (or deduce):

*for the redundant binary source&nbsp; $\rm BQ1$:
:$$L(N = 50000) = 32100\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 0.642\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 22.2\,\% \hspace{0.05cm}},$$
*for the redundancy-free binary source&nbsp; $\rm BQ2$:
:$$L(N = 50000) = 59595\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 1.192\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 16.1\,\% \hspace{0.05cm}}.$$

<u>Statements 1 and 2</u> are therefore correct:
* For both sources the compression factor&nbsp; $K(N)$&nbsp; is smaller for&nbsp; $N = 50000$&nbsp; than for&nbsp; $N = 10000$.
* The same applies to the residual redundancy: &nbsp; $r(N = 50000) < r(N = 10000)$.
* Both with regard to&nbsp; $K(N)$&nbsp; and&nbsp; $r(N)$,&nbsp; "more favourable"&nbsp; values result for larger&nbsp; $N$,&nbsp; even if, as with the redundancy-free binary source&nbsp; $\rm BQ2$,&nbsp; <br>the application of Lempel&ndash;Ziv actually leads to a deterioration.
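The comparison can also be carried out directly with the&nbsp; $L(N)$&nbsp; values given above&nbsp; (a sketch; both printed comparisons are&nbsp; "True"&nbsp; for each source, in line with statements 1 and 2):
<pre>
# K(N) and r(N) for N = 10000 and N = 50000, based on the L(N) values above
sources = {"BQ1": (0.5, 6800, 32100),    # (H, L(10000), L(50000))
           "BQ2": (1.0, 12330, 59595)}

for name, (H, L1, L2) in sources.items():
    K1, K2 = L1 / 10000, L2 / 50000
    r1, r2 = 1 - 10000 * H / L1, 1 - 50000 * H / L2
    print(name, K2 < K1, r2 < r1)   # BQ1: True True / BQ2: True True
</pre>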
  
 
{{ML-Fuß}}
 
  
  
[[Category:Information Theory: Exercises|^2.2 Lempel-Ziv-Welch Compression^]]
