Difference between revisions of "Aufgaben:Exercise 2.5Z: Compression Factor vs. Residual Redundancy"

Revision as of 12:37, 2 August 2021

Data length $L(N)$ after LZW coding for $\rm BQ1$ and $\rm BQ2$

As in Exercise 2.5, we consider data compression using the Lempel–Ziv–Welch (LZW) algorithm, where:

Let the input sequence have length $N$.
The length of the LZW coder output is $L$.
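To make the quantities $N$ and $L$ concrete, here is a minimal LZW encoder sketch over the binary alphabet {A, B}. It is not the coder used to produce the graph, whose details are not given here. Two assumptions are built in and labeled as such: the dictionary grows without limit, and each index is sent with just enough bits for the current dictionary size (variable-width indices). The function names `lzw_encode` and `lzw_output_length` are hypothetical.

```python
import math


def lzw_encode(bits: str, alphabet: str = "AB") -> list[int]:
    """Minimal LZW encoder (illustrative sketch, not the exercise's coder).

    Assumption: the dictionary starts with one entry per alphabet symbol
    and grows without limit; practical coders usually cap its size.
    """
    dictionary = {sym: i for i, sym in enumerate(alphabet)}
    codes = []
    w = ""
    for c in bits:
        wc = w + c
        if wc in dictionary:
            w = wc  # extend the current match
        else:
            codes.append(dictionary[w])  # emit index of longest known phrase
            dictionary[wc] = len(dictionary)  # learn the new phrase
            w = c  # start the next match with the current symbol
    if w:
        codes.append(dictionary[w])  # flush the final match
    return codes


def lzw_output_length(codes: list[int], alphabet_size: int = 2) -> int:
    """Output length L in bits, assuming variable-width indices.

    When the i-th index is sent, the dictionary holds alphabet_size + i
    entries, so ceil(log2(alphabet_size + i)) bits are used for it.
    """
    return sum(
        max(1, math.ceil(math.log2(alphabet_size + i))) for i in range(len(codes))
    )
```

For example, `lzw_encode("ABABABA")` yields the indices `[0, 1, 2, 4]`, which take $1 + 2 + 2 + 3 = 8$ bits, so here $N = 7$ and $L = 8$.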

For two different binary sources $\rm BQ1$ and $\rm BQ2$, the graph shows the relationship between the sequence lengths $N$ and $L$, represented by the function $L(N)$.

The two sources have the same statistical properties as in Exercise 2.5:

$\rm BQ1$ is redundant due to unequal symbol probabilities ($p_{\rm A} = 0.89$ and $p_{\rm B} = 0.11$). There are no statistical dependencies between the individual symbols. The entropy is $H = 0.5$ bit/source symbol.
$\rm BQ2$ is redundancy-free and thus has entropy $H = 1$ bit/source symbol.
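The entropy values can be checked with the binary entropy function of a memoryless source. The following sketch assumes only the probabilities stated above; the helper name `binary_entropy` is hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bit/symbol of a memoryless binary source with P(A) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a deterministic source carries no information
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


H_BQ1 = binary_entropy(0.89)  # about 0.4999, i.e. H = 0.5 after rounding
H_BQ2 = binary_entropy(0.5)   # exactly 1.0 for equiprobable symbols
```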

To solve this exercise, you also need two definitions:

The compression factor is defined as

$$K(N) = \frac{L(N)}{N}\hspace{0.05cm}.$$

The relative redundancy of the LZW code sequence (referred to below as the residual redundancy) is

$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 - \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$
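Both definitions translate directly into two small functions. This is a sketch; the function names are hypothetical, and $H$ is expected in bit per source symbol.

```python
def compression_factor(L: float, N: float) -> float:
    """K(N) = L(N) / N: output length relative to input length."""
    return L / N


def residual_redundancy(L: float, N: float, H: float) -> float:
    """r(N) = 1 - N*H / L(N), with H in bit per source symbol."""
    return 1.0 - N * H / L
```

If the output length equals the information content $N \cdot H$ of the input, then $r(N) = 0$.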

Hints:

This exercise belongs to the chapter Compression according to Lempel, Ziv and Welch.
In particular, it refers to the pages

Residual redundancy as a measure of the efficiency of coding methods,

Efficiency of Lempel-Ziv coding and

Quantitative statements on asymptotic optimality.

Questions

$\rm BQ1$: $K(N = 10000) \ = \ ?$

$\rm BQ2$: $K(N = 10000) \ = \ ?$

$\rm BQ1$: $r(N = 10000) \ = \ ?\ \%$

$\rm BQ2$: $r(N = 10000) \ = \ ?\ \%$

Which of the following statements are correct?

	For both sources $K(N = 50000)$ is smaller than $K(N = 10000)$.
	For both sources $r(N = 50000)$ is smaller than $r(N = 10000)$.
	Only $\rm BQ1$ yields more favourable values for $N = 50000$.

Solution

(1) The compression factor is the ratio of the length $L$ of the LZW output sequence to the length $N = 10000$ of the input sequence:

$${\rm BQ1:}\hspace{0.3cm} K \hspace{0.2cm} = \hspace{0.2cm} \frac{6800}{10000}\hspace{0.15cm}\underline{= 0.680}\hspace{0.05cm},$$

$$ {\rm BQ2:}\hspace{0.3cm} K \hspace{0.2cm} = \hspace{0.2cm} \frac{12330}{10000}\hspace{0.15cm}\underline{= 1.233}\hspace{0.05cm}. $$

LZW coding naturally makes sense only for the redundant binary source $\rm BQ1$, where it reduces the amount of data by $32\%$.
For the redundancy-free binary source $\rm BQ2$, on the other hand, LZW coding increases the amount of data by $23.3\%$.
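The arithmetic of part (1) as a short script. The output lengths are the values $L(N = 10000)$ read from the graph. A compression factor below 1 means LZW shrinks the data, and a factor above 1 means it inflates it.

```python
N = 10_000
L = {"BQ1": 6800, "BQ2": 12330}  # L(N = 10000) as read from the graph

K = {src: length / N for src, length in L.items()}
for src, k in K.items():
    # (K - 1) * 100 is the relative change in data volume in percent
    print(f"{src}: K = {k:.3f}, change in data volume = {(k - 1) * 100:+.1f} %")
```

This prints `K = 0.680` with a change of `-32.0 %` for BQ1 and `K = 1.233` with `+23.3 %` for BQ2.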

(2) With $N = 10000$, the definition of $r(N)$ gives:

$${\rm BQ1:}\hspace{0.3cm} H = 0.5\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} = \hspace{0.2cm}1 - \frac{0.5 \cdot N}{L } = 1 - \frac{5000}{6800 } \hspace{0.15cm}\underline{\approx 26.5\,\%}\hspace{0.05cm},$$

$$ {\rm BQ2:}\hspace{0.3cm} H = 1.0\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} = \hspace{0.2cm}1 - \frac{N}{L } = 1 - \frac{10000}{12330 } \hspace{0.15cm}\underline{\approx 19\,\%}\hspace{0.05cm}.$$
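The same computation in code, using $L(N = 10000)$ from the graph and the entropies stated for the two sources:

```python
N = 10_000
sources = {"BQ1": (6800, 0.5), "BQ2": (12330, 1.0)}  # (L, H in bit/symbol)

r = {src: 1 - N * H / L for src, (L, H) in sources.items()}
for src, value in r.items():
    print(f"{src}: r = {value:.1%}")
```

This prints `26.5%` for BQ1 and `18.9%` for BQ2. The latter is the value rounded to $19\,\%$ above.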

The residual redundancy indicates the (relative) redundancy of the LZW output sequence.
Although $\rm BQ1$ is better suited to LZW coding than the redundancy-free source $\rm BQ2$, it yields a larger residual redundancy because its input sequence is itself redundant.
A smaller residual redundancy $r(N)$ for a given $N$ therefore does not tell whether LZW coding is useful for the source at hand.
To decide this, the compression factor $K(N)$ must be considered. In general, substituting $L(N) = N \cdot K(N)$ into the definition of $r(N)$ yields the following relationship between $r(N)$ and $K(N)$:

$$r(N) = 1 - \frac{H}{K(N)}\hspace{0.05cm},\hspace{0.5cm} K(N) = \frac{H}{1- r(N)} \hspace{0.05cm}.$$
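Solving $r(N) = 1 - H/K(N)$ for the compression factor gives $K(N) = H/(1 - r(N))$. The sketch below checks this inverse relation as a round trip with the $\rm BQ1$ values for $N = 10000$. The function names are hypothetical.

```python
def r_from_K(K: float, H: float) -> float:
    """Residual redundancy from the compression factor: r = 1 - H/K."""
    return 1.0 - H / K


def K_from_r(r: float, H: float) -> float:
    """Inverse relation, obtained by solving r = 1 - H/K for K."""
    return H / (1.0 - r)


# Round trip with BQ1 at N = 10000: H = 0.5, K = 0.68
r_bq1 = r_from_K(0.68, 0.5)   # about 0.265
K_back = K_from_r(r_bq1, 0.5)  # recovers 0.68
```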

(3) From the table on the exercise page, one can read off (or derive):

for the redundant binary source $\rm BQ1$:

$$L(N = 50000) = 32100\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 0.642\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {\approx 22.1\,\% \hspace{0.05cm}},$$

for the redundancy-free binary source $\rm BQ2$:

$$L(N = 50000) = 59595\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 1.192\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 16.1\,\% \hspace{0.05cm}}.$$
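The values for $N = 50000$ follow from the given output lengths in the same way:

```python
N = 50_000
sources = {"BQ1": (32100, 0.5), "BQ2": (59595, 1.0)}  # (L, H in bit/symbol)

results = {}
for src, (L, H) in sources.items():
    K = L / N            # compression factor
    r = 1 - N * H / L    # residual redundancy
    results[src] = (K, r)
    print(f"{src}: K = {K:.3f}, r = {r:.1%}")
```

This prints `K = 0.642, r = 22.1%` for BQ1 and `K = 1.192, r = 16.1%` for BQ2.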

Statements 1 and 2 are therefore correct:

For both sources the compression factor $K(N)$ is smaller for $N = 50000$ than for $N = 10000$.
The same applies to the residual redundancy: $r(N = 50000) < r(N = 10000)$.
With respect to both $K(N)$ and $r(N)$, larger $N$ thus yields "more favourable" values, even when applying Lempel–Ziv actually increases the amount of data, as it does for the redundancy-free binary source $\rm BQ2$.
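The three statements can also be checked mechanically from the four given output lengths. In this sketch, "more favourable" is interpreted as both $K$ and $r$ decreasing, which is an assumption about the wording of statement 3. The helper names are hypothetical.

```python
# L(N) per source and input length, as given in the exercise (graph and table)
L = {
    "BQ1": {10_000: 6800, 50_000: 32100},
    "BQ2": {10_000: 12330, 50_000: 59595},
}
H = {"BQ1": 0.5, "BQ2": 1.0}  # entropy in bit/source symbol


def K(src: str, N: int) -> float:
    """Compression factor K(N) = L(N) / N."""
    return L[src][N] / N


def r(src: str, N: int) -> float:
    """Residual redundancy r(N) = 1 - N*H / L(N)."""
    return 1 - N * H[src] / L[src][N]


def improves(src: str) -> bool:
    """True if both K and r are smaller for N = 50000 than for N = 10000."""
    return K(src, 50_000) < K(src, 10_000) and r(src, 50_000) < r(src, 10_000)


statement_1 = all(K(s, 50_000) < K(s, 10_000) for s in L)
statement_2 = all(r(s, 50_000) < r(s, 10_000) for s in L)
statement_3 = improves("BQ1") and not improves("BQ2")  # "only BQ1 improves"

print(statement_1, statement_2, statement_3)  # True True False
```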
