Difference between revisions of "Aufgaben:Exercise 2.5Z: Compression Factor vs. Residual Redundancy"

From LNTwww
Line 65: Line 65:
 
===Solution===
 
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
'''(1)'''  Der Komprimierungsfaktor ist definiert als der Quotient der Längen von LZW–Ausgangsfolge  $(L)$  und Eingangsfolge  $(N = 10000)$:
+
'''(1)'''  The compression factor is defined as the quotient of the lengths of the LZW output sequence  $(L)$  and input sequence  $(N = 10000)$:
 
:$${\rm BQ1:}\hspace{0.3cm} K \hspace{0.2cm} =  \hspace{0.2cm} \frac{6800}{10000}\hspace{0.15cm}\underline{= 0.680}\hspace{0.05cm},$$  
 
:$${\rm BQ1:}\hspace{0.3cm} K \hspace{0.2cm} =  \hspace{0.2cm} \frac{6800}{10000}\hspace{0.15cm}\underline{= 0.680}\hspace{0.05cm},$$  
 
:$$ {\rm BQ2:}\hspace{0.3cm} K \hspace{0.2cm} =  \hspace{0.2cm} \frac{12330}{10000}\hspace{0.15cm}\underline{= 1.233}\hspace{0.05cm}. $$
 
:$$ {\rm BQ2:}\hspace{0.3cm} K \hspace{0.2cm} =  \hspace{0.2cm} \frac{12330}{10000}\hspace{0.15cm}\underline{= 1.233}\hspace{0.05cm}. $$
*Die LZW–Codierung macht natürlich nur bei der redundanten Binärquelle  $\rm BQ1$  Sinn.  Hier kann die Datenmenge um  $32\%$  gesenkt werden.  
+
*Of course, the LZW coding only makes sense with the redundant binary source  $\rm BQ1$ .  Here the amount of data can be reduced by  $32\%$ .  
*Bei der redundanzfreien Binärquelle  $\rm BQ2$  führt dagegen die LZW–Codierung zu einer um  $23.3\%$  größeren Datenmenge.
+
*With the redundancy-free binary source  $\rm BQ2$ , on the other hand, the LZW coding leads to a  $23.3\%$  larger data quantity.
  
  
  
'''(2)'''  Aus der angegebenen Gleichung erhält man mit  $N = 10000$:
+
'''(2)'''  From the given equation we obtain with  $N = 10000$:
 
:$${\rm BQ1:}\hspace{0.3cm} H = 0.5\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} =  \hspace{0.2cm}1 - \frac{0.5 \cdot N}{L } = 1 - \frac{5000}{6800 }  \hspace{0.15cm}\underline{\approx 26.5\,\%}\hspace{0.05cm},$$
 
:$${\rm BQ1:}\hspace{0.3cm} H = 0.5\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} =  \hspace{0.2cm}1 - \frac{0.5 \cdot N}{L } = 1 - \frac{5000}{6800 }  \hspace{0.15cm}\underline{\approx 26.5\,\%}\hspace{0.05cm},$$
 
:$$ {\rm BQ2:}\hspace{0.3cm} H = 1.0\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} =  \hspace{0.2cm}1 - \frac{N}{L } = 1 - \frac{10000}{12330 }  \hspace{0.15cm}\underline{\approx 19\,\%}\hspace{0.05cm}.$$
 
:$$ {\rm BQ2:}\hspace{0.3cm} H = 1.0\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} =  \hspace{0.2cm}1 - \frac{N}{L } = 1 - \frac{10000}{12330 }  \hspace{0.15cm}\underline{\approx 19\,\%}\hspace{0.05cm}.$$
*Die Restredundanz gibt die (relative) Redundanz der LZW–Ausgangsfolge an.  
+
*The residual redundancy indicates the (relative) redundancy of the LZW output sequence.
*Obwohl die Quelle  $\rm BQ1$  für die LZW–Codierung besser geeignet ist als die redundanzfreie Quelle  $\rm BQ2$, ergibt sich bei  $\rm BQ1$  wegen der Redundanz in der Eingangsfolge eine größere Restredundanz.
+
*Although the source  $\rm BQ1$  is more suitable for LZW coding than the redundancy-free source  $\rm BQ2$,   $\rm BQ1$  results in a larger residual redundancy because of the redundancy in the input sequence.
*Eine kleinere Restredundanz  $r(N)$  bei gegebenem  $N$  sagt also nichts darüber aus, ob die LZW–Codierung für die vorliegende Quelle sinnvoll ist.  
+
*A smaller residual redundancy  $r(N)$  for a given  $N$  therefore says nothing about whether LZW coding is useful for the source at hand.
*Hierzu muss der Komprimierungsfaktor  $K(N)$  betrachtet werden.  Allgemein gilt folgender Zusammenhang zwischen  $r(N)$  und  $K(N)$:
+
*For this, the compression factor  $K(N)$  must be considered.   In general, the following relationship between  $r(N)$  and  $K(N)$ applies:
 
:$$r(N) = 1 - \frac{H}{K(N)}\hspace{0.05cm},\hspace{0.5cm} K(N) = H \cdot (1- r(N))
 
:$$r(N) = 1 - \frac{H}{K(N)}\hspace{0.05cm},\hspace{0.5cm} K(N) = H \cdot (1- r(N))
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
  
  
'''(3)'''  Aus der Tabelle auf der Angabenseite kann man ablesen  (bzw. daraus ableiten)
+
'''(3)'''  From the table on the information page one can read  (or deduce)
  
*für die redundante Binärquelle  $\rm BQ1$:
+
*for the redundant binary source  $\rm BQ1$:
 
:$$L(N = 50000) = 32100\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 0.642\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 22.2\,\% \hspace{0.05cm}},$$
 
:$$L(N = 50000) = 32100\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 0.642\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 22.2\,\% \hspace{0.05cm}},$$
*für die redundanzfreie Binärquelle  $\rm BQ2$:
+
*for the redundancy-free binary source  $\rm BQ2$:
 
:$$L(N = 50000) = 59595\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 1.192\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 16.1\,\% \hspace{0.05cm}}.$$
 
:$$L(N = 50000) = 59595\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 1.192\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 16.1\,\% \hspace{0.05cm}}.$$
  
Richtig sind somit die <u>Aussagen 1 und 2</u>:
+
<u>Statements 1 and 2</u> are therefore correct:
* Für beide Quellen ist der Komprimierungsfaktor &nbsp;$K(N)$&nbsp; für  &nbsp;$N = 50000$&nbsp; kleiner als für &nbsp;$N = 10000$.
+
* For both sources the compression factor &nbsp;$K(N)$&nbsp; is smaller for &nbsp;$N = 50000$&nbsp; than for &nbsp;$N = 10000$.
* Gleiches gilt für die Restredundanz: &nbsp; $r(N = 50000) < r(N = 10000)$.
+
* The same applies to the residual redundancy: &nbsp; $r(N = 50000) < r(N = 10000)$.
* Sowohl hinsichtlich &nbsp;$K(N)$&nbsp;  als auch hinsichtlich&nbsp; $r(N)$&nbsp; ergeben sich also bei größerem&nbsp; $N$&nbsp; "günstigere" Werte, auch dann, wenn eigentlich wie bei der redundanzfreien Binärquelle&nbsp; $\rm BQ2$&nbsp; die Anwendung von Lempel&ndash;Ziv zu einer Verschlechterung führt.
+
* Both with regard to &nbsp;$K(N)$&nbsp;  and&nbsp; $r(N)$&nbsp; "more favourable" values result with larger&nbsp; $N$&nbsp;, even if, as with the redundancy-free binary source&nbsp; $\rm BQ2$&nbsp;, the application of Lempel&ndash;Ziv actually leads to a deterioration.
  
 
{{ML-Fuß}}
 
{{ML-Fuß}}

Revision as of 12:37, 2 August 2021

Data length $L(N)$ after LZW coding for $\rm BQ1$ and $\rm BQ2$

As in  exercise 2.5  we consider data compression using the  Lempel–Ziv–Welch–algorithm. Here holds:

  • Let the input sequence have length  $N$.
  • The length of the LZW coder output is  $L$.


The graph shows for two different binary sources  $\rm BQ1$  and  $\rm BQ2$  the relationship between the sequence lengths  $N$  and  $L$, represented by the function  $L(N)$. 

The two message sources have the same statistical properties as in  exercise 2.5:

  • $\rm BQ1$  is redundant due to unequal symbol probabilities  $(p_{\rm A} = 0.89$  and  $p_{\rm B} = 0.11)$ .  There are no ties between the individual symbols.  The entropy is  $H = 0.5$ bit/source symbol.
  • $\rm BQ2$  is redundancy-free and thus has entropy  $H = 1$ bit/source symbol.


Furthermore, you need two definitions for the solution of this task:

  • Let the compression factor be by definition
$$K(N) = \frac{L(N)}{N}\hspace{0.05cm}.$$
  • The relative redundancy of the LZW code sequence  (called  residual redundancy  in the following)  is
$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 - \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$





Hints:

Residual redundancy as a measure of the efficiency of coding methods,
Efficiency of Lempel-Ziv coding and
Quantitative statements on asymptotic optimality.


Questions

1

Which compression factors  $K(N)$  result with  $N = 10000$?

$\rm BQ1$:     $K(N = 10000) \ = \ $

$\rm BQ2$:     $K(N = 10000) \ = \ $

2

What is the residual redundancy  $r(N)$  (in percent)?  Let  $N = 10000$ apply again.

$\rm BQ1$:     $r(N = 10000) \ = \ $

$\ \%$
$\rm BQ2$:     $r(N = 10000) \ = \ $

$\ \%$

3

What conclusions can be drawn from the comparison of  $N = 10000$  And  $N = 50000$?

For both sources  $K(N = 50000)$  is smaller than  $K(N = 10000)$.
For both sources  $r(N = 50000)$  is smaller than  $r(N = 10000)$.
Only  $\rm BQ1$  has more favourable values with  $N = 50000$ .


Solution

(1)  The compression factor is defined as the quotient of the lengths of the LZW output sequence  $(L)$  and input sequence  $(N = 10000)$:

$${\rm BQ1:}\hspace{0.3cm} K \hspace{0.2cm} = \hspace{0.2cm} \frac{6800}{10000}\hspace{0.15cm}\underline{= 0.680}\hspace{0.05cm},$$
$$ {\rm BQ2:}\hspace{0.3cm} K \hspace{0.2cm} = \hspace{0.2cm} \frac{12330}{10000}\hspace{0.15cm}\underline{= 1.233}\hspace{0.05cm}. $$
  • Of course, the LZW coding only makes sense with the redundant binary source  $\rm BQ1$ .  Here the amount of data can be reduced by  $32\%$ .
  • With the redundancy-free binary source  $\rm BQ2$ , on the other hand, the LZW coding leads to a  $23.3\%$  larger data quantity.


(2)  From the given equation we obtain with  $N = 10000$:

$${\rm BQ1:}\hspace{0.3cm} H = 0.5\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} = \hspace{0.2cm}1 - \frac{0.5 \cdot N}{L } = 1 - \frac{5000}{6800 } \hspace{0.15cm}\underline{\approx 26.5\,\%}\hspace{0.05cm},$$
$$ {\rm BQ2:}\hspace{0.3cm} H = 1.0\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} = \hspace{0.2cm}1 - \frac{N}{L } = 1 - \frac{10000}{12330 } \hspace{0.15cm}\underline{\approx 19\,\%}\hspace{0.05cm}.$$
  • The residual redundancy indicates the (relative) redundancy of the LZW output sequence.
  • Although the source  $\rm BQ1$  is more suitable for LZW coding than the redundancy-free source  $\rm BQ2$,   $\rm BQ1$  results in a larger residual redundancy because of the redundancy in the input sequence.
  • A smaller residual redundancy  $r(N)$  for a given  $N$  therefore says nothing about whether LZW coding is useful for the source at hand.
  • For this, the compression factor  $K(N)$  must be considered.   In general, the following relationship between  $r(N)$  and  $K(N)$ applies:
$$r(N) = 1 - \frac{H}{K(N)}\hspace{0.05cm},\hspace{0.5cm} K(N) = H \cdot (1- r(N)) \hspace{0.05cm}.$$


(3)  From the table on the information page one can read  (or deduce)

  • for the redundant binary source  $\rm BQ1$:
$$L(N = 50000) = 32100\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 0.642\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 22.2\,\% \hspace{0.05cm}},$$
  • for the redundancy-free binary source  $\rm BQ2$:
$$L(N = 50000) = 59595\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 1.192\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 16.1\,\% \hspace{0.05cm}}.$$

Statements 1 and 2 are therefore correct:

  • For both sources the compression factor  $K(N)$  is smaller for  $N = 50000$  than for  $N = 10000$.
  • The same applies to the residual redundancy:   $r(N = 50000) < r(N = 10000)$.
  • Both with regard to  $K(N)$  and  $r(N)$  "more favourable" values result with larger  $N$ , even if, as with the redundancy-free binary source  $\rm BQ2$ , the application of Lempel–Ziv actually leads to a deterioration.