
Exercise 2.5Z: Compression Factor vs. Residual Redundancy

From LNTwww
 
{{quiz-Header|Buchseite=Information_Theory/Compression_According_to_Lempel,_Ziv_and_Welch
}}
  
[[File:P_ID2449__Inf_Z_2_5_neu.png|right|frame|Data length L(N) after LZW coding for BQ1 and BQ2]]
As in&nbsp; [[Aufgaben:2.5_Restredundanz_bei_LZW-Codierung|Exercise 2.5]],&nbsp; we consider data compression using the&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#The_Lempel-Ziv-Welch_algorithm|Lempel–Ziv–Welch algorithm]]. The following applies:
* Let the input sequence have the length&nbsp; $N$.
* The length of the LZW coder output is&nbsp; $L$.
  
The graph shows, for two different binary sources&nbsp; $\rm BQ1$&nbsp; and&nbsp; $\rm BQ2$,&nbsp; the relationship between the sequence lengths&nbsp; $N$&nbsp; and&nbsp; $L$,&nbsp; represented by the function&nbsp; $L(N)$.
 
The two sources have the same statistical properties as in&nbsp; [[Aufgaben:Aufgabe_2.5:_Restredundanz_bei_LZW-Codierung|Exercise 2.5]]:
* $\rm BQ1$&nbsp; is redundant due to unequal symbol probabilities&nbsp; $(p_{\rm A} = 0.89$&nbsp; and&nbsp; $p_{\rm B} = 0.11)$.&nbsp; There are no dependencies between the individual symbols.&nbsp; The entropy is&nbsp; $H=0.5$&nbsp; bit/source symbol.
* $\rm BQ2$&nbsp; is redundancy-free and thus has the entropy&nbsp; $H=1$&nbsp; bit/source symbol.
 
Furthermore, you need two definitions for the solution of this exercise:
*Let the&nbsp; "compression factor"&nbsp; be by definition
:$$K(N) = \frac{L(N)}{N}\hspace{0.05cm}.$$
* The relative redundancy of the LZW code sequence&nbsp; (called&nbsp; "residual redundancy"&nbsp; in the following)&nbsp; is
:$$r(N) = \frac{L(N) - N \cdot H}{L(N)}= 1 -  \frac{ N \cdot H}{L(N)}\hspace{0.05cm}.$$
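The two definitions can be checked numerically. A minimal Python sketch (the lengths $L$ are read off the graph; the function names are ours, not part of the exercise):

```python
# Compression factor K(N) = L(N)/N and residual redundancy
# r(N) = 1 - N*H/L(N), evaluated with the values given in the exercise.

def compression_factor(L, N):
    """K(N) = L(N)/N: ratio of LZW output length to input length."""
    return L / N

def residual_redundancy(L, N, H):
    """r(N) = 1 - N*H/L(N): relative redundancy of the LZW output."""
    return 1 - N * H / L

N = 10000
print(compression_factor(6800, N))          # BQ1: 0.68
print(residual_redundancy(6800, N, 0.5))    # BQ1: ~0.265  (26.5 %)
print(compression_factor(12330, N))         # BQ2: 1.233
print(residual_redundancy(12330, N, 1.0))   # BQ2: ~0.189  (19 %)
```

These are exactly the quantities asked for in the questions below.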
 
  
Hints:
*The exercise belongs to the chapter&nbsp; [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch|Compression according to Lempel, Ziv and Welch]].
*In particular, reference is made to the pages
:: [[Information_Theory/Komprimierung_nach_Lempel,_Ziv_und_Welch#Remaining_redundancy_as_a_measure_for_the_efficiency_of_coding_methods|Remaining redundancy as a measure of the efficiency of coding methods]],
:: [[Information_Theory/Compression_According_to_Lempel,_Ziv_and_Welch#Efficiency_of_Lempel-Ziv_coding|Efficiency of Lempel-Ziv coding]] and
:: [[Information_Theory/Compression_According_to_Lempel,_Ziv_and_Welch#Quantitative_statements_on_asymptotic_optimality|Quantitative statements on asymptotic optimality]].
  
  
===Questions===
  
 
<quiz display=simple>
 
{Which compression factors&nbsp; $K(N)$&nbsp; result with&nbsp; $N = 10000$?
 
|type="{}"}
 
$\rm BQ1$: &nbsp; &nbsp; $K(N = 10000) \ = \ $ { 0.68 3% }
$\rm BQ2$: &nbsp; &nbsp; $K(N = 10000) \ = \ $ { 1.233 3% }
  
  
{What is the residual redundancy&nbsp; $r(N)$&nbsp; (in percent)?&nbsp; Let&nbsp; $N=10000$&nbsp; apply again.
 
|type="{}"}
 
$\rm BQ1$: &nbsp; &nbsp; $r(N = 10000) \ = \ $ { 26.5 3% } $\ \%$
$\rm BQ2$: &nbsp; &nbsp; $r(N = 10000) \ = \ $ { 19.0 3% } $\ \%$
  
  
{What conclusions can be drawn from the comparison of&nbsp; $N = 10000$&nbsp; and&nbsp; $N = 50000$?
 
|type="[]"}
 
+ For both sources&nbsp; $K(N = 50000)$&nbsp; is smaller than&nbsp; $K(N = 10000)$.
+ For both sources&nbsp; $r(N = 50000)$&nbsp; is smaller than&nbsp; $r(N = 10000)$.
- Only&nbsp; $\rm BQ1$&nbsp; has more favourable values with&nbsp; $N = 50000$.
  
  
 
</quiz>
 
  
===Solution===
 
{{ML-Kopf}}
 
'''(1)'''&nbsp; The compression factor is defined as the quotient of the lengths of the LZW output sequence&nbsp; $(L)$&nbsp; and input sequence&nbsp; $(N = 10000)$:
:$${\rm BQ1:}\hspace{0.3cm} K \hspace{0.2cm} =  \hspace{0.2cm} \frac{6800}{10000}\hspace{0.15cm}\underline{= 0.680}\hspace{0.05cm},$$
:$$ {\rm BQ2:}\hspace{0.3cm} K \hspace{0.2cm} =  \hspace{0.2cm} \frac{12330}{10000}\hspace{0.15cm}\underline{= 1.233}\hspace{0.05cm}. $$
*Of course, LZW coding only makes sense for the redundant binary source&nbsp; $\rm BQ1$.&nbsp; Here the amount of data can be reduced by&nbsp; $1-0.68=32\%$.
*With the redundancy-free binary source&nbsp; $\rm BQ2$, on the other hand, LZW coding increases the amount of data by&nbsp; $1.233-1=23.3\%$.
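The percentage change of the data volume follows directly from the compression factor; a one-line check:

```python
# Relative change of the data volume: K - 1 (negative = reduction).
for K in (0.680, 1.233):
    print(f"{(K - 1) * 100:+.1f} %")   # -32.0 % and +23.3 %
```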
 
  
'''(2)'''&nbsp; From the given equation we obtain with&nbsp; $N=10000$:
:$${\rm BQ1:}\hspace{0.3cm} H = 0.5\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} =  \hspace{0.2cm}1 - \frac{0.5 \cdot N}{L } = 1 - \frac{5000}{6800 }  \hspace{0.15cm}\underline{\approx 26.5\,\%}\hspace{0.05cm},$$
:$${\rm BQ2:}\hspace{0.3cm} H = 1.0\hspace{0.05cm},\hspace{0.2cm} r(N=10000) \hspace{0.2cm} =  \hspace{0.2cm}1 - \frac{N}{L } = 1 - \frac{10000}{12330 }  \hspace{0.15cm}\underline{\approx 19\,\%}\hspace{0.05cm}.$$
*The residual redundancy indicates the (relative) redundancy of the LZW output sequence.
*Although the source&nbsp; $\rm BQ1$&nbsp; is more suitable for LZW coding than the redundancy-free source&nbsp; $\rm BQ2$,&nbsp; $\rm BQ1$&nbsp; results in a larger residual redundancy because of the redundancy in the input sequence.
*A smaller residual redundancy&nbsp; $r(N)$&nbsp; for a given&nbsp; $N$&nbsp; therefore says nothing about whether LZW coding is useful for the source at hand.
*For this, the compression factor&nbsp; $K(N)$&nbsp; must be considered.&nbsp; In general, the following relationship between&nbsp; $r(N)$&nbsp; and&nbsp; $K(N)$&nbsp; applies:
:$$r(N) = 1 - \frac{H}{K(N)}\hspace{0.05cm},\hspace{0.5cm} K(N) = H \cdot (1- r(N))
\hspace{0.05cm}.$$
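This relationship can be checked with the numbers from subtasks (1) and (2); a quick sketch using the rounded values of $K$:

```python
# Check r(N) = 1 - H/K(N) with the values of K from subtask (1).
for name, H, K in [("BQ1", 0.5, 0.68), ("BQ2", 1.0, 1.233)]:
    r = 1 - H / K
    print(f"{name}: r = {r:.3f}")   # BQ1: r = 0.265, BQ2: r = 0.189
```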
  
'''(3)'''&nbsp; From the table on the information page one can read&nbsp; (or deduce):
 
*for the redundant binary source&nbsp; $\rm BQ1$:
:$$L(N = 50000) = 32100\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 0.642\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 22.2\,\% \hspace{0.05cm}},$$
*for the redundancy-free binary source&nbsp; $\rm BQ2$:
:$$L(N = 50000) = 59595\hspace{0.05cm},\hspace{0.5cm} K(N = 50000) = 1.192\hspace{0.05cm},\hspace{0.5cm}r(N = 50000) \hspace{0.15cm}\underline {= 16.1\,\% \hspace{0.05cm}}.$$
 
<u>Statements 1 and 2</u> are therefore correct:
* For both sources the compression factor&nbsp; $K(N)$&nbsp; is smaller for&nbsp; $N=50000$&nbsp; than for&nbsp; $N=10000$.
* The same applies to the residual redundancy: &nbsp; $r(N = 50000) < r(N = 10000)$.
* Both with regard to&nbsp; $K(N)$&nbsp; and&nbsp; $r(N)$,&nbsp; "more favourable"&nbsp; values result with larger&nbsp; $N$, even if, as with the redundancy-free binary source&nbsp; $\rm BQ2$,&nbsp; <br>the application of Lempel&ndash;Ziv actually leads to a deterioration.
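The whole comparison can be reproduced with a short script; a sketch, with the $L$ values read from the graph/table of the exercise (the results match the numbers above up to rounding):

```python
# Compare K(N) = L/N and r(N) = 1 - N*H/L at N = 10000 and N = 50000
# for both sources; L values are taken from the exercise.
data = {
    "BQ1": {"H": 0.5, 10000: 6800, 50000: 32100},
    "BQ2": {"H": 1.0, 10000: 12330, 50000: 59595},
}
for src, d in data.items():
    for N in (10000, 50000):
        L = d[N]
        K = L / N
        r = 1 - N * d["H"] / L
        print(f"{src}, N={N}: K = {K:.3f}, r = {r:.3f}")
```

For both sources $K$ and $r$ decrease when $N$ grows from 10000 to 50000, which confirms statements 1 and 2.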
  
 
{{ML-Fuß}}
 
  
  
[[Category:Information Theory: Exercises|^2.2 Lempel-Ziv-Welch Compression^]]

Latest revision as of 14:13, 10 August 2021
