Difference between revisions of "Aufgaben:Exercise 2.2Z: Average Code Word Length"

From LNTwww
m (Text replacement - "discrete-value" to "discrete")
 
Line 18: Line 18:
  
  
A measure for the quality of a compression method is the average codeword length  $L_{\rm M}$   with the additional unit  "bit/source symbol".  
+
A measure for the quality of a compression method is the average code word length  $L_{\rm M}$   with the additional unit  "bit/source symbol".  
  
  
Line 37: Line 37:
  
 
<quiz display=simple>
 
<quiz display=simple>
{Determine the average codeword length&nbsp; $L_{\rm M}$&nbsp; for&nbsp; $p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} =  1/4$.
+
{Determine the average code word length&nbsp; $L_{\rm M}$&nbsp; for&nbsp; $p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} =  1/4$.
 
|type="{}"}
 
|type="{}"}
 
$\text{C1:}\ \ L_{\rm M} \ = \ $  { 2 1% } $\ \rm bit/source\hspace{0.15cm} symbol$
 
$\text{C1:}\ \ L_{\rm M} \ = \ $  { 2 1% } $\ \rm bit/source\hspace{0.15cm} symbol$
Line 53: Line 53:
 
{How can you recognise prefix-free codes?
 
{How can you recognise prefix-free codes?
 
|type="[]"}
 
|type="[]"}
+ No codeword is the beginning of another codeword.
+
+ No code word is the beginning of another code word.
- All codewords have the same length.
+
- All code words have the same length.
  
  
Line 75: Line 75:
 
===Solution===
 
===Solution===
 
{{ML-Kopf}}
 
{{ML-Kopf}}
'''(1)'''&nbsp; The average codeword length is generally given by
+
'''(1)'''&nbsp; The average code word length is generally given by
 
:$$L_{\rm M} = p_{\rm A} \cdot L_{\rm A} + p_{\rm B} \cdot L_{\rm B}+ p_{\rm C} \cdot L_{\rm C} + p_{\rm D} \cdot L_{\rm D}
 
:$$L_{\rm M} = p_{\rm A} \cdot L_{\rm A} + p_{\rm B} \cdot L_{\rm B}+ p_{\rm C} \cdot L_{\rm C} + p_{\rm D} \cdot L_{\rm D}
 
\hspace{0.05cm}.$$
 
\hspace{0.05cm}.$$
Line 87: Line 87:
  
  
'''(2)'''&nbsp; With the code table&nbsp; $\text{C1}$&nbsp;, the average codeword length&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2}\ \rm bit/source\hspace{0.15cm} symbol$&nbsp; is always obtained, independent of the symbol probabilities.
+
'''(2)'''&nbsp; With the code table&nbsp; $\text{C1}$&nbsp;, the average code word length&nbsp; $L_{\rm M} \hspace{0.15cm}\underline{= 2}\ \rm bit/source\hspace{0.15cm} symbol$&nbsp; is always obtained, independent of the symbol probabilities.
  
 
For the other two codes one obtains:
 
For the other two codes one obtains:
Line 96: Line 96:
 
From the example you can see the principle:
 
From the example you can see the principle:
 
*Probable symbols are represented by a few binary symbols, improbable ones by more.
 
*Probable symbols are represented by a few binary symbols, improbable ones by more.
*In the case of equally probable symbols, it is best to choose the same codeword lengths.
+
*In the case of equally probable symbols, it is best to choose the same code word lengths.
  
  
Line 102: Line 102:
  
 
'''(3)'''&nbsp; <u>Solution suggestion 1</u> is correct:  
 
'''(3)'''&nbsp; <u>Solution suggestion 1</u> is correct:  
*The code&nbsp; $\text{C1}$&nbsp; with uniform length of all codewords is prefix-free,
+
*The code&nbsp; $\text{C1}$&nbsp; with uniform length of all code words is prefix-free,
 
*but other codes can also be prefix-free, for example the codes&nbsp;  $\text{C2}$&nbsp; and&nbsp;  $\text{C3}$.
 
*but other codes can also be prefix-free, for example the codes&nbsp;  $\text{C2}$&nbsp; and&nbsp;  $\text{C3}$.
  

Latest revision as of 15:53, 1 November 2022

Three source coding tables

The aim of data compression is to represent the message of a source with as few binary characters as possible.

We consider here a discrete message source with the symbol set  $\rm \{ A, \ B, \ C, \ D\}$   ⇒   symbol set size  $M = 4$  and the symbol probabilities

  • $p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} = 1/4$  $($subtask  $1)$,
  • $p_{\rm A} = 1/2, \, p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} = 1/8$  $($subtask  $2$  to  $5)$.


It is assumed that there are no statistical Dependencies between the individual source symbols.

Three assignments are given. To be noted:

  • Each of these binary codes  $\rm C1$,  $\rm C2$  and  $\rm C3$  is designed for a specific source statistic.
  • All codes are prefix-free and thus immediately decodable without further specification.


A measure for the quality of a compression method is the average code word length  $L_{\rm M}$  with the additional unit  "bit/source symbol".





Hint:



Questions

1

Determine the average code word length  $L_{\rm M}$  for  $p_{\rm A} = p_{\rm B} = p_{\rm C} = p_{\rm D} = 1/4$.

$\text{C1:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source\hspace{0.15cm} symbol$
$\text{C2:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source\hspace{0.15cm} symbol$
$\text{C3:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source\hspace{0.15cm} symbol$

2

Which values result for  $p_{\rm A} = 1/2, \, p_{\rm B} = 1/4, \, p_{\rm C} = p_{\rm D} = 1/8$?

$\text{C1:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source\hspace{0.15cm} symbol$
$\text{C2:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source\hspace{0.15cm} symbol$
$\text{C3:}\ \ L_{\rm M} \ = \ $

$\ \rm bit/source\hspace{0.15cm} symbol$

3

How can you recognise prefix-free codes?

No code word is the beginning of another code word.
All code words have the same length.

4

For the special source symbol sequence  $\rm ADBDCBCBADCA$ , the encoded sequence  $\rm 001101111001100100111000$  results.
Which code was used?

The code  $\rm C1$,
the code  $\rm C2$.

5

After coding with  $\rm C3$,  you get  $\rm 001101111001100100111000$.  What is the corresponding source symbol sequence?

$\rm AACDBACABADAAA$ ...
$\rm ACBCCCACAACCD$ ...


Solution

(1)  The average code word length is generally given by

$$L_{\rm M} = p_{\rm A} \cdot L_{\rm A} + p_{\rm B} \cdot L_{\rm B}+ p_{\rm C} \cdot L_{\rm C} + p_{\rm D} \cdot L_{\rm D} \hspace{0.05cm}.$$

If the four source symbols are equally probable  $($all probabilities exactly  $1/4)$, then for this we can also write:

$$L_{\rm M} = 1/4 \cdot ( L_{\rm A} + L_{\rm B}+ L_{\rm C} + L_{\rm D}) \hspace{0.05cm}.$$
  • $\text{Code C1:}$    $L_{\rm M} \hspace{0.15cm}\underline{= 2.00}\ \rm bit/source\hspace{0.15cm} symbol$,
  • $\text{Code C2:}$    $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/source\hspace{0.15cm} symbol$
  • $\text{Code C3:}$    $L_{\rm M} \hspace{0.15cm}\underline{= 2.25}\ \rm bit/source\hspace{0.15cm} symbol$.


(2)  With the code table  $\text{C1}$ , the average code word length  $L_{\rm M} \hspace{0.15cm}\underline{= 2}\ \rm bit/source\hspace{0.15cm} symbol$  is always obtained, independent of the symbol probabilities.

For the other two codes one obtains:

  • $\text{Code C2:}$    $L_{\rm M} = 1/2 \cdot 1 + 1/4 \cdot 2 + 1/8 \cdot 3 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 1.75}\ \rm bit/source\hspace{0.15cm} symbol$,
  • $\text{Code C3:}$    $L_{\rm M} = 1/2 \cdot 3 + 1/4 \cdot 2 + 1/8 \cdot 1 + 1/8 \cdot 3 \hspace{0.15cm}\underline{= 2.50}\ \rm bit/source\hspace{0.15cm} symbol$.


From the example you can see the principle:

  • Probable symbols are represented by a few binary symbols, improbable ones by more.
  • In the case of equally probable symbols, it is best to choose the same code word lengths.



(3)  Solution suggestion 1 is correct:

  • The code  $\text{C1}$  with uniform length of all code words is prefix-free,
  • but other codes can also be prefix-free, for example the codes  $\text{C2}$  and  $\text{C3}$.


(4)  Solution suggestion 1 is correct:

  • Already from  "00"  at the beginning one can see that the code  $\text{C2}$  is out of the question here,
    because otherwise the source symbol sequence would have to begin with  "AA".
  • In fact, the code  $\text{C1}$  was used.


(5)  Solution suggestion 2 is correct.

The first suggested solution gives the source symbol sequence for code  $\text{C2}$  if the encoded sequence would be   "$\rm 001101111001100100111000$".