Exercise 3.5: GSM Full Rate Vocoder

From LNTwww
Revision as of 21:36, 21 January 2023 by Noah

LPC, LTP and RPE parameters in the GSM Full-Rate Vocoder

The "GSM Full-Rate Vocoder"  (standardized for the GSM system in 1991)  is a codec, i.e. a joint realization of coder and decoder. It combines three methods for the compression of speech signals:

  • Linear Predictive Coding  $\rm (LPC)$,
  • Long Term Prediction  $\rm (LTP)$, and
  • Regular Pulse Excitation  $\rm (RPE)$.


The numbers shown in the graphic indicate how many bits each of the three units of this Full-Rate speech codec generates per frame of  $20$  milliseconds duration.

It should be noted that LTP and RPE, unlike LPC, do not work frame by frame, but with sub-blocks of  $5$  milliseconds.  However, this has no influence on solving the task.

The input signal in the above graphic is the digitized speech signal  $s_{\rm R}(n)$.

It is obtained from the analog speech signal  $s(t)$  by

  • a suitable limitation to the bandwidth $B$,
  • sampling at the sampling rate $f_{\rm A} = 8 \ \rm kHz$,
  • quantization with $13 \ \rm bit$,
  • subsequent segmentation into blocks of $20 \ \rm ms$ each.


The further tasks of preprocessing will not be discussed in detail here.

Questions and solutions can (almost completely?) also be taken from "Exercise 3.4Z".


Hint:



Questions

1

To what bandwidth must the speech signal be limited?

$B \ = \ $

$ \ \rm kHz$

2

How many samples  $(N_{\rm R})$  does a speech frame consist of? What is the input data rate  $R_{\rm In}$?

$N_{\rm R} \hspace{0.25cm} = \ $

$R_{\rm In} \hspace{0.22cm} = \ $

$\ \rm kbit/s$

3

What is the output data rate  $R_{\rm Out}$  of the GSM full rate codec?

$R_{\rm Out} \hspace{0.09cm} = \ $

$ \ \rm kbit/s$

4

Which statements are true regarding the block "LPC"?

LPC makes a short-term prediction over one millisecond.
The  $36$  LPC bits are filter coefficients used at the receiver to undo the LPC filtering.
The filter for short-term prediction is recursive.
The LPC output is identical to its input  $s_{\rm R}(n)$.

5

Which statements are true regarding the block "LTP"?

Periodic structures of the speech signal are removed.
Long-term prediction is performed once per frame.
The memory of the LTP predictor is up to  $15 \ \rm ms$.

6

Which statements are true for the block "RPE"?

RPE provides less information than LPC and LTP.
RPE removes parts that are unimportant for the subjective impression.
RPE divides each sub-block again into four subsequences.
RPE selects the subsequence with the minimum energy.


Solution

(1)  To satisfy the sampling theorem, the bandwidth must not exceed $f_{\rm A}/2 \hspace{0.15cm} \underline{= 4 \ \rm kHz}$.


(2)  The given sampling rate $f_{\rm A} = 8 \ \rm kHz$ yields a spacing between successive samples of $T_{\rm A} = 1/f_{\rm A} = 0.125 \ \rm ms$.

  • Thus, a speech frame $(20 \ \rm ms)$ consists of $N_{\rm R} = 20/0.125\hspace{0.15cm} \underline{= 160 \ \rm samples}$, each quantised with $13 \ \rm bits$.
  • The data rate is thus
$$R_{\rm In} = \frac{160 \cdot 13}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 104\,{\rm kbit/s}}\hspace{0.05cm}.$$
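
The arithmetic of subtask (2) can be checked with a few lines of Python. This is only a sketch; the variable names are chosen here for illustration and do not appear in the exercise.

```python
# Values taken from the task description
f_A_hz = 8_000          # sampling rate in Hz
bits_per_sample = 13    # quantization word length
frame_ms = 20           # frame duration in milliseconds

# N_R: number of samples in one speech frame
samples_per_frame = f_A_hz * frame_ms // 1000

# R_In: bits per millisecond is numerically equal to kbit/s
input_rate_kbps = samples_per_frame * bits_per_sample / frame_ms

print(samples_per_frame, input_rate_kbps)
```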


(3)  From the graph, it can be seen that $36$ (LPC) $+ 36$ (LTP) $+ 188$ (RPE) $= 260 \ \rm bits$ are output per speech frame.

  • From this, the output data rate is calculated to be
$$R_{\rm Out} = \frac{260}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 13\,{\rm kbit/s}}\hspace{0.05cm}.$$
  • The compression factor achieved by the full rate speech codec is thus $104/13 = 8$.
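
As a quick cross-check of subtask (3), the bit budget per frame can be summed up in Python. The dictionary layout and names are illustrative assumptions; the bit counts are those read from the graphic, and the input rate of $104$ kbit/s is the result of subtask (2).

```python
# Bits per 20 ms frame, as read from the graphic
bits_per_frame = {"LPC": 36, "LTP": 36, "RPE": 188}
frame_ms = 20
input_rate_kbps = 104   # result of subtask (2)

total_bits = sum(bits_per_frame.values())

# R_Out: bits per millisecond is numerically equal to kbit/s
output_rate_kbps = total_bits / frame_ms

compression_factor = input_rate_kbps / output_rate_kbps

print(total_bits, output_rate_kbps, compression_factor)
```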


(4)  Statements 1 and 2 are correct:

  • The $36$ LPC bits describe a total of eight filter coefficients of a non-recursive filter. With a sample spacing of $0.125 \ \rm ms$, eight coefficients correspond to a prediction span of $8 \cdot 0.125 \ \rm ms = 1 \ \rm ms$.
  • Eight ACF (autocorrelation function) values are determined from the short-time analysis and converted into reflection coefficients $r_{k}$ by the so-called Schur recursion.
  • From these, the eight LAR coefficients are calculated according to the function ${\rm ln}[(1 - r_{k})/(1 + r_{k})]$, quantised with a different number of bits each and passed on to the receiver.
  • Compared to its input $s_{\rm R}(n)$, the LPC output signal has a significantly smaller amplitude, a significantly reduced dynamic range and a flatter spectrum.
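
The LAR mapping quoted above can be illustrated with a short Python sketch. The function name and the example reflection coefficients are hypothetical and chosen only for illustration; the formula is the one stated in the solution.

```python
import math

def log_area_ratio(r):
    """LAR mapping as stated in the solution: ln[(1 - r) / (1 + r)], valid for |r| < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("reflection coefficient must satisfy |r| < 1")
    return math.log((1.0 - r) / (1.0 + r))

# Hypothetical reflection coefficients (illustrative values, not from a real speech frame)
example_r = [-0.9, -0.5, 0.0, 0.3, 0.6, 0.2, -0.1, 0.05]
example_lar = [log_area_ratio(r) for r in example_r]

# Prediction span of eight coefficients at T_A = 0.125 ms
span_ms = 8 * 0.125

for r, lar in zip(example_r, example_lar):
    print(f"r = {r:+.2f}  ->  LAR = {lar:+.3f}")
```

The mapping is odd in $r_k$ and stretches values close to $\pm 1$, so reflection coefficients near the stability limit are represented with finer resolution.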


(5)  Statements 1 and 3 are correct, but not the second:

  • The LTP analysis and filtering is done in blocks every $5 \ \rm ms$ ($40$ samples), i.e. four times per speech frame.
  • To do this, the cross-correlation function (CCF) is formed between the current and the three preceding sub-blocks, which corresponds to a predictor memory of up to $3 \cdot 5 \ \rm ms = 15 \ \rm ms$.
  • For each sub-block, an LTP delay and an LTP gain are determined that best fit the sub-block.
  • A correction signal from the subsequent "RPE" block is also taken into account.
  • In the case of long-term prediction, as with LPC, the output is redundancy-reduced compared to the input.
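
The search for LTP delay and LTP gain described above can be sketched as follows. This is a simplified illustration, not the standardized procedure: the function name, the lag range of $40$ to $120$ samples (derived here from "three preceding sub-blocks" of $40$ samples each) and the gain formula (cross-correlation divided by the energy of the delayed segment) are assumptions made for this example.

```python
def ltp_parameters(current, history, min_lag=40, max_lag=120):
    """Return (delay, gain) for one sub-block.

    current: samples of the current sub-block (40 values)
    history: the samples preceding it, oldest first (at least max_lag values)
    """
    n = len(current)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        segment = history[start:start + n]
        # Cross-correlation between the current sub-block and the delayed segment
        corr = sum(c * s for c, s in zip(current, segment))
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    start = len(history) - best_lag
    segment = history[start:start + n]
    energy = sum(s * s for s in segment)
    gain = best_corr / energy if energy > 0 else 0.0
    return best_lag, gain

# Illustrative input: one impulse in the history, repeated 60 samples later at half amplitude
history = [0.0] * 120
history[70] = 1.0
current = [0.0] * 40
current[10] = 0.5

print(ltp_parameters(current, history))
```

In this toy input the impulse at history index $70$ reappears at position $10$ of the current sub-block, i.e. $60$ samples later, so the search should settle on that delay with a gain of one half.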


(6)  Statements 2 and 3 are correct:

  • That statement 1 is false can already be seen from the graphic in the task description, since $188$ of the $260$ output bits come from the RPE.
  • Regarding the last statement:  The RPE selects the subsequence with the maximum energy, not the minimum.
  • The parameter "RPE pulses" alone occupies $156$ of the $260$ output bits.
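
The selection of the maximum-energy subsequence can be sketched in Python. The decimation scheme used here (four interleaved candidate grids, every third sample, $13$ pulses each) is an assumption of this example and is not stated in the exercise; the function name is hypothetical as well.

```python
def select_rpe_subsequence(subblock, num_grids=4, step=3, pulses=13):
    """Split a 40-sample sub-block into candidate subsequences and pick the one with maximum energy.

    Candidate m consists of the samples subblock[m], subblock[m + step], ...
    """
    candidates = [subblock[m:m + step * pulses:step] for m in range(num_grids)]
    energies = [sum(x * x for x in c) for c in candidates]
    best = max(range(num_grids), key=energies.__getitem__)
    return best, candidates[best]

# Illustrative sub-block: most of the energy sits on grid position 1 (indices 1, 4, 7, ...)
subblock = [0.0] * 40
subblock[1] = 2.0
subblock[4] = 1.0
subblock[6] = 0.5   # belongs to grid position 0

grid, pulses = select_rpe_subsequence(subblock)
print(grid, len(pulses))
```

Under the same assumption, $4$ sub-blocks with $13$ pulses of $3$ bits each would give $4 \cdot 13 \cdot 3 = 156$ bits, which matches the number quoted above.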