Exercise 3.4Z: GSM Full-Rate Voice Codec

LPC, LTP, and RPE parameters of the GSM full-rate codec

The GSM Full-Rate Vocoder, standardized for the GSM system in 1991, is a joint realization of coder and decoder and combines three methods for compressing speech signals:

Linear Predictive Coding (LPC),
Long Term Prediction (LTP), and
Regular Pulse Excitation (RPE).

The numbers in the graph indicate how many bits each of the three units of this FR speech codec generates per frame of $20 \ \rm ms$ duration.

Note that LTP and RPE, unlike LPC, do not operate frame by frame but on sub-blocks of $5$ milliseconds. However, this has no influence on the solution of this exercise.

The input signal in the above graphic is the digitized speech signal $s_{\rm R}(n)$.

It is obtained from the analog speech signal $s(t)$ by

a suitable limitation to the bandwidth $B$,
sampling at the sampling rate $f_{\rm A} = 8 \ \rm kHz$,
quantization with $13$ bits,
subsequent segmentation into blocks of $20 \ \rm ms$ each.
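The preprocessing chain can be sketched in Python as follows. This is a minimal illustration, not the standardized GSM front end: it assumes the analog signal is already band-limited to $B \le 4 \ \rm kHz$ (no anti-aliasing filter is modeled), it uses a plain uniform 13-bit quantizer, and the helper names `sample`, `quantize_13bit`, and `segment` are hypothetical.

```python
import math

F_A_HZ = 8000                          # sampling rate f_A
BITS = 13                              # quantizer resolution
FRAME_MS = 20                          # frame duration
FRAME_LEN = F_A_HZ * FRAME_MS // 1000  # 160 samples per frame

def sample(signal_fn, duration_s):
    """Sample s(t) at f_A (assumes s(t) is already band-limited to f_A/2)."""
    n_samples = round(duration_s * F_A_HZ)
    return [signal_fn(n / F_A_HZ) for n in range(n_samples)]

def quantize_13bit(x):
    """Hypothetical uniform quantizer: x in [-1, 1) -> integer in [-4096, 4095]."""
    levels = 1 << (BITS - 1)           # 4096
    q = round(x * levels)
    return max(-levels, min(levels - 1, q))

def segment(samples):
    """Split into complete 160-sample frames; a trailing partial frame is dropped."""
    return [samples[i:i + FRAME_LEN]
            for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN)]

# Usage: 50 ms of a 1 kHz tone -> 400 samples -> two complete 20 ms frames
s_R = [quantize_13bit(v)
       for v in sample(lambda t: 0.5 * math.sin(2 * math.pi * 1000 * t), 0.05)]
frames = segment(s_R)
print(len(s_R), len(frames), len(frames[0]))   # 400 2 160
```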

The remaining preprocessing steps are not discussed in detail here.

Notes:

This exercise belongs to the chapter "Gemeinsamkeiten von GSM und UMTS" (Similarities of GSM and UMTS).
Reference is also made to the chapter "Sprachcodierung" (Speech Coding) of the book "Beispiele von Nachrichtensystemen" (Examples of Communication Systems).

Solution

(1) To satisfy the sampling theorem, the bandwidth $B$ must not exceed $ f_{\rm A}/2 \hspace{0.15cm}\underline{= 4 \ \ \rm kHz}$.

(2) The given sampling rate $f_{\rm A} = 8 \ \rm kHz$ results in a spacing of $T_{\rm A} = 1/f_{\rm A} = 0.125 \ \rm ms$ between consecutive samples.

Thus a speech frame of duration $20 \ \rm ms$ consists of $N_{\rm R} = 20/0.125 = \underline{160 \ \rm samples}$, each quantized with $13$ bits.
The data rate is thus

$$R_{\rm In} = \frac{160 \cdot 13}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 104\,{\rm kbit/s}}\hspace{0.05cm}.$$
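The arithmetic of parts (1) and (2) can be checked with a few lines of Python. Integer arithmetic is used so that the results are exact; the constant names are illustrative only.

```python
F_A_HZ = 8000     # sampling rate f_A
FRAME_MS = 20     # speech frame duration
BITS = 13         # bits per sample

B_MAX_HZ = F_A_HZ // 2                     # sampling theorem: B <= f_A / 2
T_A_US = 1_000_000 // F_A_HZ               # sample spacing T_A in microseconds
N_R = F_A_HZ * FRAME_MS // 1000            # samples per frame
R_IN_BPS = N_R * BITS * 1000 // FRAME_MS   # input data rate in bit/s

print(B_MAX_HZ, T_A_US, N_R, R_IN_BPS)     # 4000 125 160 104000
```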

(3) The graph shows that $36 \ {\rm (LPC)} + 36 \ {\rm (LTP)} + 188 \ {\rm (RPE)} = 260$ bits are output per speech frame.

From this the output data rate is calculated as

$$R_{\rm Out} = \frac{260}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 13\,{\rm kbit/s}}\hspace{0.05cm}.$$

The compression factor achieved by the full-rate speech codec is thus $104/13 = 8$.
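The bit budget behind part (3) can be tallied the same way. Only the totals $36$, $36$, and $188$ come from the graph; the split into individual fields (per $5 \ \rm ms$ sub-block: a 7-bit lag and a 2-bit gain for LTP; a 2-bit grid position, a 6-bit block maximum, and thirteen 3-bit pulses for RPE) follows the GSM 06.10 bit allocation as we understand it and should be read as an assumption, not as part of this exercise.

```python
FRAME_MS = 20
SUB_BLOCKS = 4                                   # four 5 ms sub-blocks per frame

# Assumed GSM 06.10 field sizes; only the per-unit totals are given in the exercise
LPC_BITS = 6 + 6 + 5 + 5 + 4 + 4 + 3 + 3         # eight LAR coefficients
LTP_BITS = SUB_BLOCKS * (7 + 2)                  # lag + gain per sub-block
RPE_BITS = SUB_BLOCKS * (2 + 6 + 13 * 3)         # grid + block maximum + 13 pulses
TOTAL_BITS = LPC_BITS + LTP_BITS + RPE_BITS

R_OUT_BPS = TOTAL_BITS * 1000 // FRAME_MS        # output data rate in bit/s
R_IN_BPS = 104_000                               # input data rate from part (2)
COMPRESSION = R_IN_BPS / R_OUT_BPS

print(LPC_BITS, LTP_BITS, RPE_BITS, TOTAL_BITS, R_OUT_BPS, COMPRESSION)
# 36 36 188 260 13000 8.0
```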

(4) Only the first two statements are true:

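The $36$ LPC bits describe a total of eight filter coefficients of a non-recursive filter (so statement 3 is false). The short-term analysis determines the autocorrelation function (ACF) of the speech frame, and the so-called Schur recursion converts the ACF values into eight reflection factors $r_{k}$. With eight coefficients and a sample spacing of $0.125 \ \rm ms$, the prediction spans $8 \cdot 0.125 \ {\rm ms} = 1 \ \rm ms$, which is why statement 1 is true.
From these, the eight LAR coefficients are calculated according to the function ${\rm ln}[(1 - r_{k})/(1 + r_{k})]$, quantized with different numbers of bits ($6, 6, 5, 5, 4, 4, 3, 3$, i.e., $36$ bits in total) and sent to the receiver, which uses them to undo the LPC filtering (statement 2).
The LPC output signal has a much smaller amplitude than its input $s_{\rm R}(n)$; it also has a considerably reduced dynamic range and a flatter spectrum, so statement 4 is false.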
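The LPC parameter chain described above (ACF, Schur recursion, LAR conversion) can be sketched as follows, together with a non-recursive lattice analysis filter and the recursive lattice synthesis filter with which a receiver can undo the filtering. This is a floating-point illustration under stated assumptions, not the bit-exact GSM 06.10 procedure: no analysis window, scaling, or LAR quantization is applied, the sign convention of the reflection factors is assumed, and all function names are hypothetical.

```python
import math

def acf(x, p=8):
    """ACF values R(0), ..., R(p) of one analysis frame (no window applied)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(p + 1)]

def schur(r):
    """Schur recursion: ACF values R(0..p) -> reflection factors r_1..r_p.

    Assumed convention: f_m(n) = f_{m-1}(n) + r_m * b_{m-1}(n-1).
    """
    p = len(r) - 1
    g = [float(v) for v in r]   # correlations of the forward prediction errors
    h = [float(v) for v in r]   # correlations of the backward prediction errors
    refl = []
    for m in range(1, p + 1):
        k = -g[m] / h[m - 1] if h[m - 1] > 0 else 0.0
        refl.append(k)
        # Both rows are updated from the old values (right side evaluated first).
        g, h = (
            [g[j] + k * h[j - 1] if j >= m else g[j] for j in range(p + 1)],
            [h[j - 1] + k * g[j] if j >= m else h[j] for j in range(p + 1)],
        )
    return refl

def lar(refl):
    """LAR coefficients via ln[(1 - r_k) / (1 + r_k)], as given in the exercise."""
    return [math.log((1 - k) / (1 + k)) for k in refl]

def analysis_filter(x, refl):
    """Non-recursive (FIR) lattice: speech samples -> LPC residual."""
    p = len(refl)
    b_old = [0.0] * p                          # b_m(n-1) for m = 0..p-1
    out = []
    for s in x:
        f, b_new = s, [s] + [0.0] * (p - 1)    # f_0(n) = b_0(n) = x(n)
        for m, k in enumerate(refl):
            if m + 1 < p:
                b_new[m + 1] = b_old[m] + k * f  # b_{m+1}(n), uses f_m(n)
            f = f + k * b_old[m]                 # f_{m+1}(n)
        b_old = b_new
        out.append(f)
    return out

def synthesis_filter(e, refl):
    """Recursive (IIR) lattice at the receiver: residual -> speech samples."""
    p = len(refl)
    b_old = [0.0] * p
    out = []
    for s in e:
        f, fs = s, [0.0] * p                   # start from f_p(n)
        for m in range(p - 1, -1, -1):
            f = f - refl[m] * b_old[m]         # f_m(n) from f_{m+1}(n)
            fs[m] = f
        b_old = [fs[0]] + [b_old[m] + refl[m] * fs[m] for m in range(p - 1)]
        out.append(fs[0])
    return out

# Usage on a synthetic 160-sample frame (two tones plus a deterministic disturbance)
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) + 0.1 * ((37 * n) % 11 - 5)
         for n in range(160)]
refl = schur(acf(frame))
lars = lar(refl)
residual = analysis_filter(frame, refl)
restored = synthesis_filter(residual, refl)
```

The residual has less energy than the frame, and the recursive synthesis filter restores the frame up to rounding errors, mirroring statements 2 and 4.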

(5) Statements 1 and 3 are correct, but statement 2 is not:

LTP analysis and filtering are performed blockwise every $5 \ \rm ms$ ($40$ samples), i.e., four times per speech frame rather than once (statement 2).
The cross-correlation function (CCF) between the current sub-block and the three preceding sub-blocks is formed, so the memory of the LTP predictor extends back up to $3 \cdot 5 \ {\rm ms} = 15 \ \rm ms$ (statement 3).
For each sub-block, the LTP delay and the LTP gain that best match the sub-block are determined.
A correction signal from the subsequent RPE block is also taken into account.
As with LPC, the output of the long-term prediction contains less redundancy than its input; in particular, periodic structures of the speech signal are removed (statement 1).
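A simplified LTP parameter search for one sub-block might look as follows. It assumes a lag range of $40$ to $120$ samples (the upper limit matching the $15 \ \rm ms$ memory), picks the lag with the largest CCF value, works directly on the input of the LTP block, and omits both the RPE correction signal and the quantization of lag and gain. The names `ltp_search` and `ltp_residual` are hypothetical.

```python
SUB_BLOCK = 40                 # 5 ms at 8 kHz
MIN_LAG, MAX_LAG = 40, 120     # assumed lag range; 120 samples = 15 ms memory

def ltp_search(past, current):
    """Return (lag, gain) for one sub-block.

    past    -- the MAX_LAG samples preceding the sub-block, oldest first
    current -- the SUB_BLOCK samples of the current sub-block
    """
    if len(past) != MAX_LAG or len(current) != SUB_BLOCK:
        raise ValueError("expected 120 past and 40 current samples")
    best_lag, best_ccf = MIN_LAG, float("-inf")
    for lag in range(MIN_LAG, MAX_LAG + 1):
        start = MAX_LAG - lag          # past[start + i] is the sample lag steps before current[i]
        ccf = sum(current[i] * past[start + i] for i in range(SUB_BLOCK))
        if ccf > best_ccf:             # on ties, the smallest lag is kept
            best_lag, best_ccf = lag, ccf
    start = MAX_LAG - best_lag
    energy = sum(past[start + i] ** 2 for i in range(SUB_BLOCK))
    gain = best_ccf / energy if energy > 0 else 0.0
    return best_lag, gain

def ltp_residual(past, current, lag, gain):
    """Subtract the scaled, delayed past signal from the current sub-block."""
    start = MAX_LAG - lag
    return [c - gain * past[start + i] for i, c in enumerate(current)]

# Usage: a pulse train with a period of 70 samples (8.75 ms)
past = [0.0] * MAX_LAG
past[0] = past[70] = 1.0
current = [0.0] * SUB_BLOCK
current[20] = 0.5              # next pulse, 70 samples after past[70], at half amplitude
lag, gain = ltp_search(past, current)
print(lag, gain)               # 70 0.5
```

For this strictly periodic input the LTP residual is zero, which illustrates why LTP removes periodic structures.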

(6) Statements 2 and 3 are correct:

Statement 1 is false, as the graph above shows: $188$ of the $260$ output bits come from the RPE. Speech would still be intelligible with RPE alone (without LPC and LTP).
Statement 4 is also false: the RPE, of course, selects the subsequence with the maximum energy. Each $5 \ \rm ms$ sub-block of $40$ samples is divided into four interleaved subsequences, and the selected one supplies the $13$ RPE pulses. Each pulse is coded with $3$ bits per sub-block, i.e., $12$ bits per $20 \ \rm ms$ frame.
The RPE pulses thus occupy $13 \cdot 12 = 156$ of the $260$ output bits. The remaining $32$ RPE bits ($2$ bits for the grid position and $6$ bits for the block maximum in each of the four sub-blocks) complete the $188$ RPE bits.
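The grid selection behind statements 3 and 4 can be sketched as follows: each 40-sample sub-block is split into four interleaved subsequences $x(m), x(m+3), \ldots, x(m+36)$ with $m = 0, \ldots, 3$, and the one with the maximum energy is kept. This is a simplified illustration under assumptions: it omits the weighting filter and the APCM quantization of the pulses, the per-field bit counts are the assumed GSM 06.10 allocation, and `rpe_grid_selection` is a hypothetical name.

```python
SUB_BLOCK = 40       # samples per 5 ms sub-block
DECIMATION = 3       # every third sample is kept
PULSES = 13          # RPE pulses per sub-block
GRID_POSITIONS = 4   # offsets m = 0, 1, 2, 3

def rpe_grid_selection(sub_block):
    """Return (m, pulses) for the subsequence x[m], x[m+3], ..., x[m+36]
    that has the maximum energy."""
    if len(sub_block) != SUB_BLOCK:
        raise ValueError("expected 40 samples")
    candidates = [sub_block[m::DECIMATION][:PULSES] for m in range(GRID_POSITIONS)]
    energies = [sum(v * v for v in c) for c in candidates]
    m_best = max(range(GRID_POSITIONS), key=energies.__getitem__)
    return m_best, candidates[m_best]

# Bits per sub-block (assumed allocation): grid position, block maximum, 13 pulses
BITS_PER_SUB_BLOCK = 2 + 6 + PULSES * 3      # 47, i.e. 188 bits per 20 ms frame

# Usage: energy concentrated on the samples with index 2, 5, ..., 38
x = [0.1] * SUB_BLOCK
for i in range(PULSES):
    x[2 + DECIMATION * i] = 1.0
m, pulses = rpe_grid_selection(x)
print(m, len(pulses), 4 * BITS_PER_SUB_BLOCK)   # 2 13 188
```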

More details about the RPE block can be found on the page "RPE-Codierung" (RPE Coding) of the book "Beispiele von Nachrichtensystemen" (Examples of Communication Systems).

Statements for questions (4) to (6):

(4) LPC
	1. LPC makes a short-term prediction over one millisecond.
	2. The $36$ LPC bits specify coefficients that the receiver uses to undo the LPC filtering.
	3. The filter for short-term prediction is recursive.
	4. The LPC output signal is identical to the input $s_{\rm R}(n)$.

(5) LTP
	1. LTP removes periodic structures of the speech signal.
	2. The long-term prediction is performed once per frame.
	3. The memory of the LTP predictor is up to $15 \ \rm ms$.

(6) RPE
	1. RPE delivers fewer bits than LPC and LTP.
	2. RPE removes parts that are unimportant for the subjective impression.
	3. RPE subdivides each sub-block into four subsequences.
	4. RPE selects the subsequence with the minimum energy.