Difference between revisions of "Aufgaben:Exercise 3.4Z: GSM Full-Rate Voice Codec"

From LNTwww

@@ Line 6: / Line 6: @@
 [[File:EN_Mob_A_3_4_Z.png|right|frame|LPC-, LTP- und RPE-Parameter beim GSM-Vollraten-Codec]]
-Dieser 1991 für das GSM–System standardisierte Codec – dieses Kunstwort steht für eine gemeinsame Realisierung von Coder und Decoder – mit der englischen Bezeichnung ''GSM Fullrate Vocoder''&nbsp;  kombiniert drei Methoden zur Kompression von Sprachsignalen:
+This codec called ''GSM Fullrate Vocoder''&nbsp; (which was standardized for the GSM system in 1991)  stands for a joint realization of coder and decoder and combines three methods for the compression of speech signals:
 *Linear Predictive Coding ('''LPC'''),
-*Long Term Prediction ('''LTP'''), und
+*Long Term Prediction ('''LTP'''), and
-*Regular Pulse Excitation ('''RPE''').
+*Regular Pulse Excitation ('''RPE''' ).
-Die in der Grafik angegebenen Zahlen geben die Bitzahl an, die von den drei Einheiten dieses FR–Sprachcodecs pro Rahmen von jeweils&nbsp;  $20$&nbsp;  Millisekunden Dauer generiert werden.
+The numbers shown in the graph indicate the number of bits generated by the three units of this FR speech codec per frame of&nbsp; $20$&nbsp; millisecond duration each.
-Anzumerken ist dabei, dass LTP und RPE im Gegensatz zu LPC nicht rahmenweise, sondern mit Unterblöcken von&nbsp;  $5$&nbsp;  Millisekunden arbeiten. Dies hat jedoch keinen Einfluss auf die Lösung der Aufgabe.
+It should be noted that LTP and RPE, unlike LPC, do not work frame by frame, but with sub-blocks of&nbsp; $5$&nbsp; milliseconds. However, this has no influence on solving the task.
-Das Eingangssignal in obiger Grafik ist das digitalisierte Sprachsignal&nbsp;  $s_{\rm R}(n)$.
+The input signal in the above graphic is the digitalized speech signal&nbsp;  $s_{\rm R}(n)$.
-Dieses entsteht aus dem analogen Sprachsignal&nbsp;  $s(t)$&nbsp;  durch
+This results from the analog speech signal&nbsp; $s(t)$&nbsp; by
-*eine geeignete Begrenzung auf die Bandbreite&nbsp;  $B$,
+*a suitable limitation to the bandwidth&nbsp; $B$,
-*Abtastung mit der Abtastrate&nbsp;  $f_{\rm A} = 8 \ \rm kHz$,
+*sampling at the sampling rate&nbsp; $f_{\rm A} = 8 \ \rm kHz$,
-*Quantisierung mit&nbsp;  $13 \ \rm  Bit$,
+*quantization with&nbsp; $13 \ \rm Bit$,
-*anschließender Segmentierung in Blöcke zu je&nbsp;  $20 \ \rm ms$.
+*following segmentation into blocks of each&nbsp;$20 \ \rm ms$.
+The further tasks of preprocessing will not be discussed in detail here.
-Auf die weiteren Aufgaben der Vorverarbeitung soll hier nicht näher eingegangen werden.
@@ Line 32: / Line 32: @@
+''Notes:''
-''Hinweise:''
+*This exercise belongs to the chapter&nbsp;   [[Mobile_Kommunikation/Gemeinsamkeiten_von_GSM_und_UMTS|Gemeinsamkeiten von GSM und
-*Diese Aufgabe gehört zum Kapitel&nbsp;   [[Mobile_Kommunikation/Gemeinsamkeiten_von_GSM_und_UMTS|Gemeinsamkeiten von GSM und
   UMTS]].
-*Bezug genommen wird auch auf das Kapitel&nbsp;  [[Beispiele_von_Nachrichtensystemen/Sprachcodierung|Sprachcodierung]]&nbsp;  des Buches „Beispiele von Nachrichtensystemen”.
+*Reference is also made to the Chapter&nbsp;  [[Beispiele_von_Nachrichtensystemen/Sprachcodierung|Sprachcodierung]]&nbsp;  des Buches „Beispiele von Nachrichtensystemen”.
-===Fragebogen===
+===Questionnaire===
 <quiz display=simple>
-{Auf welche Bandbreite&nbsp;  $B$&nbsp;  muss das Sprachsignal begrenzt werden?
+{To which bandwidth&nbsp; $B$&nbsp; must the speech signal be limited?
 |type="{}"}
 $B \ = \ $ { 4 3% } $\ \rm kHz$
-{Aus wie vielen Abtastwerten&nbsp;  $(N_{\rm R})$&nbsp;  besteht ein Sprachrahmen? Wie groß ist die Eingangsdatenrate&nbsp;  $R_{\rm In}$?
+{Of How many samples&nbsp; $(N_{\rm R})$&nbsp; is there a language frame? How large is the input data rate&nbsp; $R_{\rm In}$?
 |type="{}"}
-$N_{\rm R} \hspace{0.18cm} = \ $ { 160 3% } $\ \rm Abtastwerte$
+$N_{\rm R} \hspace{0.18cm} = \ $ { 160 3% } $\ \rm samples$
 $R_{\rm In} \hspace{0.15cm} = \ $ { 104 3% } $\ \rm kbit/s$
-{Wie groß ist die Ausgangsdatenrate&nbsp;  $R_{\rm Out}$ des GSM-Vollraten-Codecs?
+{What is the output data rate&nbsp; $R_{\rm Out}$ of the GSM full rate codec?
 |type="{}"}
 $R_{\rm Out} \ = \ $ { 13 3% } $\ \rm kbit/s$
-{Welche Aussagen treffen hinsichtlich des Blocks „LPC” zu?
+{Which statements apply to the block "LPC"?
 |type="[]"}
-+ LPC macht eine Kurzzeitprädiktion über eine Millisekunde.
++ LPC makes a short-term prediction over one millisecond.
-+ Die&nbsp;  $36$&nbsp;  LPC–Bits geben Koeffizienten an, die der Empfänger nutzt, um die LPC–Filterung rückgängig zu machen.
++ The&nbsp; $36$&nbsp; LPC bits specify coefficients that the receiver uses to undo the LPC filtering.
-- Das Filter zur Kurzzeitprädiktion ist rekursiv.
+- The filter for short-term prediction is recursive.
-- Das LPC–Ausgangssignal ist identisch mit dem Eingang&nbsp;  $s_{\rm R}(t)$.
+- The LPC output signal is identical to the input&nbsp;  $s_{\rm R}(t)$.
-{Welche Aussagen sind hinsichtlich des Blocks „LTP” zutreffend?
+{Which statements regarding the block „LTP” are true?
 |type="[]"}
-+ LTP entfernt periodische Strukturen des Sprachsignals.
++ LTP removes periodic structures of the speech signal.
-- Die Langzeitprädiktion wird pro Rahmen einmal durchgeführt.
+- The long-term prediction is performed once per frame.
-+ Das Gedächtnis des LTP–Prädiktors beträgt bis zu&nbsp;  $15 \ \rm ms$.
++ The memory of the LTP predictor is up to&nbsp;  $15 \ \rm ms$.
-{Welche Aussagen treffen für den Block „RPE” zu?
+{Which statements apply to the block "RPE"?
 |type="[]"}
-- RPE liefert weniger Bits als LPC und LTP.
+- RPE delivers fewer bits than LPC and LTP.
-+ RPE entfernt für den subjektiven Eindruck unwichtige Anteile.
++ RPE removes unimportant parts for the subjective impression.
-+ RPE unterteilt jeden Subblock nochmals in vier Teilfolgen.
++ RPE subdivides each sub-block into four sub-sequences.
-- RPE wählt davon die Teilfolge mit der minimalen Energie aus.
+- RPE selects the subsequence with the minimum energy.
 </quiz>
-===Musterlösung===
+===Sample solution===
 {{ML-Kopf}}
-'''(1)'''&nbsp; Um das Abtasttheorem zu erfüllen, darf die Bandbreite $B$ nicht größer als $ f_{\rm A}/2 \hspace{0.15cm}\underline{= 4 \ \rm kHz}$ sein.
+'''(1)'''&nbsp; To satisfy the sampling theorem, the bandwidth $B$ must not exceed $ f_{\rm A}/2 \hspace{0.15cm}\underline{= 4 \ \ \rm kHz}$.
-'''(2)'''&nbsp; Aus der gegebenen Abtastrate $f_{\rm A} = 8 \ \rm kHz$ ergibt sich ein Abstand zwischen einzelnen Samples von $T_{\rm A} = 0.125 \ \rm ms$.
+'''(2)'''&nbsp; The given sampling rate $f_{\rm A} = 8 \ \rm kHz$ results in a distance between individual samples of $T_{\rm A} = 0.125 \ \rm ms$.
-*Somit besteht ein Sprachrahmen von $(20 {\rm ms})$ aus $N_{\rm R} = 20/0.125 = \underline{160 \ \rm Abtastwerten}$, jeweils quantisiert mit $13 \ \rm Bit$.
+*Thus a speech frame of $(20 {\rm ms})$ consists of $N_{\rm R} = 20/0.125 = \underline{160 \ \rm samples}$, each quantized with $13 \ \rm Bit$.
-*Die Datenrate beträgt somit
+*The data rate is thus
 :$$R_{\rm In} = \frac{160 \cdot 13}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 104\,{\rm kbit/s}}\hspace{0.05cm}.$$
-'''(3)'''&nbsp;  Aus der Grafik ist ersichtlich, dass pro Sprachrahmen $36 \ {\rm  (LPC)} + 36 \ {\rm (LTP)} + 188 \ {\rm (RPE)} = 260 \ \rm Bit$ ausgegeben werden.
+'''(3)'''&nbsp;  The graph shows that per speech frame $36 \ {\rm (LPC)} + 36 \ {\rm (LTP)} + 188 \ {\rm (RPE)} = 260 \ \ \rm Bit$ are output.
-*Daraus berechnet sich die Ausgangsdatenrate zu
+*From this the output data rate is calculated as
 :$$R_{\rm Out} = \frac{260}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 13\,{\rm kbit/s}}\hspace{0.05cm}.$$
-*Der vom Vollraten–Sprachcodec erzielte Kompressionsfaktor ist somit $104/13 = 8$.
+*The compression factor achieved by the full rate speech codec is thus $104/13 = $8.
-'''(4)'''&nbsp; Nur die <u>beiden ersten Aussagen</u> sind zutreffend:
+'''(4)'''&nbsp; Only the <u> first two statements</u> are true:
-*Die 36 LPC&ndash;Bits beschreiben insgesamt acht Filterkoeffizienten eines nichtrekursiven Filters, wobei aus der Kurzzeitanalyse acht AKF&ndash;Werte ermittelt und diese nach der so genannten Schur-Rekursion in Reflexionsfaktoren $r_{k}$ umgerechnet werden.
+*The 36 LPC&ndash;bits describe a total of eight filter coefficients of a non-recursive filter, whereby eight acf&ndash;values are determined from the short-term analysis and where these are converted into reflection factors $r_{k}$ after the so-called Schur recursion.
-*Aus diesen werden die acht LAR&ndash;Koeffizienten nach der Funktion ${\rm ln}[(1 - r_{k})/(1 + r_{k})]$ berechnet, mit einer unterschiedlichen Anzahl an Bits quantisiert und zum Empfänger geschickt.
+*From these the eight LAR&ndash;coefficients are calculated according to the function ${\rm ln}[(1 - r_{k})/(1 + r_{k})]$, quantized with a different number of bits and sent to the receiver.
-*Das LPC–Ausgangssignal besitzt gegenüber seinem Eingang $s_{\rm R}(n)$ eine deutlich kleinere Amplitude, hat einen deutlich reduzierten Dynamikumfang und ein flacheres Spektrum.
+*The LPC output signal has a significantly lower amplitude than its input $s_{\rm R}(n)$, and it has a significantly reduced dynamic range and a flatter spectrum.
-'''(5)'''&nbsp; Richtig sind die <u>die Aussagen 1 und 3</u>, nicht jedoch die zweite:
+'''(5)'''&nbsp; Correct are the <u>the statements 1 and 3</u>, but not the second:
 *Die LTP&ndash;Analyse und &ndash;Filterung erfolgt blockweise alle $5 \ \rm ms$ (40 Abtastwerte), also viermal pro Sprachrahmen.
 *Man bildet hierzu die Kreuzkorrelationsfunktion (KKF) zwischen dem aktuellen und den drei vorangegangenen Subblöcken.

Revision as of 20:16, 28 June 2020

LPC, LTP and RPE parameters of the GSM full-rate codec

This codec (the term is a portmanteau denoting a joint realization of coder and decoder), known as the GSM Fullrate Vocoder and standardized for the GSM system in 1991, combines three methods for compressing speech signals:

Linear Predictive Coding (LPC),
Long Term Prediction (LTP), and
Regular Pulse Excitation (RPE).

The numbers shown in the graph indicate how many bits each of the three units of this FR speech codec generates per frame of $20$ milliseconds duration.

It should be noted that LTP and RPE, unlike LPC, do not work frame by frame, but with sub-blocks of $5$ milliseconds. However, this has no influence on solving the task.

The input signal in the above graph is the digitized speech signal $s_{\rm R}(n)$.

This signal is obtained from the analog speech signal $s(t)$ by

a suitable limitation to the bandwidth $B$,
sampling at the sampling rate $f_{\rm A} = 8 \ \rm kHz$,
quantization with $13 \ \rm Bit$,
subsequent segmentation into blocks of $20 \ \rm ms$ each.

The further tasks of preprocessing will not be discussed in detail here.

Notes:

This exercise belongs to the chapter Gemeinsamkeiten von GSM und UMTS.
Reference is also made to the chapter Sprachcodierung of the book „Beispiele von Nachrichtensystemen”.

Questionnaire

To which bandwidth $B$ must the speech signal be limited?

$B \ = \ $

$\ \rm kHz$

How many samples $(N_{\rm R})$ does a speech frame consist of? What is the input data rate $R_{\rm In}$?

$N_{\rm R} \hspace{0.18cm} = \ $

$\ \rm samples$

$R_{\rm In} \hspace{0.15cm} = \ $

$\ \rm kbit/s$

What is the output data rate $R_{\rm Out}$ of the GSM full rate codec?

$R_{\rm Out} \ = \ $

$\ \rm kbit/s$

Which statements apply to the block "LPC"?

	LPC makes a short-term prediction over one millisecond.
	The $36$ LPC bits specify coefficients that the receiver uses to undo the LPC filtering.
	The filter for short-term prediction is recursive.
	The LPC output signal is identical to the input $s_{\rm R}(n)$.

Which statements regarding the block "LTP" are true?

	LTP removes periodic structures of the speech signal.
	The long-term prediction is performed once per frame.
	The memory of the LTP predictor is up to $15 \ \rm ms$.

Which statements apply to the block "RPE"?

	RPE delivers fewer bits than LPC and LTP.
	RPE removes parts that are unimportant for the subjective impression.
	RPE subdivides each sub-block into four sub-sequences.
	From these, RPE selects the sub-sequence with the minimum energy.

Sample solution

Solution

(1) To satisfy the sampling theorem, the bandwidth $B$ must not exceed $ f_{\rm A}/2 \hspace{0.15cm}\underline{= 4 \ \ \rm kHz}$.

(2) The given sampling rate $f_{\rm A} = 8 \ \rm kHz$ results in a spacing between individual samples of $T_{\rm A} = 0.125 \ \rm ms$.

Thus a speech frame $(20 \ {\rm ms})$ consists of $N_{\rm R} = 20/0.125 = \underline{160 \ \rm samples}$, each quantized with $13 \ \rm Bit$.
The data rate is thus

$$R_{\rm In} = \frac{160 \cdot 13}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 104\,{\rm kbit/s}}\hspace{0.05cm}.$$
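
The arithmetic of subtask (2) can be reproduced with a few lines of Python. This is only a sketch of the calculation above; the constant and function names are chosen here for illustration and do not come from the exercise.

```python
# Frame size and input data rate of the digitized speech signal s_R(n).
# All names are illustrative; the numbers are those given in the exercise.
F_A_HZ = 8_000         # sampling rate f_A in Hz
FRAME_MS = 20          # frame duration in ms
BITS_PER_SAMPLE = 13   # quantization word length


def frame_samples(f_a_hz: int = F_A_HZ, frame_ms: int = FRAME_MS) -> int:
    """Number of samples N_R in one speech frame."""
    return f_a_hz * frame_ms // 1000


def input_rate_kbit_s(f_a_hz: int = F_A_HZ,
                      frame_ms: int = FRAME_MS,
                      bits: int = BITS_PER_SAMPLE) -> float:
    """Input data rate R_In; bits per millisecond equal kbit/s."""
    return frame_samples(f_a_hz, frame_ms) * bits / frame_ms


print(frame_samples(), input_rate_kbit_s())
```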

(3) The graph shows that per speech frame $36 \ {\rm (LPC)} + 36 \ {\rm (LTP)} + 188 \ {\rm (RPE)} = 260 \ \ \rm Bit$ are output.

From this the output data rate is calculated as

$$R_{\rm Out} = \frac{260}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 13\,{\rm kbit/s}}\hspace{0.05cm}.$$

The compression factor achieved by the full rate speech codec is thus $104/13 = 8$.
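
As a quick cross-check of subtask (3), the bit budget from the graph can be summed and converted into a rate. The dictionary and function names below are illustrative assumptions; only the bit counts and the frame duration are taken from the exercise.

```python
# Output data rate and compression factor of the full-rate codec.
# Bit counts per 20 ms frame as read from the graph of the exercise.
BITS_PER_FRAME = {"LPC": 36, "LTP": 36, "RPE": 188}
FRAME_MS = 20


def output_rate_kbit_s(bits_per_frame=BITS_PER_FRAME, frame_ms=FRAME_MS) -> float:
    """Output data rate R_Out; bits per millisecond equal kbit/s."""
    return sum(bits_per_frame.values()) / frame_ms


def compression_factor(r_in_kbit_s: float = 104.0) -> float:
    """Ratio of input data rate to output data rate."""
    return r_in_kbit_s / output_rate_kbit_s()


print(sum(BITS_PER_FRAME.values()), output_rate_kbit_s(), compression_factor())
```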

(4) Only the first two statements are true:

The 36 LPC bits describe a total of eight filter coefficients of a non-recursive filter. Eight ACF values are determined from the short-term analysis and converted into reflection coefficients $r_{k}$ by the so-called Schur recursion.
From these, the eight LAR coefficients are calculated according to the function ${\rm ln}[(1 - r_{k})/(1 + r_{k})]$, each quantized with a different number of bits, and sent to the receiver.
Compared with its input $s_{\rm R}(n)$, the LPC output signal has a significantly lower amplitude, a significantly reduced dynamic range and a flatter spectrum.
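
The mapping from reflection coefficients to LAR values stated above can be sketched as follows. This is a minimal illustration of the formula ${\rm ln}[(1 - r_{k})/(1 + r_{k})]$ and its inverse only; the function names are hypothetical, and neither the Schur recursion nor the quantization with different word lengths is modeled.

```python
import math


def lar(r: float) -> float:
    """LAR value for a reflection coefficient r with |r| < 1,
    using the function ln[(1 - r)/(1 + r)] given in the solution."""
    if not -1.0 < r < 1.0:
        raise ValueError("reflection coefficient must satisfy |r| < 1")
    return math.log((1.0 - r) / (1.0 + r))


def reflection_from_lar(g: float) -> float:
    """Inverse mapping, as a receiver would need it (unquantized sketch):
    r = (1 - e^g)/(1 + e^g) = -tanh(g/2)."""
    return -math.tanh(g / 2.0)


# Example values chosen for illustration only.
for r in (-0.9, -0.5, 0.0, 0.5, 0.9):
    g = lar(r)
    print(f"r = {r:+.2f}  LAR = {g:+.4f}  back = {reflection_from_lar(g):+.4f}")
```

The logarithmic mapping stretches the region near $|r_{k}| = 1$, which is where small errors in a reflection coefficient matter most.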

(5) Statements 1 and 3 are correct, but not the second:

The LTP analysis and filtering is performed block by block every $5 \ \rm ms$ (40 samples), i.e. four times per speech frame.
For this purpose, the cross-correlation function (CCF) between the current sub-block and the three preceding sub-blocks is formed.
For each sub-block, an LTP delay and an LTP gain are determined that best fit the sub-block.
A correction signal from the subsequent component "RPE" is also taken into account.
As with LPC, the output of the long-term prediction has reduced redundancy compared with its input.
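
The search for LTP delay and LTP gain described above can be sketched in plain Python. This is a simplified illustration, not the standardized procedure: the lag range of 40 to 120 samples is an assumption derived from the three preceding sub-blocks of 40 samples each, the function name `ltp_search` is hypothetical, and the correction signal from the RPE block as well as any quantization are left out.

```python
def ltp_search(current, history, min_lag=40, max_lag=120):
    """Return (delay, gain) for one 5 ms sub-block.

    current: samples of the current sub-block (40 samples in the exercise)
    history: past samples, most recent last, at least max_lag long
    The lag range 40..120 is an assumption (three preceding sub-blocks).
    """
    n = len(current)
    if len(history) < max_lag or min_lag < n:
        raise ValueError("history too short or lag range overlaps sub-block")

    best_lag, best_ccf, best_energy = min_lag, float("-inf"), 0.0
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        segment = history[start:start + n]   # block lying `lag` samples back
        ccf = sum(c * s for c, s in zip(current, segment))
        if ccf > best_ccf:                   # keep the first maximum
            best_lag, best_ccf = lag, ccf
            best_energy = sum(s * s for s in segment)

    gain = best_ccf / best_energy if best_energy > 0 else 0.0
    return best_lag, gain


# Illustrative test signal: one impulse every 50 samples (period 6.25 ms).
signal = [1.0 if i % 50 == 0 else 0.0 for i in range(160)]
print(ltp_search(signal[120:], signal[:120]))
```

For the periodic test signal the search returns the period as delay and a gain of one, which is the sense in which LTP removes periodic structures.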

(6) Statements 2 and 3 are correct:

That statement 1 is false can already be seen from the graph on the exercise page, since $188$ of the $260$ output bits come from the RPE. Speech would already be intelligible with RPE alone (without LPC and LTP).
Regarding the last statement: the RPE of course searches for the sub-sequence with the maximum energy. The RPE pulses form a sub-sequence (13 of 40 samples), each pulse coded with three bits per sub-frame of $5 \ \rm ms$ and accordingly with $12 \ \rm Bit$ per $20 \ \rm ms$ frame.
The "RPE pulse" thus occupies $13 \cdot 12 = 156$ of the $260$ output bits.
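
The selection of the sub-sequence with maximum energy can be illustrated with a short sketch. The text only states that each sub-block of 40 samples is split into four sub-sequences of 13 samples; the concrete layout used here (every third sample, starting at offsets 0 to 3) is an assumption, as is the function name `rpe_select`.

```python
def rpe_select(sub_block):
    """Pick the sub-sequence with maximum energy from a 40-sample sub-block.

    Assumed layout: candidate m takes samples m, m+3, ..., m+36 (13 samples),
    for m = 0, 1, 2, 3. Returns (m, selected samples).
    """
    if len(sub_block) != 40:
        raise ValueError("expected a sub-block of 40 samples")
    candidates = [sub_block[m::3][:13] for m in range(4)]
    energies = [sum(x * x for x in c) for c in candidates]
    best = max(range(4), key=energies.__getitem__)
    return best, candidates[best]


# Bit budget of the RPE pulses as derived in the solution:
# 13 pulses, 3 bits each, 4 sub-frames per 20 ms frame.
PULSE_BITS_PER_FRAME = 13 * 3 * 4

# Illustrative sub-block with most of its energy at positions 2 and 5.
block = [0.0] * 40
block[2], block[5] = 3.0, 1.0
print(rpe_select(block)[0], PULSE_BITS_PER_FRAME)
```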

More details on the RPE block can be found on the page RPE–Codierung of the book „Beispiele von Nachrichtensystemen”.

Retrieved from "http://en.lntwww.de/index.php?title=Aufgaben:Exercise_3.4Z:_GSM_Full-Rate_Voice_Codec&oldid=31311"

Category:

Exercises for Mobile Communications