Difference between revisions of "Aufgaben:Exercise 3.4Z: GSM Full-Rate Voice Codec"
[[File:EN_Mob_A_3_4_Z.png|right|frame|LPC, LTP and RPE parameters of the GSM full-rate codec]]
The codec known as ''GSM Full-Rate Vocoder'' (standardized for the GSM system in 1991) denotes a joint implementation of coder and decoder. It combines three methods for compressing speech signals:
*Linear Predictive Coding ('''LPC'''),
*Long Term Prediction ('''LTP'''), and
*Regular Pulse Excitation ('''RPE''').
The numbers shown in the graph indicate how many bits each of the three units of this FR speech codec generates per frame of $20$ milliseconds.
Note that LTP and RPE, unlike LPC, do not operate frame by frame but on sub-blocks of $5$ milliseconds. However, this has no influence on solving the exercise.
The input signal in the graph above is the digitized speech signal $s_{\rm R}(n)$.
It is obtained from the analog speech signal $s(t)$ by
*a suitable limitation to the bandwidth $B$,
*sampling at the sampling rate $f_{\rm A} = 8 \ \rm kHz$,
*quantization with $13 \ \rm Bit$,
*subsequent segmentation into blocks of $20 \ \rm ms$ each.
Further preprocessing steps are not discussed in detail here.
''Notes:''
*This exercise belongs to the chapter [[Mobile_Kommunikation/Gemeinsamkeiten_von_GSM_und_UMTS|Gemeinsamkeiten von GSM und UMTS]].
*Reference is also made to the chapter [[Beispiele_von_Nachrichtensystemen/Sprachcodierung|Sprachcodierung]] of the book „Beispiele von Nachrichtensystemen”.
===Questionnaire===
<quiz display=simple>
{To which bandwidth $B$ must the speech signal be limited?
|type="{}"}
$B \ = \ $ { 4 3% } $\ \rm kHz$

{How many samples $N_{\rm R}$ make up one speech frame? What is the input data rate $R_{\rm In}$?
|type="{}"}
$N_{\rm R} \hspace{0.18cm} = \ $ { 160 3% } $\ \rm samples$
$R_{\rm In} \hspace{0.15cm} = \ $ { 104 3% } $\ \rm kbit/s$

{What is the output data rate $R_{\rm Out}$ of the GSM full-rate codec?
|type="{}"}
$R_{\rm Out} \ = \ $ { 13 3% } $\ \rm kbit/s$

{Which statements apply to the block "LPC"?
|type="[]"}
+ LPC performs a short-term prediction over one millisecond.
+ The $36$ LPC bits specify coefficients that the receiver uses to undo the LPC filtering.
- The filter for short-term prediction is recursive.
- The LPC output signal is identical to the input $s_{\rm R}(n)$.

{Which statements apply to the block "LTP"?
|type="[]"}
+ LTP removes periodic structures of the speech signal.
- The long-term prediction is performed once per frame.
+ The memory of the LTP predictor extends up to $15 \ \rm ms$.

{Which statements apply to the block "RPE"?
|type="[]"}
- RPE delivers fewer bits than LPC and LTP.
+ RPE removes parts that are unimportant for the subjective impression.
+ RPE subdivides each sub-block into four sub-sequences.
- RPE selects the sub-sequence with the minimum energy.
</quiz>
===Sample solution===
{{ML-Kopf}}
'''(1)''' To satisfy the sampling theorem, the bandwidth $B$ must not exceed $ f_{\rm A}/2 \hspace{0.15cm}\underline{= 4 \ \ \rm kHz}$.

'''(2)''' The given sampling rate $f_{\rm A} = 8 \ \rm kHz$ results in a spacing of $T_{\rm A} = 1/f_{\rm A} = 0.125 \ \rm ms$ between consecutive samples.
*Thus a speech frame of $20 \ \rm ms$ consists of $N_{\rm R} = 20/0.125 = \underline{160 \ \rm samples}$, each quantized with $13 \ \rm Bit$.
*The input data rate is thus
:$$R_{\rm In} = \frac{160 \cdot 13}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 104\,{\rm kbit/s}}\hspace{0.05cm}.$$

'''(3)''' The graph shows that $36 \ {\rm (LPC)} + 36 \ {\rm (LTP)} + 188 \ {\rm (RPE)} = 260 \ \ \rm Bit$ are output per speech frame.
*From this, the output data rate is
:$$R_{\rm Out} = \frac{260}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 13\,{\rm kbit/s}}\hspace{0.05cm}.$$
*The compression factor achieved by the full-rate speech codec is thus $104/13 = 8$.
'''(4)''' Only the <u>first two statements</u> are true:
*The $36$ LPC bits describe a total of eight filter coefficients of a non-recursive filter. For this purpose, autocorrelation (ACF) values are determined by the short-term analysis and converted into reflection coefficients $r_{k}$ by means of the so-called Schur recursion.
*From these, the eight LAR coefficients (log-area ratios) are calculated according to ${\rm ln}[(1 - r_{k})/(1 + r_{k})]$, quantized with different numbers of bits, and sent to the receiver.
*The LPC output signal has a significantly lower amplitude than its input $s_{\rm R}(n)$, a significantly reduced dynamic range and a flatter spectrum.

'''(5)''' <u>Statements 1 and 3</u> are correct, but not statement 2:
*The LTP analysis and filtering are performed block by block every $5 \ \rm ms$ ($40$ samples), i.e. four times per speech frame.
*For this purpose, the cross-correlation function (CCF) between the current sub-block and the three preceding sub-blocks is computed. Three preceding sub-blocks of $5 \ \rm ms$ each correspond to a predictor memory of up to $15 \ \rm ms$ (statement 3).
Revision as of 19:16, 28 June 2020
This codec, called GSM Full-Rate Vocoder (standardized for the GSM system in 1991), denotes a joint implementation of coder and decoder. It combines three methods for compressing speech signals:
- Linear Predictive Coding (LPC),
- Long Term Prediction (LTP), and
- Regular Pulse Excitation (RPE).
The numbers shown in the graph indicate how many bits each of the three units of this FR speech codec generates per frame of $20$ milliseconds.
Note that LTP and RPE, unlike LPC, do not operate frame by frame but on sub-blocks of $5$ milliseconds. However, this has no influence on solving the exercise.
The input signal in the graph above is the digitized speech signal $s_{\rm R}(n)$.
It is obtained from the analog speech signal $s(t)$ by
- a suitable limitation to the bandwidth $B$,
- sampling at the sampling rate $f_{\rm A} = 8 \ \rm kHz$,
- quantization with $13 \ \rm Bit$,
- subsequent segmentation into blocks of $20 \ \rm ms$ each.
Further preprocessing steps are not discussed in detail here.
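The three digital preprocessing steps listed above (sampling, 13-bit quantization, segmentation into 20 ms frames) can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is a synthetic 1 kHz test tone that is assumed to be band-limited already (the band-limiting filter is not modeled), the quantizer is a plain uniform one, and the function name `preprocess` is hypothetical.

```python
import math

F_A = 8000       # sampling rate f_A in Hz
N_BITS = 13      # quantizer resolution in bit
FRAME_MS = 20    # frame length in ms

def preprocess(signal, duration_s):
    """Sample signal(t) at F_A, quantize uniformly to N_BITS, cut into 20 ms frames.

    signal: function of time t in seconds, assumed band-limited to F_A/2
            and scaled to the range [-1, 1).
    """
    n_total = round(duration_s * F_A)
    full_scale = 2 ** (N_BITS - 1)                    # 4096 for 13 bit
    samples = []
    for n in range(n_total):
        q = round(signal(n / F_A) * full_scale)       # sample at t = n / f_A, quantize
        samples.append(max(-full_scale, min(full_scale - 1, q)))  # clip to 13-bit range
    frame_len = F_A * FRAME_MS // 1000                # 160 samples per frame
    return [samples[i:i + frame_len]
            for i in range(0, n_total - frame_len + 1, frame_len)]

# Hypothetical test input: a 1 kHz tone at half full scale, 100 ms long.
frames = preprocess(lambda t: 0.5 * math.sin(2 * math.pi * 1000 * t), duration_s=0.1)
print(len(frames), len(frames[0]))  # 5 160
```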
Notes:
- This exercise belongs to the chapter Gemeinsamkeiten von GSM und UMTS.
- Reference is also made to the chapter Sprachcodierung of the book „Beispiele von Nachrichtensystemen”.
Questionnaire
Sample solution
(1) To satisfy the sampling theorem, the bandwidth $B$ must not exceed $ f_{\rm A}/2 \hspace{0.15cm}\underline{= 4 \ \ \rm kHz}$.
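This bound can be checked numerically with a short illustrative sketch that is not part of the original exercise. At $f_{\rm A} = 8 \ \rm kHz$, a $5 \ \rm kHz$ cosine yields exactly the same samples as a $3 \ \rm kHz$ cosine, because $5 \ \rm kHz = f_{\rm A} - 3 \ \rm kHz$. Components above $f_{\rm A}/2$ would therefore be indistinguishable from components below it. The helper `sampled_cosine` is hypothetical.

```python
import math

F_A = 8000  # sampling rate f_A in Hz

def sampled_cosine(f_hz, n_samples):
    """Return cos(2*pi*f*n/f_A) for n = 0 .. n_samples-1."""
    return [math.cos(2 * math.pi * f_hz * n / F_A) for n in range(n_samples)]

low = sampled_cosine(3000, 16)   # inside the allowed band B <= 4 kHz
high = sampled_cosine(5000, 16)  # violates the sampling theorem

# The 5 kHz tone aliases onto 8 - 5 = 3 kHz: the sample sequences coincide
# (up to floating-point rounding).
print(all(abs(a - b) < 1e-9 for a, b in zip(low, high)))  # True
print("B_max =", F_A / 2 / 1000, "kHz")                   # B_max = 4.0 kHz
```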
(2) The given sampling rate $f_{\rm A} = 8 \ \rm kHz$ results in a spacing of $T_{\rm A} = 1/f_{\rm A} = 0.125 \ \rm ms$ between consecutive samples.
- Thus a speech frame of $20 \ \rm ms$ consists of $N_{\rm R} = 20/0.125 = \underline{160 \ \rm samples}$, each quantized with $13 \ \rm Bit$.
- The input data rate is thus
$$R_{\rm In} = \frac{160 \cdot 13}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 104\,{\rm kbit/s}}\hspace{0.05cm}.$$
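The same arithmetic can be written as a short Python sketch. Integer units (microseconds, milliseconds, bit/s) are used so that the results are exact. The variable names are illustrative.

```python
F_A = 8000       # sampling rate f_A in Hz
FRAME_MS = 20    # frame duration in ms
N_BITS = 13      # quantizer resolution in bit per sample

T_A_us = 1_000_000 // F_A                    # sample spacing T_A = 125 us = 0.125 ms
N_R = F_A * FRAME_MS // 1000                 # samples per frame
R_in = N_R * N_BITS * 1000 // FRAME_MS       # input data rate in bit/s

print(T_A_us, N_R, R_in)  # 125 160 104000
```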
(3) The graph shows that $36 \ {\rm (LPC)} + 36 \ {\rm (LTP)} + 188 \ {\rm (RPE)} = 260 \ \ \rm Bit$ are output per speech frame.
- From this, the output data rate is
$$R_{\rm Out} = \frac{260}{20 \,{\rm ms}} \hspace{0.15cm} \underline {= 13\,{\rm kbit/s}}\hspace{0.05cm}.$$
- The compression factor achieved by the full-rate speech codec is thus $104/13 = 8$.
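The split $36 + 36 + 188$ can be broken down further per parameter. The allocation used in the sketch below comes from the GSM full-rate specification (GSM 06.10) and is background information that the exercise itself does not state:
- eight LAR coefficients with $6, 6, 5, 5, 4, 4, 3, 3$ bits,
- per $5 \ \rm ms$ sub-block, an LTP lag of $7$ bits and an LTP gain of $2$ bits,
- per sub-block, an RPE grid position ($2$ bits), a block maximum ($6$ bits) and $13$ pulses of $3$ bits each.

```python
# Bit allocation per 20 ms frame (GSM 06.10 values; background, not from the exercise text)
SUBBLOCKS = 4                                   # 4 x 5 ms sub-blocks per frame
lar_bits = [6, 6, 5, 5, 4, 4, 3, 3]             # eight LAR coefficients
ltp_bits = SUBBLOCKS * (7 + 2)                  # LTP lag + LTP gain per sub-block
rpe_bits = SUBBLOCKS * (2 + 6 + 13 * 3)         # grid position + block maximum + 13 pulses

bits_per_frame = sum(lar_bits) + ltp_bits + rpe_bits
R_out = bits_per_frame * 1000 // 20             # output data rate in bit/s
R_in = 160 * 13 * 1000 // 20                    # input data rate in bit/s, from part (2)

print(sum(lar_bits), ltp_bits, rpe_bits)        # 36 36 188
print(bits_per_frame, R_out, R_in // R_out)     # 260 13000 8
```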
(4) Only the first two statements are true:
- The $36$ LPC bits describe a total of eight filter coefficients of a non-recursive filter. For this purpose, autocorrelation (ACF) values are determined by the short-term analysis and converted into reflection coefficients $r_{k}$ by means of the so-called Schur recursion.
- From these, the eight LAR coefficients (log-area ratios) are calculated according to ${\rm ln}[(1 - r_{k})/(1 + r_{k})]$, quantized with different numbers of bits, and sent to the receiver.
- The LPC output signal has a significantly lower amplitude than its input $s_{\rm R}(n)$, a significantly reduced dynamic range and a flatter spectrum.
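The chain from ACF values through the Schur recursion to the LAR coefficients can be sketched in floating-point Python. This is a simplified illustration, not the bit-exact GSM 06.10 procedure. Windowing, scaling, fixed-point arithmetic, the piecewise-linear LAR approximation used by the standard and the quantization are all omitted. The sign convention $r_1 = -R(1)/R(0)$, where $R(k)$ is the ACF value at lag $k$, is an assumption. The function names and the test frame are hypothetical.

```python
import math
import random

def autocorrelation(x, order=8):
    """ACF values R(0), ..., R(order) of one frame (no windowing or scaling)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]

def schur(acf):
    """Reflection coefficients r_1 .. r_p from R(0) .. R(p) via the Schur recursion.

    An 8th-order filter needs nine ACF values R(0) .. R(8).
    Assumed sign convention: r_1 = -R(1) / R(0).
    """
    p = len(acf) - 1
    P = list(acf)   # "forward" sequence, P[0] is the current prediction error energy
    K = list(acf)   # "backward" sequence, K[0] is not used
    r = []
    for n in range(1, p + 1):
        if P[0] <= 0:                       # silent frame: no prediction possible
            return r + [0.0] * (p - n + 1)
        rn = -P[1] / P[0]
        r.append(rn)
        if n == p:
            break
        P[0] += P[1] * rn                   # error energy shrinks by (1 - rn^2)
        for m in range(1, p - n + 1):
            # both updates use the values from the previous recursion step
            P[m], K[m] = P[m + 1] + rn * K[m], K[m] + rn * P[m + 1]
    return r

def lar(r):
    """Log-area ratios ln[(1 - r_k) / (1 + r_k)] as quoted in the text (unquantized)."""
    return [math.log((1 - rk) / (1 + rk)) for rk in r]

# Hypothetical test frame: 160 samples of a tone plus a little noise.
rng = random.Random(0)
frame = [math.sin(0.3 * n) + 0.1 * rng.gauss(0.0, 1.0) for n in range(160)]
r = schur(autocorrelation(frame))
print([round(v, 3) for v in r])
print([round(v, 3) for v in lar(r)])
```

For a first-order process with $R(k) = a^k$, the recursion returns $r_1 = -a$ and all higher coefficients zero, which is a convenient sanity check.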
(5) Statements 1 and 3 are correct, but not statement 2:
- The LTP analysis and filtering are performed block by block every $5 \ \rm ms$ ($40$ samples), i.e. four times per speech frame.
- For this purpose, the cross-correlation function (CCF) between the current sub-block and the three preceding sub-blocks is computed. Three preceding sub-blocks of $5 \ \rm ms$ each correspond to a predictor memory of up to $15 \ \rm ms$ (statement 3).
- For each sub-block, the LTP lag and the LTP gain that best match the sub-block are determined.
- A correction signal from the subsequent component "RPE" is also taken into account.
- As with LPC, the output of the long-term prediction has reduced redundancy compared with its input.
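Below is a minimal sketch of the lag and gain search for one sub-block. It assumes a plain cross-correlation search over lags of $40$ to $120$ samples ($5$ to $15 \ \rm ms$, i.e. the three preceding sub-blocks). This lag range matches GSM 06.10, but the real codec searches on the reconstructed short-term residual and quantizes lag and gain, which is not modeled here. The test signal is a hypothetical pulse train with a period of $70$ samples ($8.75 \ \rm ms$) that stands in for voiced speech. The function name `ltp_search` is also hypothetical.

```python
SUB = 40                      # sub-block length: 5 ms at f_A = 8 kHz
LAG_MIN, LAG_MAX = 40, 120    # 5 ms ... 15 ms, i.e. the three preceding sub-blocks

def ltp_search(history, current):
    """Return (lag, gain) of the best long-term predictor for one sub-block.

    history: at least LAG_MAX past samples, most recent sample last
    current: the SUB samples of the current sub-block
    """
    h = len(history)
    best_lag, best_corr = LAG_MIN, float("-inf")
    for lag in range(LAG_MIN, LAG_MAX + 1):
        past = history[h - lag:h - lag + SUB]              # segment `lag` samples earlier
        corr = sum(c * p for c, p in zip(current, past))   # cross-correlation at this lag
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    past = history[h - best_lag:h - best_lag + SUB]
    energy = sum(p * p for p in past)
    gain = best_corr / energy if energy > 0 else 0.0  # least-squares gain for that lag
    return best_lag, gain

# Hypothetical voiced-speech stand-in: one pulse every 70 samples (8.75 ms).
signal = [1.0 if n % 70 == 0 else 0.0 for n in range(LAG_MAX + SUB)]
history, current = signal[:LAG_MAX], signal[LAG_MAX:]
lag, gain = ltp_search(history, current)

# LTP residual: current sub-block minus the scaled, delayed past segment.
past = history[len(history) - lag:len(history) - lag + SUB]
residual = [c - gain * p for c, p in zip(current, past)]
print(lag, gain)                          # 70 1.0
print(max(abs(e) for e in residual))      # 0.0
```

For this idealized periodic input, the residual vanishes completely. For real speech, the LTP output still contains the non-periodic part, but with reduced redundancy.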
(6) Statements 2 and 3 are correct:
- That statement 1 is false can already be seen from the graph on the exercise page, since $188$ of the $260$ output bits come from RPE. Speech would already be intelligible with RPE alone (without LPC and LTP).
- Regarding the last statement: RPE naturally selects the sub-sequence with the maximum energy, not the minimum. The RPE pulses are a sub-sequence (13 of 40 samples) with three bits each per $5 \ \rm ms$ sub-frame, i.e. $4 \cdot 3 = 12 \ \rm Bit$ per pulse per $20 \ \rm ms$ frame.
- The RPE pulses thus occupy $13 \cdot 12 = 156$ of the $260$ output bits.
More details on the RPE block can be found on the page RPE–Codierung of the book „Beispiele von Nachrichtensystemen”.
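The grid selection behind statement 3 and the corrected statement 4 can be sketched as follows. The sketch assumes a $40$-sample sub-block that has already passed the weighting filter. Decimation by $3$ yields four candidate sub-sequences of $13$ samples each (grid positions $m = 0, ..., 3$, hence $2$ bits), and the candidate with maximum energy is kept. The APCM quantization of the $13$ pulses with $3$ bits each is not modeled. The function name `rpe_grid_select` and the test sub-block are hypothetical.

```python
SUB = 40       # samples per 5 ms sub-block
DECIM = 3      # decimation factor of the regular pulse grid
N_PULSES = 13  # pulses per candidate sub-sequence
N_GRIDS = 4    # grid positions m = 0..3 (2 bits)

def rpe_grid_select(x):
    """Pick the decimated sub-sequence x[m], x[m+3], ..., x[m+36] with maximum energy.

    Returns (grid position m, list of the 13 selected samples).
    """
    if len(x) != SUB:
        raise ValueError("expected one 40-sample sub-block")
    candidates = [[x[m + DECIM * i] for i in range(N_PULSES)] for m in range(N_GRIDS)]
    energies = [sum(v * v for v in c) for c in candidates]
    m_best = max(range(N_GRIDS), key=lambda m: energies[m])  # first maximum on ties
    return m_best, candidates[m_best]

# Hypothetical sub-block whose strongest samples sit on grid position 2.
x = [1.0 if n % 3 == 2 else 0.1 for n in range(SUB)]
m, seq = rpe_grid_select(x)
print(m, len(seq))   # 2 13
```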