Exercise 3.6: Adaptive Multi Rate Codec
In the late 1990s, a very flexible, adaptive speech codec was developed and standardised in the form of the AMR codec. This provides a total of eight different modes with data rates between $4.75 \ \rm kbit/s$ and $12.2 \ \rm kbit/s$ .
The AMR codec, like the full-rate codec (FRC) discussed in "Exercise 3.5" , includes both a short-term prediction (LPC) and a long-term prediction (LTP). However, these two components are realised differently from FRC.
The main difference between AMR and FRC is the coding of the residual signal (after LPC and LTP):
- Instead of "Regular Pulse Excitation" (RPE), the "Algebraic Code Excited Linear Prediction" (ACELP) procedure is used for the AMR codec.
- From the fixed codebook (FCB), for each subframe of $5 \ \rm ms$ duration, the FCB pulse and FCB gain that best match the residual signal (i.e. for which the mean square error of the difference signal is minimal) are selected.
Each entry in the fixed codebook identifies a pulse where exactly $10$ of $40$ positions are occupied by $\pm1$ . In this regard it should be noted:
- The pulse is divided into five tracks with eight possible positions each, where track $1$ contains the positions $1,\ 6,\ 11$, ... , $36$ of the subframe and track $5$ the positions $5,\ 10,\ 15$, ... , $40$.
- In each track, exactly two values are $\pm1$, while the other six values are $0$. Each of the two $\pm1$ positions is coded with three bits, i.e. with $000$, ... , $111$.
- Another bit is used for the sign of the first-mentioned pulse, where a "$1$" indicates a positive sign and a "$0$" a negative sign.
- If the pulse position of the second pulse is greater than that of the first pulse, the second pulse has the same sign as the first, otherwise the opposite.
- Thus, seven bits per track are transmitted to the receiver, plus five bits for the so-called FCB gain.
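The track rules listed above can be sketched as a small decoder. This is only an illustration of the rules as stated in this exercise, not the reference implementation of the AMR standard. The function name `decode_track` and its argument layout (sign bit and the two bit triples passed separately) are assumptions made here, since the order of the seven bits within a track is only visible in the diagram.

```python
from typing import Tuple


def decode_track(track: int, sign_bit: int, n1_bits: str, n2_bits: str) -> Tuple[int, int]:
    """Return the signed pulse positions (N1, N2) of one track.

    track    : track number 1..5 (hypothetical argument layout)
    sign_bit : 1 = first pulse positive, 0 = first pulse negative
    n1_bits  : three-bit string for the first pulse, e.g. "110"
    n2_bits  : three-bit string for the second pulse
    """
    if not 1 <= track <= 5:
        raise ValueError("track must be between 1 and 5")
    if sign_bit not in (0, 1):
        raise ValueError("sign_bit must be 0 or 1")
    for bits in (n1_bits, n2_bits):
        if len(bits) != 3 or any(b not in "01" for b in bits):
            raise ValueError("each position must be given as three bits")

    # Positions within a track are spaced by 5, starting at the track number.
    pos1 = track + 5 * int(n1_bits, 2)
    pos2 = track + 5 * int(n2_bits, 2)

    sign1 = 1 if sign_bit == 1 else -1
    # Rule from the exercise: same sign if the second position is greater,
    # otherwise the opposite sign.
    sign2 = sign1 if pos2 > pos1 else -sign1

    return sign1 * pos1, sign2 * pos2


if __name__ == "__main__":
    print(decode_track(1, 1, "000", "110"))
    print(decode_track(2, 0, "100", "011"))
```

Called with the bit patterns that this exercise gives for tracks 1 and 2, the sketch returns the signed positions in the same format that is requested for $N_1$ and $N_2$.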
In the diagram, the $35$ bits describing an FCB pulse are given as an example.
Track 1 includes:
- a positive pulse $({\rm VZ} = 1)$ at $1$ (first possible position for track 1) $\hspace{0.2cm}\text{plus}\hspace{0.2cm}0$ (bit specification for "000") $= 1$,
- another positive pulse (since $110 > 000$) at position $1 \hspace{0.2cm}\text{plus}\hspace{0.2cm}5$ (pulse spacing in each track) $\hspace{0.2cm}\text{times}\hspace{0.2cm}6$ (bit specification for "110") = $31\hspace{0.05cm}.$
Track 2 includes:
- a negative pulse (${\rm VZ} = 0$) at $2$ (first possible position for track 2) $\hspace{0.2cm}\text{plus}\hspace{0.2cm}5\hspace{0.2cm}\text{times}\hspace{0.2cm}4$ (bit specification for "100") = $22\hspace{0.05cm},$
- a positive pulse (sign reversal due to $011 < 100$) at position $2 \hspace{0.2cm}\text{plus}\hspace{0.2cm}5\hspace{0.2cm}\text{times}\hspace{0.2cm}3$ (bit specification for "011") = $17\hspace{0.05cm}.$
Hint:
- This exercise belongs to the chapter "Voice Coding".
- When entering the pulse positions, $N_{1}$ refers to the first triple of bits and $N_{2}$ to the second.
- For example, for track $2$ one would have to enter the values $N_{1}=-22$ and $N_{2}=+17$ .
Questions
Solution
(1) At the data rate $12.2 \ \rm kbit/s$, exactly $\underline{244 \ \rm bit}$ result within $20 \ \rm ms$, whereas in the $4.75 \ \rm kbit/s$ mode, for example, only $95 \ \rm bit$ are transmitted.
(2) In each subframe, the FCB pulse requires $35 \ \rm bit$ (five tracks of seven bits each) and the FCB gain five bits.
- With four subframes, this gives $N_{\rm FCB} \underline{= 160 \ \rm bit}$.
(3) What remains for LPC and LTP is the difference between (1) and (2), i.e. $N_{\rm LPC/LTP}\underline{ = 84 \ \rm bit}$.
(4) The sign bit "$0$" indicates a negative first pulse.
- Because $001 < 011$, the second pulse has the same sign.
- The two magnitudes are
- $$|N_1| \ = \ 3 \hspace{0.1cm}{\rm(since \hspace{0.1cm} track \hspace{0.1cm}3)} + 5\cdot 1 \hspace{0.1cm} {\rm(bit \hspace{0.1cm} specification \hspace{0.1cm} 001)} = 8\hspace{0.05cm}, $$
- $$ |N_2| \ = \ 3 \hspace{0.1cm}{\rm(since \hspace{0.1cm} track \hspace{0.1cm}3)} + 5\cdot 3 \hspace{0.1cm} {\rm(bit \hspace{0.1cm} specification \hspace{0.1cm} 011)} = 18\hspace{0.05cm}.$$
- The values to be entered for the third track are therefore $N_{1} \underline{ = -8}$ and $N_{2} \underline{ = -18}.$
(5) In the same way, one obtains for track $4$ the values $N_{1} \underline{ = +39}$ and $N_{2} \underline{ = -14}$.
(6) The fifth track yields $N_{1} \underline{ =-30}$ and $N_{2} \underline{ = +5}$.
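The bit budget from (1) to (3) can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures stated in this exercise (frame of $20 \ \rm ms$, subframe of $5 \ \rm ms$, five tracks of seven bits, five gain bits); the constant and function names are chosen here for illustration.

```python
FRAME_MS = 20        # frame duration used in the solution
SUBFRAME_MS = 5      # subframe duration from the exercise text
TRACKS = 5           # tracks per FCB pulse
BITS_PER_TRACK = 7   # 2 x 3 position bits + 1 sign bit
GAIN_BITS = 5        # bits for the FCB gain per subframe


def bits_per_frame(rate_kbit_s: float) -> int:
    """Bits per frame: kbit/s times ms gives bits directly."""
    return round(rate_kbit_s * FRAME_MS)


def fcb_bits_per_frame() -> int:
    """Bits spent on FCB pulse and FCB gain over all subframes of a frame."""
    subframes = FRAME_MS // SUBFRAME_MS
    return subframes * (TRACKS * BITS_PER_TRACK + GAIN_BITS)


if __name__ == "__main__":
    total = bits_per_frame(12.2)
    fcb = fcb_bits_per_frame()
    print("bits per frame (12.2 kbit/s):", total)
    print("bits per frame (4.75 kbit/s):", bits_per_frame(4.75))
    print("FCB bits per frame:", fcb)
    print("remaining for LPC/LTP:", total - fcb)
```

`round` is used because the product of a decimal rate such as $12.2$ and the frame duration is not necessarily exact in floating-point arithmetic.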