Speech and data transmission components


On the right you can see the block diagram of the GSM transmission system at the transmitting end,  which is

  • suitable for both digitized speech signals  $($sampling rate:  $8 \ \rm kHz$,  quantization:  $13$ bit   ⇒   data rate:  $104 \ \rm kbit/s)$
  • as well as being suitable for  $9.6 \ \rm kbit/s$  data signals.


Here is a brief description of each component:

Components of the speech and data communication with GSM  –   components for speech are shown in blue,  those for data in red, and common blocks in green






  1. Speech signals are compressed by speech coding by a factor of  $8$   ⇒   from  $104 \ \rm kbit/s$  to  $13 \ \rm kbit/s$.  The bit rate given in the graph is for the full rate codec,  which delivers exactly  $260$  bits per speech frame  $($duration  $T_{\rm R} = 20\ \rm ms)$.
  2. The  »AMR codec«  delivers in highest mode  $R_{\rm B}=12.2 \ \rm kbit/s$   ⇒   $244$  bits per speech frame.  However,  the speech codec must also transmit additional information regarding the mode   ⇒   so the data rate before channel coding is  $13 \ \rm kbit/s$ .
  3. The task of the  »Voice Activity Detection«  $($shown dashed$)$  is to decide whether the current speech frame actually contains a speech signal or just a pause,  during which the power of the transmit amplifier should be turned down.
  4. By  »channel coding«  redundancy is added again to allow error correction at the receiver   ⇒   the channel encoder  outputs  $456$  bits per frame,  resulting in the data rate  $22.8 \ \rm kbit/s$ . The more important bits are specially protected.
  5. The  »interleaver«  scrambles the resulting bit sequence to reduce burst error influence.  The  $456$  input bits are split into four time frames of  $114$  bits each.  Thus,  two consecutive bits are always transmitted in two different bursts.
  6. A  »data channel«  – marked in red in the figure – differs from a speech channel  $($marked in blue$)$  only by the different input rate  $(9.6 \ \rm kbit/s$  instead of  $104 \ \rm kbit/s)$  and the use of a second,  outer channel encoder instead of the speech encoder.
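The bit rates quoted in the list above follow directly from the number of bits per  $20 \ \rm ms$  frame.  The following short sketch recomputes them;  the function and constant names are chosen here for illustration only.

```python
# Recompute the bit rates of the GSM speech path from the figures in the text.
# All names are illustrative; the numbers are taken from the description above.

SAMPLING_RATE_KHZ = 8      # sampling rate in kHz
BITS_PER_SAMPLE = 13       # quantization in bit
FRAME_DURATION_MS = 20     # speech frame duration T_R in ms


def rate_kbit_s(bits_per_frame, frame_ms=FRAME_DURATION_MS):
    """Bits per millisecond are numerically equal to kbit/s."""
    return bits_per_frame / frame_ms


pcm_rate = SAMPLING_RATE_KHZ * BITS_PER_SAMPLE   # 104 kbit/s before speech coding
full_rate = rate_kbit_s(260)                     # full rate codec output
amr_top_mode = rate_kbit_s(244)                  # AMR codec, highest mode
channel_coded = rate_kbit_s(456)                 # after channel coding
compression = pcm_rate / full_rate               # compression factor of the speech coder

print(pcm_rate, full_rate, amr_top_mode, channel_coded, compression)
```

The same helper also reproduces the  $12 \ \rm kbit/s$  of the data channel from  $240$  bits per frame.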


The components highlighted in green apply equally to speech and data transmission.

⇒   The first common system component for speech and data transmission in the block diagram of the GSM transmitter is the  »encryption«,  which is intended to prevent unauthorized persons from gaining access to the data.  There are two fundamentally different encryption methods:

  • »Symmetric encryption«:  This method uses only one secret key,  which serves both for encrypting the messages in the transmitter and for decrypting them in the receiver.  The key must be generated prior to communication and exchanged between the communication partners via a secure channel.  The advantage of this encryption method,  which is used in conventional GSM,  is that it works very quickly.
  • »Asymmetric encryption«:  This method uses two independent but matching asymmetric keys.  It is not possible to use one key to calculate the other.  The  "public key"  is publicly available and is used for encryption.  The  "private key"  is secret and used for decryption.  In contrast to the symmetric encryption methods,  the asymmetric methods are much slower,  but also offer higher security.
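The principle of symmetric encryption can be illustrated with a toy stream cipher:  transmitter and receiver derive the same key stream from the shared secret key and combine it bit by bit with the message by modulo-2 addition.  The sketch below is purely illustrative.  The key stream generator is Python's ordinary pseudo-random generator,  which is not cryptographically secure and is not the algorithm used in GSM;  the names  `keystream`  and  `xor_with_key`  are assumptions made for this example.

```python
import random


def keystream(key, n):
    """Derive n key stream bits from the shared key (toy generator, not secure)."""
    rng = random.Random(key)
    return [rng.getrandbits(1) for _ in range(n)]


def xor_with_key(bits, key):
    """Modulo-2 addition of the key stream; the same call encrypts and decrypts."""
    return [b ^ k for b, k in zip(bits, keystream(key, len(bits)))]


secret_key = 0x2A5F                       # hypothetical shared secret
plain = [1, 0, 1, 1, 0, 0, 1, 0] * 14 + [1, 0]   # 114 bits, the payload of one burst
cipher = xor_with_key(plain, secret_key)  # transmitter side
recovered = xor_with_key(cipher, secret_key)     # receiver side, same key
print(recovered == plain)
```

Because encryption and decryption are the same inexpensive operation,  such a scheme is fast,  but both sides must already hold the same key.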


⇒   The second green block is the  »burst composition«,  where there are different burst types.  In a  "normal burst"  the  $114$  encoded,  scrambled and encrypted bits are mapped to  $156.25$  bits  $($duration  $T_{\rm burst} = 576.9 \ \rm µ s)$  by adding the  "guard period",  signaling bits,  etc.

These are transmitted within a time slot of duration  $T_{\rm Z} = T_{\rm burst}$  by means of the  modulation method  "GMSK".  This results in the gross data rate  $270.833 \ \rm kbit/s$.
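The gross data rate is simply the number of bits per burst divided by the burst duration.  The sketch below checks this with exact fractions.  It assumes that the rounded value  $576.9 \ \rm µ s$  stands for the exact time slot duration  $15/26 \ \rm ms$;  this value is not given in the text above.  With the rounded duration one would obtain about  $270.84 \ \rm kbit/s$  instead.

```python
from fractions import Fraction

# Assumption: 576.9 µs is the rounded value of the exact slot duration 15/26 ms.
BITS_PER_BURST = Fraction(625, 4)      # 156.25 bits per normal burst
SLOT_DURATION_MS = Fraction(15, 26)    # T_Z = T_burst in ms

# Bits per millisecond are numerically equal to kbit/s.
gross_rate_kbit_s = BITS_PER_BURST / SLOT_DURATION_MS
bit_duration_us = 1000 / gross_rate_kbit_s

print(float(gross_rate_kbit_s))   # gross data rate in kbit/s
print(float(bit_duration_us))     # duration of one bit in µs
```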

At the receiver side,  the corresponding blocks appear in reverse order:  "demodulation" – "burst decomposition" – "decryption" – "de-interleaving" – "channel decoding" – "speech decoding".


In the next sections all blocks of the above transmission scheme are presented in detail.


Encoding for speech signals


Uncoded radio data transmission leads to bit error rates in the percentage range.  However,  with  $\text{channel coding}$  some transmission errors can be detected or even corrected at the receiver.  The bit error rate can thus be reduced to values smaller than  $10^{-5}$.

First,  we consider GSM channel encoding for speech channels,  assuming as speech encoder the  $\text{Full Rate Codec}$.  The channel coding of a speech frame of  $20\ \rm ms$  duration is done in four consecutive steps according to the diagram.

From the description in chapter  "Speech Coding"  it can be seen that not all  $260$  bits have the same influence on the subjectively perceived speech quality.

For coding speech signals in GSM
  • Therefore,  the data are divided into classes according to their importance:   The  $50$  most important bits form the  "Class 1a",  other  $132$  are assigned to  "Class 1b"  and the remaining  $78$  bits result in the less important  "Class 2".
  • In the next step,  a three-bit long  $\text{Cyclic Redundancy Check}$  $\rm (CRC)$  checksum is calculated for the  $50$  class 1a bits using a feedback shift register.  The generator polynomial for this CRC check is:
$$G_{\rm CRC}(D) = D^3 + D +1\hspace{0.05cm}. $$
  • Subsequently,  four (yellow)  "tail bits" $(0000)$  are added to the total of  $185$  bits of class 1a and 1b including the three  $($red$)$  CRC parity bits.  These bits initialize the four memory registers of the following convolutional encoder with  zeros,  so that for each speech frame a defined status can be assumed.
  • The rate  $1/2$ convolutional  encoder doubles these  $189$  most important bits to  $378$  bits and thus protects them significantly against transmission errors.  Then the  $78$  bits of the less important class 2 are appended unprotected.


This way,  there are exactly  $456$  bits per  $20 \ \rm ms$ speech frame after channel coding.

  1. This corresponds to a  $($encoded$)$  data rate of  $22.8\ \rm kbit/s$  compared to  $13\ \rm kbit/s$  after the speech coding.
  2. The effective channel coding rate is thus  $260/456 = 57\%$.
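The four steps can be sketched in a few lines.  This is a simplified illustration,  not the standardized procedure:  the bit ordering inside the frame is assumed to be simply class 1a,  CRC,  class 1b,  tail bits,  and the convolutional encoder uses the generator polynomials  $G_0(D) = 1 + D^3 + D^4$  and  $G_1(D) = 1 + D + D^3 + D^4$  as an assumption.  All function names are chosen for this example.

```python
import random


def crc3(bits):
    """Remainder of bits(D) * D^3 divided by G_CRC(D) = D^3 + D + 1 over GF(2)."""
    reg = list(bits) + [0, 0, 0]
    gen = [1, 0, 1, 1]                       # coefficients of D^3, D^2, D^1, D^0
    for i in range(len(bits)):
        if reg[i]:
            for k, g in enumerate(gen):
                reg[i + k] ^= g
    return reg[-3:]


def conv_encode(bits):
    """Rate-1/2 convolutional encoder with four memory cells (assumed polynomials)."""
    state = [0, 0, 0, 0]                     # state[0] holds D^1, ..., state[3] holds D^4
    out = []
    for b in bits:
        out.append(b ^ state[2] ^ state[3])             # G0 = 1 + D^3 + D^4
        out.append(b ^ state[0] ^ state[2] ^ state[3])  # G1 = 1 + D + D^3 + D^4
        state = [b] + state[:3]
    return out


def encode_speech_frame(frame):
    """Map the 260 bits of one speech frame to 456 channel-coded bits."""
    assert len(frame) == 260
    class_1a, class_1b, class_2 = frame[:50], frame[50:182], frame[182:]
    protected = class_1a + crc3(class_1a) + class_1b + [0, 0, 0, 0]   # 189 bits
    return conv_encode(protected) + class_2                           # 378 + 78 bits


rng = random.Random(1)
speech_frame = [rng.getrandbits(1) for _ in range(260)]
coded_frame = encode_speech_frame(speech_frame)
print(len(coded_frame), round(260 / len(coded_frame), 2))
```

The class 2 bits pass through unchanged,  so the last  $78$  bits of the coded frame are identical to the last  $78$  bits of the speech frame.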


Interleaving for speech signals


The result of convolutional decoding depends not only on the frequency of the transmission errors,  but also on their distribution.

  • To achieve good correction results,  the channel should not have any memory,  but should provide statistically independent bit errors as far as possible.
  • In mobile radio systems, however,  transmission errors usually occur in blocks  $($"burst errors",  also called  "error bundles"$)$.
  • By using the interleaving technique,  such burst errors are spread evenly over several transmission bursts and thus their effects are mitigated.


Interleaving in GSM speech signals

For a speech channel,  the interleaver works in the following way:

  1. The  $456$  input bits per speech frame are divided into four blocks of  $114$  bits each according to a fixed algorithm.  We denote these for the  $n$–th  speech frame by  $A_n$,  $B_n$,  $C_n$  and  $D_n$.  The index  $n-1$  denotes the preceding frame and  $n+1$  the succeeding one.
  2. The block  $A_n$  is further divided into two sub-blocks  $A_{{\rm g},\hspace{0.08cm}n}$  and  $A_{{\rm u},\hspace{0.08cm}n}$  of  $57$  bits each,  where  $A_{{\rm g},\hspace{0.08cm}n}$  contains only the even  $($German:  "gerade"   ⇒   "g"$)$  and  $A_{{\rm u},\hspace{0.08cm}n}$  only the odd  $($German:  "ungerade"  ⇒   "u"$)$  bit positions.  In the graph,  one recognizes  $A_{{\rm g},\hspace{0.08cm}n}$  and  $A_{{\rm u},\hspace{0.08cm}n}$  by red resp. blue backgrounds.
  3. The sub-block  $A_{{\rm g},\hspace{0.08cm}n}$  of the  $n$-th speech frame is combined with the sub-block  $A_{{\rm u},\hspace{0.05cm}n-1}$  of the previous frame;  together they form the  $114$  payload bits of a  "normal burst":  $\left (A_{{\rm g},\hspace{0.08cm}n}, A_{{\rm u},\hspace{0.08cm}n-1}\right )$.  The same applies to the next three bursts:  $\left (B_{{\rm g},\hspace{0.08cm}n},\hspace{0.12cm} B_{{\rm u},\hspace{0.08cm}n-1}\right )$,  $\left (C_{{\rm g},\hspace{0.08cm}n}, C_{{\rm u},\hspace{0.08cm}n-1}\right )$,  $\left (D_{{\rm g},\hspace{0.08cm}n}, D_{{\rm u},\hspace{0.08cm}n-1}\right )$.
  4. In the same way,  the odd sub-blocks of the  $n$-th speech frame are interleaved with the even sub-blocks of the following frame:  $\left (A_{{\rm g},\hspace{0.08cm}n+1},\hspace{0.12cm} A_{{\rm u},\hspace{0.08cm}n}\right )$, ... ,  $\left (D_{{\rm g},\hspace{0.08cm}n+1},\hspace{0.12cm} D_{{\rm u},\hspace{0.08cm}n}\right )$.


$\text{Conclusions:}$  The scrambling type described here is called  "block-diagonal interleaving"  specifically of degree  $8$:

  1. This reduces the susceptibility to burst errors.
  2. So two consecutive bits of a data block are never sent directly after each other.
  3. Multi-bit errors occur in isolation after the de-interleaver and can thus be corrected more effectively.
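The interleaving scheme described above can be sketched as follows.  The  "fixed algorithm"  that assigns the  $456$  bits to the blocks  $A_n$, ... , $D_n$  is not specified in this section;  the sketch assumes the simplest possible rule,  namely that bit  $k$  goes to block  $k \bmod 4$.  The function names are likewise chosen for illustration only.

```python
def split_frame(frame):
    """Split 456 bits into the blocks A..D and each block into (even, odd) halves."""
    assert len(frame) == 456
    blocks = [frame[i::4] for i in range(4)]            # assumed rule: bit k -> block k mod 4
    return [(blk[0::2], blk[1::2]) for blk in blocks]   # 57 even and 57 odd positions each


def bursts_for_frame(current, previous):
    """Payloads (A_g,n , A_u,n-1), ..., (D_g,n , D_u,n-1) of four normal bursts."""
    cur, prev = split_frame(current), split_frame(previous)
    return [cur[i][0] + prev[i][1] for i in range(4)]


# Label every bit with (frame number, bit index) to follow it through the interleaver.
frame_prev = [(0, k) for k in range(456)]
frame_curr = [(1, k) for k in range(456)]
bursts = bursts_for_frame(frame_curr, frame_prev)

print([len(b) for b in bursts])   # payload size of each burst
print(bursts[0][:3])              # first entries of A_g,n
print(bursts[0][57:60])           # first entries of A_u,n-1
```

With this assumed rule,  neighbouring bits  $k$  and  $k+1$  of a frame always belong to different blocks and hence to different bursts,  and each frame is spread over eight bursts in total:  four carrying its even sub-blocks and the following four carrying its odd sub-blocks.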


Encoding and interleaving for data signals


For GSM data transmission,  each subscriber only has a net data rate of  $9.6\ \rm kbit/s$  available.  Two methods are used for error protection:

  • »Forward Error Correction«  $\rm (FEC)$  is implemented at the physical layer by applying convolutional codes.
  • »Automatic Repeat Request«  $\rm (ARQ)$  where defective packets that cannot be corrected are re-requested at the link layer.


The graph illustrates channel coding and interleaving for the data channel with  $9.6\ \rm kbit/s$,  which in contrast to the channel coding of the speech channel  $($with bit error rate  $10^{-5}$... $10^{-6})$  allows an almost error-free reconstruction of the data.  Note:

Illustration of coding and interleaving for data signals
Convolutional encoder of rate  $1/2$  used by GSM


  1. The data bit rate of   $9.6\ \rm kbit/s$   is first increased in the  "Terminal Equipment"  of the mobile station through non-GSM specific channel encoding by  $25\%$  to   $12\ \rm kbit/s$   to allow error detection in circuit-switched networks.

  2. In data transmission,  all bits are equivalent,  so unlike speech channel coding, there are no classes.  The  $240$  bits  per  $20 \ \rm ms$  time frame are combined together with four tail bits  "$0000$"  to form a single data frame.

  3. These  $244$  bits are doubled to  $488$  bits by a convolutional encoder of rate  $1/2$  as in speech channels.  Two encoded symbols are generated per incoming bit,  e.g. according to the generator polynomials  $G_0(D) = 1 + D^3 + D^4$  $($red marks in the second graph$)$  and  $G_1(D) = 1 + D + D^3 + D^4$.

  4. The following interleaver expects as input  – just like a  "speech interleaver" –  only  $456$  bits per frame.  Therefore,  $32$  of the  $488$  bits at the output of the convolutional encoder are removed,  namely those at the positions  $15 \cdot j - 4 \ \ ( j = 1$, ... ,  $32)$   $($"puncturing"$)$.

  5. Since data transmission is less time-critical than speech transmission,  a higher interleaving degree is chosen here.  The  $456$  bits are distributed over up to  $24$  interleaver blocks  of  $19$  bits each,  which would not be possible for speech services for reasons of real-time transmission.

  6. Then the  $456$  bits are split into four consecutive  "normal bursts"  and sent.  When packing in the bursts,  groupings of even and odd bits are again formed,  similar to interleaving in the speech channel.
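Steps 2 to 4 of this list can be checked with a short sketch.  It assumes that the punctured positions are  $15 \cdot j - 4$  with  $j = 1$, ... , $32$,  counted from zero,  so that all of them lie inside the  $488$  encoder output bits.  The function names are chosen for this example,  and the interleaving of steps 5 and 6 is not modelled.

```python
def conv_encode(bits):
    """Rate-1/2 encoder with G0 = 1 + D^3 + D^4 and G1 = 1 + D + D^3 + D^4."""
    state = [0, 0, 0, 0]                     # state[0] holds D^1, ..., state[3] holds D^4
    out = []
    for b in bits:
        out.append(b ^ state[2] ^ state[3])
        out.append(b ^ state[0] ^ state[2] ^ state[3])
        state = [b] + state[:3]
    return out


# Assumption: zero-based positions 11, 26, ..., 476 are punctured.
PUNCTURED = {15 * j - 4 for j in range(1, 33)}


def encode_data_frame(bits):
    """240 data bits -> 244 with tail bits -> 488 coded bits -> 456 after puncturing."""
    assert len(bits) == 240
    coded = conv_encode(list(bits) + [0, 0, 0, 0])
    return [c for i, c in enumerate(coded) if i not in PUNCTURED]


data_frame = [(k * 7 + 3) % 2 for k in range(240)]   # arbitrary test pattern
coded_data = encode_data_frame(data_frame)
print(len(PUNCTURED), min(PUNCTURED), max(PUNCTURED), len(coded_data))
```

After puncturing,  the data frame has the same length as a channel-coded speech frame,  so the burst mapping can be shared by both services.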


Receiver side of the GSM link - Decoding


The GSM receiver  $($highlighted in yellow$)$  includes GMSK demodulation,  burst decomposition,  decryption,  de-interleaving,  and channel and speech decoding.

Regarding the last two blocks in the graph,  it should be noted:

Receiver-side data processing for GSM




  • The decoding method is not prescribed by the GSM specification, but is left to the individual network operators. The performance depends on the error correction algorithm used.
  • For example,  with the decoding procedure  "Maximum Likelihood Sequence Estimation"  $\rm (MLSE)$,  the most probable bit sequence is determined using the Viterbi algorithm or a MAP receiver  $($"Maximum A-posteriori Probability"$)$.
  • After error correction,  the  "Cyclic Redundancy Check"  $\rm (CRC)$  is performed,  where for the full rate codec the degree of the used CRC generator polynomial is  $G= 3$.  This will detect all error patterns up to weight  $3$  and all burst errors up to length  $4$.
  • CRC is used to decide the usability of each speech frame.  With positive test result,  the speech signals are synthesized from the  $260$  parameters per frame in the subsequent speech decoder.
  • If a frame fails this check,  parameters of earlier frames detected as correct are used for interpolation   ⇒   "error concealment". 
  • If several incorrect speech frames occur in succession,  the output is continuously lowered to mute.
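The last three points can be illustrated by a strongly simplified sketch.  The frame check recomputes the three CRC bits with the generator polynomial  $D^3 + D + 1$.  The concealment shown here is plain repetition of the last good frame with a gain that is halved for every further bad frame;  this replaces the interpolation and muting rules of a real speech decoder,  which are not described in this section.  The attenuation step of  $0.5$,  the treatment of frame parameters as plain numbers and all function names are assumptions.

```python
def crc3(bits):
    """Remainder of bits(D) * D^3 divided by D^3 + D + 1 over GF(2)."""
    reg = list(bits) + [0, 0, 0]
    gen = [1, 0, 1, 1]
    for i in range(len(bits)):
        if reg[i]:
            for k, g in enumerate(gen):
                reg[i + k] ^= g
    return reg[-3:]


def frame_is_usable(class_1a, parity):
    """A frame passes if the received parity bits match the recomputed ones."""
    return crc3(class_1a) == list(parity)


def conceal(frames, step=0.5):
    """frames: list of (parameters, usable). Bad frames repeat the last good one, attenuated."""
    out, last_good, gain = [], None, 1.0
    for params, usable in frames:
        if usable:
            last_good, gain = list(params), 1.0
            out.append(list(params))
        else:
            gain *= step                                  # lower the output step by step
            base = last_good if last_good is not None else [0.0] * len(params)
            out.append([gain * p for p in base])
    return out


received = [([2.0], True), ([9.0], False), ([9.0], False), ([4.0], True)]
print(conceal(received))
```

In this toy run the two faulty frames are replaced by attenuated copies of the first frame,  and normal output resumes with the next usable frame.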


Exercise for the chapter


Exercise 3.7: GSM System Components