Information Theoretical Limits of Channel Coding

From LNTwww
< Channel Coding
Revision as of 17:16, 18 November 2022 by Hwang (talk | contribs)

Channel coding theorem and channel capacity


We further consider a binary block code with 

  • k  information bits per block,
  • code words of length  n,
  • resulting in the code rate  R=k/n  with the unit  "information bit/code symbol".


The ingenious information theorist  Claude E. Shannon  has dealt very intensively with the correctability of such codes already in 1948 and has given for this a limit for each channel which results from information-theoretical considerations alone.  Up to this day,  no code has been found which exceeds this limit,  and this will remain so.

Shannon's channel coding theorem:  For each channel with channel capacity  C>0  there always exists  (at least)  one code whose error probability approaches zero as long as the code rate  R  is smaller than the channel capacity  C.  The prerequisite for this is that the following holds for the block length of this code:  

n.


Notes: 

  • The statement  "The error probability approaches zero"  is not identical with the statement  "The transmission is error-free".  Example:   For an infinitely long sequence,  finitely many symbols are corrupted KORREKTUR: falsified.
  • For some channels,  even with  R=C  the error probability still goes towards zero  (but not for all).


The inverse of the channel coding theorem is also true and states:

Inverse:  If the code rate  R  is larger than the channel capacity  C,  then an arbitrarily small error probability cannot be achieved in any case.


To derive and calculate the channel capacity,  we first assume a digital channel with  Mx  possible input values  x  and  My  possible output values  y.  Then,  for the mean mutual information  – in short,  the  mutual information   –  between the random variable  x  at the channel input and the random variable  y  at its output:

I(x;y)=MXi=1MYj=1Pr(xi,yj)log2Pr(yj|xi)Pr(yj)=MXi=1MYj=1Pr(yj|xi)Pr(xi)log2Pr(yj|xi)MXk=1Pr(yj|xk)Pr(xk).

In the transition from the first to the second equation, the   Theorem of Bayes   and the   Theorem of Total Probability   were considered.

Further,  it should be noted:

  • The  "binary logarithm"  is denoted here with  "log2".  Partially,  we also use  "ld"  ("logarithm dualis")  for this in our learning tutorial.
  • In contrast to the book  Information Theory  we do not distinguish in the following between the random variable  (upper case letters  X  resp.  Y)  and the realizations (lower case letters  x  resp.  y).


Definition:  The  »channel capacity«  introduced by Shannon gives the maximum mutual information  I(x;y)  between the input variable  x  and the output variable  y:

C=maxPr(xi)I(X;Y).

The pseudo–unit  "bit/channel use"  must be added.


Since the maximization of the mutual information  I(x;y)  must be done over all possible  (discrete)  input distributions  Pr(xi),
»the channel capacity is independent of the input and thus a pure channel parameter«.

Channel capacity of the BSC model


We now apply these definitions to the  BSC model  ("Binary Symmetric Channel"):

I(x;y)=Pr(y=0|x=0)Pr(x=0)log2Pr(y=0|x=0)Pr(y=0)+Pr(y=1|x=0)Pr(x=0)log2Pr(Y=1|x=0)Pr(y=1)+
+Pr(y=0|x=1)Pr(x=1)log2Pr(Y=0|x=1)Pr(y=0)+Pr(y=1|x=1)Pr(x=1)log2Pr(y=1|x=1)Pr(y=1).

The channel capacity is arrived at by the following considerations:

Binary symmetric channel
  • Maximization with respect to the input distribution leads to equally probable symbols:
Pr(x=0)=Pr(x=1)=1/2.
  • Due to the symmetry evident from the model,  the following then holds simultaneously:
Pr(y=0)=Pr(y=1)=1/2.
  • We also consider the BSC transmission probabilities:
Pr(y=1|x=0)=Pr(y=0|x=1)=ε,
Pr(y=0|x=0)=Pr(y=1|x=1)=1ε.
  • After combining two terms each,  we thus obtain:
C=21/2εlog2ε1/2+21/2(1ε)log21ε1/2εld2εlog21ε+(1ε)log22(1ε)log211ε
C=1Hbin(ε).
  • Here,  the  "binary entropy function"  is used:
Hbin(ε)=εlog21ε+(1ε)log211ε.

The right graph shows the BSC channel capacity depending on the falsificaion probability  ε.  On the left,  for comparison,  the binary entropy function is shown,  which has already been defined in the chapter  Discrete Memoryless Sources  of the book  "Information Theory". 

Channel capacity of the BSC model

One can see from this representation:

  • The falsificaion probability  ε  leads to the channel capacity  C(ε).  According to the  channel coding theorem  error-free decoding is only possible if the code rate  RC(ε).
  • With  ε=10%,  error-free decoding is impossible  if the code rate  R>0.531   (because:  C(0.1)=0.531).
  • With  ε=50%,  error-free decoding is impossible even if the code rate is arbitrarily small   (because:  C(0.5)=0).
  • From an information theory point of view  ε=1  (inversion of all bits)  is equivalent to  ε=0  ("error-free transmission").
  • Error-free decoding is achieved here by swapping the zeros and ones, i.e. by a so-called  "mapping".
  • Similarly  ε=0.9  is equivalent to  ε=0.1


Channel capacity of the AWGN model


We now consider the  AWGN channel  ("Additive White Gaussian Noise").

AWGN channel model

Here,  for the output signal  y=x+n,  where  n  describes a  Gaussian distributed random variable  and it applies to their expected values  ("moments"):

  1.  E[n]=0,
  2.  E[n2]=Pn.


Thus,  regardless of the input signal  x  (analog or digital),  there is always a continuous-valued output signal  y,  and in the  equation for mutual information  is to be used: 

My.

The calculation of the AWGN channel capacity is given here only in keywords.  The exact derivation can be found in the fourth main chapter  "Discrete Value Information Theory"  of the textbook  Information Theory.

  • The input quantity  x  optimized with respect to maximum mutual information will certainly be continuous-valued,  that is,  for the AWGN channel in addition to   My   also holds   Mx.
C=maxfx(x)I(x;y),wheremustapply:E[x2]Px.
  • The optimization also yields a Gaussian distribution for the input PDF   ⇒   xn  and  y  are Gaussian distributed according to the probability density functions  fx(x),   fn(n)   and   fy(y).  We designate the corresponding powers  PxPn  and  Py.
  • After longer calculation one gets for the channel capacity using the  "binary logarithm"  log2()  –  again with the pseudo-unit  "bit/channel use":
C=log2PyPn=log2Px+PnPn=1/2log2(1+PxPn).
  • If  x  describes a  discrete-time signal with symbol rate  1/TS,  it must be bandlimited to   B=1/(2TS),   and the same bandwidth  B  must be applied to the noise signal  n  ⇒   "noise bandwidth":
PX=ESTS,PN=N02TS.
  • Thus, the AWGN channel capacity can also be expressed by the  »transmit KORREKTUR: transmitted? energy per symbol«  (ES)  and the  »noise power density«  (N0):
C=1/2log2(1+2ES/N0),unit:bitchanneluse.
  • The following equation gives the channel capacity per unit time  (denoted by ):
C=CTS=Blog2(1+2ES/N0),unit:bittimeunit.

Example 1: 

  • For  ES/N0=7.5   ⇒   10lgES/N0=8.75dB  the channel capacity is 
C=1/2log2(16)=2bit/channeluse.
  • For a channel with the  (physical)  bandwidth  B=4kHz,  which corresponds to the sampling rate  fA=8kHz
C=16kbit/s.


  • However,  a comparison of different coding KORREKTUR: encoding methods at constant  "energy per transmitted symbol"   ⇒   ES  is not fair. 
  • Rather,  for this comparison,  the  "energy per source bit"   ⇒   EB  should be fixed.  The following relationships apply:
ES=REBEB=ES/R.

Channel Coding Theorem for the AWGN channel: 

  • Error-free decoding (at infinitely long blocks   ⇒   n)  is always possible if the code rate  R  is smaller than the channel capacity  C:
R<C=1/2log2(1+2REB/N0).
  • For each code rate  R  the required  EB/N0  of the AWGN channel can thus be determined so that error-free decoding is just possible.
  • One obtains for the limiting case  R=C:
EB/N0>22R12R.


The graph summarizes the result,  with the ordinate  R  plotted on a linear scale and the abscissa  EB/N0  plotted logarithmically.

AWGN channel capacity
  • Error-free coding is not possible outside the blue area.
  • The blue limit curve indicates the AWGN channel capacity  C.


From this graph and the above equation,  the following can be deduced:

  1. Channel capacity  C  increases with  10lgEB/N0  somewhat less than linearly.  In the graph,  some selected function values are indicated as blue crosses.
  2. If  10lgEB/N0<1.59dB,  error-free decoding is impossible in principle.  If the code rate  R=0.5,  then  10lgEB/N0>0dB  must be   ⇒  EB>N0.
  3. For all binary codes holds per se  0<R1. Only with non-binary codes  ⇒   rates  R>1  are possible.  For example,  the maximum possible code rate of a quaternary code:  R=log2My=log24=2.
  4. All one-dimensional modulation types – i.e., those methods that use only the in-phase– or only the quadrature component such as  2–ASKBPSK  and  2–FSK  must be in the blue area of the present graphic.
  5. As shown in the chapter  Maximum code rate for QAM structures,  there is a  "friendlier"  limit curve for two-dimensional modulation types such as the  Quadrature Amplitude Modulation.

AWGN channel capacity for binary input signals


In this book we restrict ourselves mainly to binary codes,  that is,  to the Galois field  GF(2n).  With this

Conditional PDFs for AWGN channel and binary input
  • firstly,  the code rate is limited to the range  R1,
  • secondly,  also for  R1  not the whole blue region is available  (see previous section).


  1. the parameters  Mx=2  and  My,
  2. bipolar signaling   ⇒   x=0   →   ˜x=+1  and  x=1   →   ˜x=1,
  3. the transition from conditional probabilities  Pr(˜xi)  to conditional probability density functions,
  4. replace the sum with an integration.

  • The optimization of the source leads to equally probable symbols:
Pr(˜x=+1)=Pr(˜x=1)=1/2.
  • This gives for the maximum of the mutual information,  i.e. for the channel capacity:
C=1/2+[fy|˜x=+1(y)log2fy|˜x=+1(y)fy(y)+fy|˜x=1(y)log2fy|˜x=1(y)fy(y)]dy.

The integral cannot be solved in mathematical closed form, but can only be evaluated numerically.

AWGN channel capacity for binary input signals
  1. The green curve shows the result.
  2. The blue curve gives for comparison the channel capacity for Gaussian distributed input signals derived in the last section.


It can be seen:

  • For  10lgEB/N0<0dB  the two capacitance curves differ only slightly.
  • So,  for binary bipolar input,  compared to the optimum (Gaussian) input,  the characteristic  10lgEB/N0  only needs to be increased by about  0.1dB  to also allow the code rate  R=0.5.
  • From  10lgEB/N06dB  the capacity  C=1bit/channeluse  of the AWGN channel for binary input is almost reached.
  • In between, the limit curve is approximately exponentially increasing.


Common channel codes versus channel capacity


Now it shall be shown to what extent established channel codes approximate the BPSK channel capacity  (green curve).   In the following graph the rate   R=k/n   of these codes or the capacity   C   (with the additional pseudo–unit "bit/channel use")  is plotted as ordinate.

Rates and required  EB/N0  of different channel codes

Further,  it is assumed:

  • the AWGN–channel,  denoted by  10lgEB/N0  in dB,  and
  • bit error rate  BER=105 for all codes marked by crosses.


Please note: 

  1. The channel capacity curves always apply to  n  and  BER0 .
  2. If one would apply this strict requirement  "error-free"  also to the considered channel codes of finite code length  n,  this would always require  10lgEB/N0 .
  3. But this is an academic problem of little practical significance.  For  BER=1010  a qualitatively and also quantitatively similar graph would result.
  4. For convolutional codes,  the third identifier parameter has a different meaning than for block codes.  For example,  CC (2, 1, 32)  indicates the memory  m=32


The following are some  »explanations of the data taken from the lecture [Liv10][1]«:

  1. The points  AB  and  C  mark  Hamming codes  of different rate.  They all require for  BER=105  more than  10lgEB/N0=8dB.
  2. D  denotes the binary  Golay code  with rate  1/2  and  E  denotes a  Reed–Muller code.  This very low rate code was used 1971 on the Mariner 9 spacecraft.
  3. Marked by  F  is a high rate  Reed–Solomon codes   (R=223/255>0.9)  and a required  10lgEB/N0<6dB.
  4. The markers  G  and  H  denote exemplary  convolutional codes  medium rate. The code  G  was used as early as 1972 on the Pioneer10 mission.
  5. The channel coding of the Voyager–mission in the late 1970s is marked by  I.  It is the concatenation of a  CC (2, 1, 7)  with a  Reed–Solomon code.


Much better results can be achieved with  »iterative decoding methods«  (see fourth main chapter),  as the second graph shows.

Rates and required  EB/N0  for iterative coding methods
  • This means:  The individual marker points are much closer to the capacity curve  CBPSK  for digital input.
  • The solid blue Gaussian capacity curve is labeled  CGaussian.

Here are some more explanations about this graph:

  1. Red crosses mark so-called  turbo codes  accordingt to  CCSDS  ("Consultative Committee for Space Data Systems")  with each  k=6920  information bits and different code lengths  n.
  2. These codes,  invented by  Claude Berrou  around 1990,  can be decoded iteratively.  The  (red)  marks are each less than  1dB  from the Shannon bound.
  3. Similar behavior is shown by  LDPC codes  ("Low Density Parity–check Codes")  marked by white rectangles,  which have been used since in 2006 on  DVB–S(2)  ("Digital Video Broadcast over Satellite").
  4. These are well suited for iterative decoding using  "factor–graphs"    and   "exit charts"  due to the sparse occupancy of the parity-check matrix  H  (with "ones").  See [Hag02][2]
  5. The black dots mark the CCSDS specified  LDPC codes,  all of which assume a constant number of information bits  (k=16384)
  6. In contrast,  for all white rectangles,  the code length  n=64800  is constant,  while the number  k  of information bits changes according to the rate  R=k/n.
  7. Around the year 2000,  many researchers had the ambition to approach the Shannon bound to within fractions of a  dB.  The yellow cross marks such a result from [CFRU01][3]. Used an irregular rate–1/2–LDPC of code length  n=2106.

Conclusion:  It should be noted that Shannon recognized and proved as early as 1948 that no one-dimensional modulation method can lie to the left of the AWGN limit curve "Gaussian (real)" drawn throughout.

  • For two-dimensional methods such as QAM and multilevel PSK, on the other hand, the "Gaussian (complex)" limit curve applies, which is drawn here as a dashed line and always lies to the left of the solid curve.
  • For more details, see the  Maximum code rate for QAM structures  section of the "Information Theory" book.
  • Also, this limit curve has now been nearly reached with QAM procedures and very long channel codes, without ever being exceeded.


Exercises for the chapter


Exercise 1.17: About the Channel Coding Theorem

Exercise 1.17Z: BPSK Channel Capacity

References

  1. Liva, G.: Channel Coding. Lecture manuscript, Chair of Communications Engineering, TU Munich and DLR Oberpfaffenhofen, 2010.
  2. Hagenauer, J.: The Turbo Principle in Mobile Communications.  In: Int. Symp. on Information Theory and Its Applications, Oct.2010, PDF–Dokument.
  3. Chung S.Y; Forney Jr., G.D.; Richardson, T.J.; Urbanke, R.:  On the Design of Low-Density Parity- Check Codes within 0.0045 dB of the Shannon Limit. – In: IEEE Communications Letters, vol. 5, no. 2 (2001), pp. 58–60.