Decoding of Convolutional Codes

From LNTwww

Block diagram and requirements


A significant advantage of convolutional coding is that there exists a very efficient decoding method for it in the form of the  "Viterbi algorithm".  This algorithm,  developed by  Andrew James Viterbi,  has already been described in the chapter  "Viterbi receiver"  of the book "Digital Signal Transmission" with regard to its use for equalization.

For its use as a convolutional decoder we assume the block diagram on the right and the following prerequisites:

System model for the decoding of convolutional codes
  • The information sequence  $\underline{u} = (u_1, u_2, \ldots)$  is here,  in contrast to the description of linear block codes   ⇒   "first main chapter"  or of Reed–Solomon codes   ⇒   "second main chapter",  generally infinitely long  ("semi–infinite").  The information symbols always satisfy  $u_i \in \{0, 1\}$.
  • The encoded sequence  $\underline{x} = (x_1, x_2, \ldots)$  with  $x_i \in \{0, 1\}$  depends not only on  $\underline{u}$  but also on the code rate  $R = 1/n$,  the memory  $m$  and the transfer function matrix  $\mathbf{G}(D)$.  For a finite number  $L$  of information bits,  the convolutional code should be terminated by appending  $m$  zeros:
$$\underline{u} = (u_1, u_2, \ldots, u_L, 0, \ldots, 0) \quad \Rightarrow \quad \underline{x} = (x_1, x_2, \ldots, x_{2L}, x_{2L+1}, \ldots, x_{2L+2m}).$$
  • The received sequence  $\underline{y} = (y_1, y_2, \ldots)$  results according to the assumed channel model.  For a digital model like the  Binary Symmetric Channel  (BSC),  $y_i \in \{0, 1\}$  holds,  so the falsification from  $\underline{x}$  to  $\underline{y}$  can be quantified with the  Hamming distance  $d_{\rm H}(\underline{x}, \underline{y})$.
  • The Viterbi algorithm provides an estimate  $\underline{z}$  for the encoded sequence  $\underline{x}$  and another estimate  $\underline{v}$  for the information sequence  $\underline{u}$.  Thereby holds:
$$\Pr(\underline{z} \ne \underline{x}) \stackrel{!}{=} {\rm Minimum} \quad \Rightarrow \quad \Pr(\underline{v} \ne \underline{u}) \stackrel{!}{=} {\rm Minimum}.$$

Conclusion:  Given a digital channel model  (for example, the BSC model),  the Viterbi algorithm searches among all possible encoded sequences  $\underline{x}$  for the sequence  $\underline{z}$  with the minimum Hamming distance  $d_{\rm H}(\underline{x}, \underline{y})$  to the received sequence  $\underline{y}$:

$$\underline{z} = \arg \min_{\underline{x} \in \mathcal{C}} \ d_{\rm H}(\underline{x}, \underline{y}) = \arg \max_{\underline{x} \in \mathcal{C}} \ \Pr(\underline{y} \hspace{0.05cm}|\hspace{0.05cm} \underline{x}).$$
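
This decision rule can be illustrated with a small Python sketch.  It deliberately uses a brute-force search over all terminated information sequences instead of the Viterbi algorithm,  which is only feasible for the short sequences considered here;  the encoder corresponds to the standard rate-1/2 code with  $\mathbf{G}(D) = (1 + D + D^2, \ 1 + D^2)$  used throughout this chapter,  and all function names are merely illustrative.

from itertools import product

def encode(u_bits):
    """Rate-1/2 encoder with G(D) = (1 + D + D^2, 1 + D^2), zero initial state."""
    s1 = s2 = 0                        # shift register contents (memory m = 2)
    x = []
    for u in u_bits:
        x.append(u ^ s1 ^ s2)          # first code bit:  1 + D + D^2
        x.append(u ^ s2)               # second code bit: 1 + D^2
        s1, s2 = u, s1                 # shift
    return x

def ml_decode(y_bits, L=5, m=2):
    """Brute-force ML decoding: minimum Hamming distance over all terminated sequences."""
    best_u, best_d = None, None
    for info in product([0, 1], repeat=L):
        u = list(info) + [0] * m       # termination with m zeros
        x = encode(u)
        d = sum(a != b for a, b in zip(x, y_bits))
        if best_d is None or d < best_d:
            best_u, best_d = u, d
    return best_u, best_d

# Error-free received sequence used in the examples below:
y = [1,1, 0,1, 0,1, 1,1, 1,1, 1,0, 1,1]
print(ml_decode(y))                    # -> ([1, 1, 0, 0, 1, 0, 0], 0)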


Preliminary remarks on the following decoding examples


Trellis for decoding the received sequence  y_

The following  »prerequisites«   apply to all examples in this chapter:

  • Standard convolutional encoder:   rate  $R = 1/2$,  memory  $m = 2$;
  • transfer function matrix:   $\mathbf{G}(D) = (1 + D + D^2, \ 1 + D^2)$;
  • length of the information sequence:   $L = 5$;
  • consideration of termination:   $L' = 7$;
  • length of the sequences  $\underline{x}$  and  $\underline{y}$:   $14$  bits each;
  • allocation according to  $\underline{y} = (\underline{y}_1, \ \underline{y}_2, \ \ldots, \ \underline{y}_7)$
    ⇒   bit pairs  $\underline{y}_i \in \{00, 01, 10, 11\}$;
  • Viterbi decoding using the trellis diagram:
red arrow   ⇒   hypothesis  $u_i = 0$,
blue arrow   ⇒   hypothesis  $u_i = 1$;
  • hypothetical encoded sequences  $\underline{x}'_i \in \{00, 01, 10, 11\}$;
  • all hypothetical quantities marked with an apostrophe.


We always assume in the following that the Viterbi decoding is based on the  Hamming distance  $d_{\rm H}(\underline{x}'_i, \ \underline{y}_i)$  between the received word  $\underline{y}_i$  and the four possible code words  $\underline{x}'_i \in \{00, 01, 10, 11\}$.  We then proceed as follows:

  • In the still empty circles, the error values  $\Gamma_i(S_\mu)$  of the states  $S_\mu$  ($0 \le \mu \le 3$)  at the time points  $i$  are entered.  The initial value is always  $\Gamma_0(S_0) = 0$.
  • The error values for  $i = 1$  and  $i = 2$  are given by
$$\Gamma_1(S_0) = d_{\rm H}\big((00), \ \underline{y}_1\big), \qquad \Gamma_1(S_1) = d_{\rm H}\big((11), \ \underline{y}_1\big),$$
$$\Gamma_2(S_0) = \Gamma_1(S_0) + d_{\rm H}\big((00), \ \underline{y}_2\big), \qquad \Gamma_2(S_1) = \Gamma_1(S_0) + d_{\rm H}\big((11), \ \underline{y}_2\big),$$
$$\Gamma_2(S_2) = \Gamma_1(S_1) + d_{\rm H}\big((10), \ \underline{y}_2\big), \qquad \Gamma_2(S_3) = \Gamma_1(S_1) + d_{\rm H}\big((01), \ \underline{y}_2\big).$$
  • From  $i = 3$  on, the trellis has reached its basic form, and to compute all  $\Gamma_i(S_\mu)$  the minimum of two sums must be determined in each case  (see the code sketch after this list):
$$\Gamma_i(S_0) = {\rm Min}\big[\Gamma_{i-1}(S_0) + d_{\rm H}\big((00), \ \underline{y}_i\big), \ \ \Gamma_{i-1}(S_2) + d_{\rm H}\big((11), \ \underline{y}_i\big)\big],$$
$$\Gamma_i(S_1) = {\rm Min}\big[\Gamma_{i-1}(S_0) + d_{\rm H}\big((11), \ \underline{y}_i\big), \ \ \Gamma_{i-1}(S_2) + d_{\rm H}\big((00), \ \underline{y}_i\big)\big],$$
$$\Gamma_i(S_2) = {\rm Min}\big[\Gamma_{i-1}(S_1) + d_{\rm H}\big((10), \ \underline{y}_i\big), \ \ \Gamma_{i-1}(S_3) + d_{\rm H}\big((01), \ \underline{y}_i\big)\big],$$
$$\Gamma_i(S_3) = {\rm Min}\big[\Gamma_{i-1}(S_1) + d_{\rm H}\big((01), \ \underline{y}_i\big), \ \ \Gamma_{i-1}(S_3) + d_{\rm H}\big((10), \ \underline{y}_i\big)\big].$$
  • Of the two branches arriving at a node  $\Gamma_i(S_\mu)$,  the worse one  (the one that would have led to a larger  $\Gamma_i(S_\mu)$)  is eliminated.  Only one branch then leads to each node.
  • Once all error values up to and including  $i = 7$  have been determined,  the Viterbi algorithm can be completed by searching for the  "connected path"  from the end of the trellis   ⇒   $\Gamma_7(S_0)$  to its beginning   ⇒   $\Gamma_0(S_0)$.
  • This path determines the most likely encoded sequence  $\underline{z}$  and the most likely information sequence  $\underline{v}$.
  • Not all received sequences are error-free  ($\underline{y} = \underline{x}$);  nevertheless,  Viterbi decoding often still yields  $\underline{z} = \underline{x}$  and  $\underline{v} = \underline{u}$.
  • But if there are too many transmission errors,  the Viterbi algorithm also fails.
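
The recursion above is nothing more than an  "add–compare–select"  operation per node.  The following is a minimal Python sketch of this forward pass under the assumptions of this chapter  (standard encoder, BSC, hard decisions);  the dictionary  BRANCHES  lists,  for each target state,  the two possible predecessor states with the hypothetical code words of the corresponding transitions,  exactly as in the four equations above.  All identifiers are illustrative.

BRANCHES = {                                    # target state: [(predecessor, code word), ...]
    0: [(0, (0, 0)), (2, (1, 1))],
    1: [(0, (1, 1)), (2, (0, 0))],
    2: [(1, (1, 0)), (3, (0, 1))],
    3: [(1, (0, 1)), (3, (1, 0))],
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def viterbi_forward(y_pairs):
    """Accumulated error values Gamma_i(S_mu) plus the surviving predecessor of each node."""
    INF = float("inf")
    gamma = [0, INF, INF, INF]                  # Gamma_0: only S_0 is a valid start state
    survivors = []                              # survivors[i][mu] = best predecessor of S_mu
    for y_i in y_pairs:
        new_gamma, pred = [INF] * 4, [None] * 4
        for s in range(4):
            for p, codeword in BRANCHES[s]:     # add ...
                metric = gamma[p] + hamming(codeword, y_i)
                if metric < new_gamma[s]:       # ... compare and select
                    new_gamma[s], pred[s] = metric, p
        gamma = new_gamma
        survivors.append(pred)
    return gamma, survivors

# Error-free received sequence of the following example:
y = [(1,1), (0,1), (0,1), (1,1), (1,1), (1,0), (1,1)]
print(viterbi_forward(y)[0][0])                 # final error value Gamma_7(S_0) -> 0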

Creating the trellis in the error-free case  –  Accumulated error value calculation


First,  we assume the received sequence  $\underline{y} = (11, 01, 01, 11, 11, 10, 11)$,  which is here already subdivided into bit pairs  $\underline{y}_1, \ \ldots, \ \underline{y}_7$.

The numerical values entered in the trellis and the different types of strokes are explained in the following text.

Viterbi scheme for the received vector  y_=(11,01,01,11,11,10,11)
  • Starting from the initial value  $\Gamma_0(S_0) = 0$,  with  $\underline{y}_1 = (11)$  we obtain,  by adding the Hamming distances
$$d_{\rm H}\big((00), \ \underline{y}_1\big) = 2 \qquad {\rm or} \qquad d_{\rm H}\big((11), \ \underline{y}_1\big) = 0,$$
the  "(accumulated) error values"   $\Gamma_1(S_0) = 2, \ \Gamma_1(S_1) = 0$.
  • In the second decoding step there are error values for all four states.   With  $\underline{y}_2 = (01)$  one obtains:
$$\Gamma_2(S_0) = \Gamma_1(S_0) + d_{\rm H}\big((00), (01)\big) = 2 + 1 = 3,$$
$$\Gamma_2(S_1) = \Gamma_1(S_0) + d_{\rm H}\big((11), (01)\big) = 2 + 1 = 3,$$
$$\Gamma_2(S_2) = \Gamma_1(S_1) + d_{\rm H}\big((10), (01)\big) = 0 + 2 = 2,$$
$$\Gamma_2(S_3) = \Gamma_1(S_1) + d_{\rm H}\big((01), (01)\big) = 0 + 0 = 0.$$
  • In all further decoding steps,  two values must be compared in each case,  whereby the node  Γi(Sμ)  is always assigned the smaller value.
  • For example,  for  $i = 3$  with  $\underline{y}_3 = (01)$:
$$\Gamma_3(S_0) = \min\big[\Gamma_2(S_0) + d_{\rm H}\big((00), (01)\big), \ \ \Gamma_2(S_2) + d_{\rm H}\big((11), (01)\big)\big] = \min[3 + 1, \ 2 + 1] = 3,$$
$$\Gamma_3(S_3) = \min\big[\Gamma_2(S_1) + d_{\rm H}\big((01), (01)\big), \ \ \Gamma_2(S_3) + d_{\rm H}\big((10), (01)\big)\big] = \min[3 + 0, \ 0 + 2] = 2.$$
  • In the considered example,  the termination of the convolutional code becomes effective from  $i = 6$  on.  Here,  only two comparisons are left to determine  $\Gamma_6(S_0) = 3$  and  $\Gamma_6(S_2) = 0$,  and for  $i = 7$  only one comparison remains,  with the final error value  $\Gamma_7(S_0) = 0$.
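
The hand calculation of the first two decoding steps can be reproduced with a few lines  (a standalone, purely illustrative snippet;  d_H simply counts differing bits):

def d_H(a, b):
    return sum(x != y for x, y in zip(a, b))

y1, y2 = (1, 1), (0, 1)
G1 = {"S0": d_H((0, 0), y1), "S1": d_H((1, 1), y1)}
G2 = {"S0": G1["S0"] + d_H((0, 0), y2),   # = 3
      "S1": G1["S0"] + d_H((1, 1), y2),   # = 3
      "S2": G1["S1"] + d_H((1, 0), y2),   # = 2
      "S3": G1["S1"] + d_H((0, 1), y2)}   # = 0
print(G1, G2)   # {'S0': 2, 'S1': 0} {'S0': 3, 'S1': 3, 'S2': 2, 'S3': 0}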


The description of the Viterbi decoding process continues in the next section.

Evaluating the trellis in the error-free case  –  Path search


After all error values  $\Gamma_i(S_\mu)$  have been determined  (in the present example for  $1 \le i \le 7$  and  $0 \le \mu \le 3$),  the Viterbi decoder can start the path search:

  1.   The following graph shows the trellis after the error value calculation.  All circles are assigned numerical values.
  2.   However,  the most probable path already drawn in the graphic is not yet known.
  3.   In the following,  of course,  no use is made of the  "error-free case"  information already contained in the heading.
  4.   Of the two branches arriving at a node,  only the one that led to the minimum error value  Γi(Sμ)  is used for the final path search.
  5.   The  "bad"  branches are discarded.  They are each shown dotted in the following graph.
Viterbi path search for the received vector  $\underline{y} = (11, 01, 01, 11, 11, 10, 11)$


The path search runs as follows:

  • Starting from the end value  Γ7(S0)  a continuous path is searched in backward direction to the start value  Γ0(S0).  Only the solid branches are allowed.  Dotted lines cannot be part of the selected  (best)  path.
  • The selected path  (highlighted in gray in the graph)  traverses,  from right to left in the sketch,  the states
$$S_0 \leftarrow S_2 \leftarrow S_1 \leftarrow S_0 \leftarrow S_2 \leftarrow S_3 \leftarrow S_1 \leftarrow S_0.$$
There is no second continuous path from  $\Gamma_7(S_0)$  to  $\Gamma_0(S_0)$.  This means:   The decoding result is unique.
  • The result  $\underline{v} = (1, 1, 0, 0, 1, 0, 0)$  of the Viterbi decoder with respect to the information sequence is obtained if,  for the continuous path  (but now in forward direction,  from left to right),  the colors of the individual branches are evaluated  (red   ⇒   "0",   blue   ⇒   "1").

From the final value   Γ7(S0)=0   it can be seen that there were no transmission errors in this first example:

  • The decoding result  $\underline{z}$  thus matches the received vector  $\underline{y} = (11, 01, 01, 11, 11, 10, 11)$  and the actual encoded sequence  $\underline{x}$.
  • With error-free transmission,  $\underline{v}$  is not only the most probable information sequence according to the maximum likelihood criterion,  but both are even identical:   $\underline{v} \equiv \underline{u}$.
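
Forward error-value calculation and backward path search can be combined into one compact decoder.  The following sketch is again only illustrative and assumes the standard encoder of this chapter:  the trellis table stores,  for each target state,  the predecessor,  the hypothetical code word and the hypothesized information bit  (red or blue arrow)  of each branch;  ties are resolved in favor of the first listed branch.

TRELLIS = {                       # target state: [(predecessor, code word, info bit), ...]
    0: [(0, (0, 0), 0), (2, (1, 1), 0)],
    1: [(0, (1, 1), 1), (2, (0, 0), 1)],
    2: [(1, (1, 0), 0), (3, (0, 1), 0)],
    3: [(1, (0, 1), 1), (3, (1, 0), 1)],
}

def viterbi_decode(y_pairs):
    """Hard-decision Viterbi decoding (minimum Hamming distance) of a terminated sequence."""
    INF = float("inf")
    gamma = [0, INF, INF, INF]
    history = []                                   # per step: (predecessor, info bit) per state
    for y_i in y_pairs:
        new_gamma, step = [INF] * 4, [None] * 4
        for s in range(4):
            for p, cw, u in TRELLIS[s]:
                metric = gamma[p] + sum(a != b for a, b in zip(cw, y_i))
                if metric < new_gamma[s]:          # ties: first listed branch survives
                    new_gamma[s], step[s] = metric, (p, u)
        gamma = new_gamma
        history.append(step)
    state, v = 0, []                               # path search: trace back from S_0
    for step in reversed(history):
        p, u = step[state]
        v.append(u)
        state = p
    return v[::-1], gamma[0]

y = [(1,1), (0,1), (0,1), (1,1), (1,1), (1,0), (1,1)]
print(viterbi_decode(y))                           # -> ([1, 1, 0, 0, 1, 0, 0], 0)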


Decoding examples for the erroneous case


Now follow three examples of Viterbi decoding for the erroneous case.

Example 1:  We assume here the received vector  y_=(11,11,10,00,01,01,11)  which does not represent a valid encoded sequence  x_ . The calculation of error values  Γi(Sμ)  and the path search is done as described in section  "Preliminaries"  and demonstrated in the last two sections for the error-free case.

Decoding example with two bit errors at the beginning

As summary of this first example,  it should be noted:

  • Also with this trellis,  a unique path  (with dark gray background)  can be traced,  leading to the following results  (recognizable by the labels or the colors of this path):
$$\underline{z} = (00, 11, 10, 00, 01, 01, 11),$$
$$\underline{v} = (0, 1, 0, 1, 1, 0, 0).$$
  • Comparing the most likely transmitted encoded sequence  $\underline{z}$  with the received vector  $\underline{y}$  shows that there were two bit errors right at the beginning.  But since the used convolutional code has the free distance  $d_{\rm F} = 5$,  two transmission errors do not yet lead to a wrong decoding result.
  • There are other paths,  such as the lighter highlighted path
$$S_0 \to S_1 \to S_3 \to S_3 \to S_3 \to S_2 \to S_0 \to S_0,$$
that initially appear to be promising.  Only in the last decoding step  ($i = 7$)  can this light gray path finally be discarded.

Further remarks:

  1. The example shows that deciding too early is often not expedient.
  2. One can also see the expediency of termination:   With final decision at  i=5  (end of information sequence),  the sequences  (0,1,0,1,1)  and  (1,1,1,1,0)  would still have been considered equally likely.
  3. In the calculation of  $\Gamma_5(S_0) = 3$  and  $\Gamma_5(S_1) = 3$,  the two competing branches lead to exactly the same minimum error value in each case.  In the graph, these two special cases are marked by dash-dotted lines.  In this example, this special case has no effect on the path search.
  4. Nevertheless,  the algorithm always expects a decision between two competing branches.  In practice,  such ties are resolved by randomly selecting one of the two paths.
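
A tie of this kind can be handled explicitly in an implementation.  A minimal sketch of an add–compare–select step with random tie-breaking  (purely illustrative):

import random

def acs(candidates):
    """Add-compare-select with random tie-breaking.
    candidates: list of (accumulated error value, predecessor state) pairs."""
    best = min(metric for metric, _ in candidates)
    survivors = [c for c in candidates if c[0] == best]   # all branches reaching the minimum
    return random.choice(survivors)                       # pick one of them at random

# Example: two branches with identical accumulated error value, as for Gamma_5(S_0) above
print(acs([(3, "S0"), (3, "S2")]))                        # (3, 'S0') or (3, 'S2')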


Example 2:  In this example,  the following assumptions apply to the source and the encoder:

Decoding example with three bit errors
$$\underline{u} = (1, 1, 0, 0, 1, 0, 0),$$
$$\underline{x} = (11, 01, 01, 11, 11, 10, 11).$$

From the graph you can see here that the decoder decides for the correct path  (dark background)  despite three bit errors.

  • So a wrong decision does not always occur when more than  $d_{\rm F}/2$  bit errors have occurred.
  • But for a random (statistical) distribution of the three bit errors,  a wrong decision would be more frequent than a correct one.


Example 3:  Here, too, the following applies:

Decoding example with four bit errors
$$\underline{u} = (1, 1, 0, 0, 1, 0, 0),$$
$$\underline{x} = (11, 01, 01, 11, 11, 10, 11).$$

Unlike in the last example,  a fourth bit error is added:  $\underline{y}_7 = (01)$.

  • Now both branches in step  $i = 7$  lead to the minimum error value  $\Gamma_7(S_0) = 4$,  recognizable by the dash-dotted transitions.
  • If the then required lottery procedure decides for the path with dark background,  the correct decision is still made even with four bit errors:   $\underline{v} = (1, 1, 0, 0, 1, 0, 0)$.
  • Otherwise,  a wrong decision is made.  Depending on the outcome of the dice roll in step  $i = 6$  between the two dash-dotted competitors,  one chooses either the purple or the light gray path.
  • Both have little in common with the correct path.


Relationship between Hamming distance and correlation


Especially for the  BSC model  (but also for any other binary channel   ⇒   input  $x_i \in \{0, 1\}$,  output  $y_i \in \{0, 1\}$,  such as the  Gilbert–Elliott model),

  • the Hamming distance  $d_{\rm H}(\underline{x}, \ \underline{y})$  provides exactly the same information about the similarity of the input sequence  $\underline{x}$  and the output sequence  $\underline{y}$
  • as the  inner product.  Assuming that the sequences are in bipolar form  (denoted by tildes)  and that the sequence length is  $L$  in each case,  the inner product is:
$$\langle \tilde{\underline{x}}, \ \tilde{\underline{y}} \rangle = \sum_{i=1}^{L} \tilde{x}_i \cdot \tilde{y}_i \qquad {\rm with} \qquad \tilde{x}_i = 1 - 2 \cdot x_i, \quad \tilde{y}_i = 1 - 2 \cdot y_i, \quad \tilde{x}_i, \ \tilde{y}_i \in \{-1, +1\}.$$

We sometimes refer to this inner product as the  »correlation value«.  Unlike the  correlation coefficient,  the  "correlation value"  may well lie outside the range  $\pm 1$.

Example 4:  We consider here two binary sequences of length  $L = 10$.  Shown on the left are the  »unipolar«  sequences  $\underline{x}$  and  $\underline{y}$  as well as the product  $\underline{x} \cdot \underline{y}$.

Relationship between Hamming distance and correlation value
  • You can see the Hamming distance  $d_{\rm H}(\underline{x}, \ \underline{y}) = 6$   ⇒   six bit errors at the arrow positions.
  • The inner product  $\langle \underline{x}, \ \underline{y} \rangle = 0$  of the unipolar sequences has no significance here.  For example,  $\langle \underline{0}, \ \underline{y} \rangle$  is always zero,  regardless of  $\underline{y}$.


The Hamming distance  dH=6  can also be seen from the  »bipolar«  (antipodal) plot in the right graph.

  • The  "correlation value"  now has the correct value:
$$4 \cdot (+1) + 6 \cdot (-1) = -2.$$
  • For the deterministic relationship between the  "correlation value"  and the  "Hamming distance",  the following holds with the sequence length  $L$:
$$\langle \tilde{\underline{x}}, \ \tilde{\underline{y}} \rangle = L - 2 \cdot d_{\rm H}(\tilde{\underline{x}}, \ \tilde{\underline{y}}).$$


Conclusion:  Let us now interpret this last equation for some special cases:

  • »Identical sequences«:   The Hamming distance is equal to  0  and the  correlation value is equal to  L.
  • »Inverted sequences«:   The Hamming distance is equal to  $L$  and the  correlation value  is equal to  $-L$.
  • »Uncorrelated sequences«:   The Hamming distance is equal to  L/2  and the  correlation value  is equal to  0.
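
These relationships are easy to check numerically.  The following sketch  (with arbitrarily generated sequences)  verifies the identity  $\langle \tilde{\underline{x}}, \ \tilde{\underline{y}} \rangle = L - 2 \cdot d_{\rm H}$  as well as the first two special cases:

import random

def d_H(x, y):
    return sum(a != b for a, b in zip(x, y))

def corr(x, y):
    """Correlation value of the bipolar representations of two unipolar sequences."""
    return sum((1 - 2 * a) * (1 - 2 * b) for a, b in zip(x, y))

L = 10
x = [random.randint(0, 1) for _ in range(L)]
y = [random.randint(0, 1) for _ in range(L)]

assert corr(x, y) == L - 2 * d_H(x, y)            # general identity
assert corr(x, x) == L                            # identical sequences
assert corr(x, [1 - b for b in x]) == -L          # inverted sequences
# "uncorrelated" sequences:  d_H = L/2  <=>  correlation value 0
print(corr(x, y), L - 2 * d_H(x, y))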

Viterbi algorithm based on correlation and metrics


Using the insights of the last section,  the Viterbi algorithm can also be characterized as follows.

Alternative description: 

  • The Viterbi algorithm searches among all possible encoded sequences  $\underline{x} \in \mathcal{C}$  for the sequence  $\underline{z}$  with the  »maximum correlation value«  to the received sequence  $\underline{y}$:
$$\underline{z} = \arg \max_{\underline{x} \in \mathcal{C}} \ \langle \tilde{\underline{x}}, \ \tilde{\underline{y}} \rangle \qquad {\rm with} \qquad \tilde{\underline{x}} = 1 - 2 \cdot \underline{x}, \quad \tilde{\underline{y}} = 1 - 2 \cdot \underline{y}.$$
  • Here,   $\langle \ldots \rangle$   denotes a  "correlation value"  according to the statements in the last section.  The tildes again indicate the bipolar representation.


The graphic shows the corresponding trellis evaluation. 

  • The input sequence and the encoded sequence are
$$\underline{u} = (0, 1, 0, 1, 1, 0, 0) \quad \Rightarrow \quad \underline{x} = (00, 11, 10, 00, 01, 01, 11).$$
Viterbi decoding based on correlation and metrics

Further are assumed:

  • Standard convolutional encoder:   rate  $R = 1/2$,  memory  $m = 2$;
  • transfer function matrix:   $\mathbf{G}(D) = (1 + D + D^2, \ 1 + D^2)$;
  • length of the information sequence:   $L = 5$;
  • consideration of termination:   $L' = 7$;
  • received vector   $\underline{y} = (11, 11, 10, 00, 01, 01, 11)$   ⇒   two bit errors;
  • Viterbi decoding using the trellis diagram:
  • red arrow   ⇒   hypothesis  $u_i = 0$,
  • blue arrow   ⇒   hypothesis  $u_i = 1$.


The trellis shown here and the trellis of  Example 1  are very similar.  Just like the search for the sequence with the  "minimum Hamming distance",  the  "search for the maximum correlation value"  is also done step by step:

  1.   The nodes here are called the  "accumulated metrics"  $\Lambda_i(S_\mu)$.
  2.   The  "branch metrics"  specify the  "metric increments".
  3.   The final value  $\Lambda_7(S_0) = 10$  indicates the  "final correlation value"  between the selected sequence  $\underline{z}$  and the received vector  $\underline{y}$.
  4.   In the error-free case,  the result would be  $\Lambda_7(S_0) = 14$.

Example 5:  The following detailed description of the trellis evaluation refers to the above trellis:

  • With  $\underline{y}_1 = (11)$,  the accumulated metrics at time  $i = 1$  result in
$$\Lambda_1(S_0) = \langle (00), \ (11) \rangle = \langle (+1, +1), \ (-1, -1) \rangle = -2,$$
$$\Lambda_1(S_1) = \langle (11), \ (11) \rangle = \langle (-1, -1), \ (-1, -1) \rangle = +2.$$
  • Accordingly,  at time  $i = 2$,  with  $\underline{y}_2 = (11)$:
$$\Lambda_2(S_0) = \Lambda_1(S_0) + \langle (00), \ (11) \rangle = -2 - 2 = -4,$$
$$\Lambda_2(S_1) = \Lambda_1(S_0) + \langle (11), \ (11) \rangle = -2 + 2 = 0,$$
$$\Lambda_2(S_2) = \Lambda_1(S_1) + \langle (10), \ (11) \rangle = +2 + 0 = +2,$$
$$\Lambda_2(S_3) = \Lambda_1(S_1) + \langle (01), \ (11) \rangle = +2 + 0 = +2.$$
  • From time  $i = 3$  on,  a decision must be made between two accumulated metrics.  For example,  with  $\underline{y}_3 = (10)$  one obtains for the top and the bottom metric in the trellis:
$$\Lambda_3(S_0) = \max\big[\Lambda_2(S_0) + \langle (00), \ (10) \rangle, \ \ \Lambda_2(S_2) + \langle (11), \ (10) \rangle\big] = \max[-4 + 0, \ +2 + 0] = +2,$$
$$\Lambda_3(S_3) = \max\big[\Lambda_2(S_1) + \langle (01), \ (10) \rangle, \ \ \Lambda_2(S_3) + \langle (10), \ (10) \rangle\big] = \max[0 - 2, \ +2 + 2] = +4.$$
  • Comparing the  »accumulated correlation values«  $\Lambda_i(S_\mu)$  to be maximized with the  »accumulated error values«  $\Gamma_i(S_\mu)$  to be minimized from  Example 1,  one sees the following deterministic relationship  (verified numerically in the sketch after this list):
$$\Lambda_i(S_\mu) = 2 \cdot \big[i - \Gamma_i(S_\mu)\big].$$
  • The selection of the surviving branches at each decoding step is identical for both methods,  and the path search also gives the same result.
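
The equivalence of the two metrics can also be checked numerically.  The following sketch  (illustrative only)  runs the forward recursion once with Hamming-distance error values and once with correlation branch metrics,  and verifies  $\Lambda_i(S_\mu) = 2 \cdot [i - \Gamma_i(S_\mu)]$  for all reachable nodes of the Example 1 trellis:

BRANCHES = {0: [(0, (0, 0)), (2, (1, 1))],        # target state: (predecessor, code word)
            1: [(0, (1, 1)), (2, (0, 0))],
            2: [(1, (1, 0)), (3, (0, 1))],
            3: [(1, (0, 1)), (3, (1, 0))]}

def d_H(a, b):
    return sum(x != y for x, y in zip(a, b))

def corr(a, b):
    return sum((1 - 2 * x) * (1 - 2 * y) for x, y in zip(a, b))

def forward(y_pairs, branch_metric, select, init):
    values = [0, init, init, init]                # only S_0 is a valid start state
    table = []
    for y_i in y_pairs:
        values = [select(values[p] + branch_metric(cw, y_i) for p, cw in BRANCHES[s])
                  for s in range(4)]
        table.append(values)
    return table

y = [(1,1), (1,1), (1,0), (0,0), (0,1), (0,1), (1,1)]   # received vector with two bit errors
gammas  = forward(y, d_H,  min, float("inf"))           # error values Gamma_i(S_mu)
lambdas = forward(y, corr, max, float("-inf"))          # correlation metrics Lambda_i(S_mu)

for i, (g_row, l_row) in enumerate(zip(gammas, lambdas), start=1):
    for g, l in zip(g_row, l_row):
        if g != float("inf"):                           # skip unreachable nodes
            assert l == 2 * (i - g)                     # Lambda_i(S_mu) = 2 * (i - Gamma_i(S_mu))
print(gammas[-1][0], lambdas[-1][0])                    # -> 2 10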


Conclusions: 

  1.     In the binary channel – for example according to the BSC model – the two described Viterbi variants  "error value minimization"  and   "correlation value maximization"  lead to the same result.
  2.   In the AWGN channel,  on the other hand,  "error value minimization"  is not applicable because no Hamming distance can be specified between the binary input  x_  and the analog output  y_.
  3.   For the AWGN channel,  the  "correlation value maximization"  is equivalent to the minimization of the  Euclidean distance  – see  "Exercise 3.10Z".
  4.   Another advantage of the  "correlation value maximization"  is that a reliability information about the received values  y_  can be considered in a simple way.


Viterbi decision for non-terminated convolutional codes


So far,  a terminated convolutional code of length  $L' = L + m$  has always been considered,  and the result of the Viterbi decoder was the continuous trellis path from the start time  ($i = 0$)  to the end  ($i = L'$).

  • For non–terminated convolutional codes  ($L' \to \infty$),  this decision strategy is not applicable.
  • Here,  the algorithm must be modified to provide a best estimate  (according to maximum likelihood)  of the incoming bits of the encoded sequence in finite time.
Exemplary trellis and surviving paths


The graphic shows in the upper part an exemplary trellis for

  • "our standard encoder"  (R=1/2, m=2)
G(D)=(1+D+D2, 1+D2),
  • the zero input sequence   ⇒   u_=0_=(0,0,0, ...);  output:
x_=0_=(00,00,00, ...),
  • in each case,  transmission errors at  i=4  and  i=5.


⇒   Based on the line styles,  one can recognize allowed  (solid)  and forbidden  (dotted)  arrows,  in red  ($u_i = 0$)  and blue  ($u_i = 1$).

Dotted lines have lost a comparison against a competitor and cannot be part of the selected path.

⇒   The lower part of the graph shows the  $2^m = 4$  surviving paths  $\Phi_9(S_\mu)$  at time  $i = 9$.

  • It is easiest to find these paths from right to left  ("backward").
  • The following specification shows the traversed states  $S_\mu$,  but in the forward direction:
$$\Phi_9(S_0): \ S_0 \to S_0 \to S_0 \to S_0 \to S_0 \to S_0 \to S_0 \to S_0 \to S_0 \to S_0,$$
$$\Phi_9(S_1): \ S_0 \to S_0 \to S_0 \to S_0 \to S_1 \to S_2 \to S_1 \to S_3 \to S_2 \to S_1,$$
$$\Phi_9(S_2): \ S_0 \to S_0 \to S_0 \to S_0 \to S_1 \to S_2 \to S_1 \to S_2 \to S_1 \to S_2,$$
$$\Phi_9(S_3): \ S_0 \to S_0 \to S_0 \to S_0 \to S_1 \to S_2 \to S_1 \to S_2 \to S_1 \to S_3.$$

At earlier times  (i<9)  other survivors  Φi(Sμ)  would result. Therefore, we define:

Definition:  The   »survivor«   Φi(Sμ)  is the continuous path from the start  S0  (at  i=0)  to the node  Sμ at time  i.

  • It is recommended to search for paths in the backward direction.


The following graph shows the surviving paths for the time points  $i = 6$  to  $i = 9$.  In addition,  the respective metrics  (accumulated correlation values)  $\Lambda_i(S_\mu)$  are given for all four states.

The survivors  Φ6, ... , Φ9

This graph is to be interpreted as follows:

  1. At time  $i = 9$,  no final maximum likelihood decision can yet be made about the first nine bits of the information sequence.
  2. But it is already certain that the most probable bit sequence is represented by one of the paths  $\Phi_9(S_0), \ \ldots, \ \Phi_9(S_3)$.
  3. Since all four paths are identical up to  $i = 3$,  the decision  "$v_1 = 0, \ v_2 = 0, \ v_3 = 0$"  is the most probable one.  Also at a later time,  no other decision would be made.
  4. Regarding the bits  $v_4, \ v_5, \ \ldots$,  one should not decide at this early stage.  Only the first two zeros are safe,  not  $v_3 = 0$.
  5. If one had to make a forced decision at time  $i = 9$,  one would choose  $\Phi_9(S_0)$   ⇒   $\underline{v} = (0, 0, \ldots, 0)$,  since the metric  $\Lambda_9(S_0) = 14$  is larger than the comparison metrics.
  6. The forced decision at time  $i = 9$  leads to the correct result in this example,  see lower sketch.
  7. A decision at time  $i = 6$  would have led to the wrong result   ⇒   $\underline{v} = (0, 0, 0, 1, 0, 1)$,  see upper sketch.
  8. At time  $i = 7$  or  $i = 8$,  a forced decision would not have been unambiguous,  as shown in the two middle diagrams.
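
In a continuously running decoder,  such forced decisions are typically made with a fixed decision  (traceback)  depth:  at each time  $i$,  the state with the currently largest metric is selected,  its survivor is traced back,  and only the sufficiently old bits are released.  The following is a rough sketch of this idea;  the data structures match the  viterbi_decode  sketch given earlier,  and the often quoted rule of thumb of a decision depth of about five times the memory is an assumption,  not a statement of this text.

def forced_decision(metrics, survivors, depth):
    """Forced (truncated) Viterbi decision for non-terminated operation.

    metrics   : current accumulated metrics Lambda_i(S_mu), one entry per state
    survivors : per step, for each state the (predecessor, info bit) of its best branch
    depth     : decision depth; only bits older than 'depth' steps are released
    """
    state = max(range(len(metrics)), key=lambda s: metrics[s])   # best current state
    bits = []
    for step in reversed(survivors):             # trace back the survivor of that state
        pred, u = step[state]
        bits.append(u)
        state = pred
    bits.reverse()
    return bits[:max(0, len(bits) - depth)]      # release only the sufficiently old bits

# Usage (sketch): with 'history' from viterbi_decode and, e.g., depth = 5 * m
# released = forced_decision(current_metrics, history, depth=10)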


Other decoding methods for convolutional codes


We have so far dealt only with the Viterbi algorithm in the form presented in 1967 by Andrew J. Viterbi in  [Vit67][1].  It was not until 1974 that  George David Forney  proved that this algorithm performs maximum likelihood decoding of convolutional codes.

But even in the years before,  many scientists were very eager to provide efficient decoding methods for the convolutional codes first described by  Peter Elias  in 1955.  Among others,  more detailed descriptions can be found for example in  [Bos99][2].

All of these decoding schemes,  as well as the Viterbi algorithm as described so far,  provide  "hard"  decision output values   ⇒   $v_i \in \{0, 1\}$.  Often,  however,  information about the reliability of the decisions made would be desirable,  especially when there is a concatenated coding scheme with an outer and an inner code.

If the reliability of the bits decided by the inner decoder is known at least roughly,  this information can be used to  (significantly)  reduce the bit error probability of the outer decoder.  The  "soft output Viterbi algorithm"  (SOVA)  proposed by  Joachim Hagenauer  in  [Hag90][3]  allows to specify a reliability measure in each case in addition to the decided symbols.

Finally, we go into some detail about the  »BCJR algorithm«  named after its inventors L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv  [BCJR74][4].

While the Viterbi algorithm only estimates the total sequence   ⇒   block-wise ML,  the BCJR algorithm estimates a single bit considering the entire received sequence.  So this is a  "bit-wise maximum a-posteriori decoding"   ⇒   bit-wise MAP.


Conclusion: 

The difference between the Viterbi algorithm and the BCJR algorithm shall be illustrated  – greatly simplified –  using the example of a terminated convolutional code:

  • The   »Viterbi algorithm«  processes the trellis in only one direction  – the forward direction –  and computes the metrics  $\Lambda_i(S_\mu)$  for each node.  After reaching the terminal node,  the surviving path is searched for,  which identifies the most probable encoded sequence.
  • In the   »BCJR algorithm«,  the trellis is processed twice,  once in the forward direction and then in the backward direction.  Two metrics can then be specified for each node,  from which the a-posteriori probability can be determined for each bit.

Notes:

  1.   This short summary is based on the textbook  [Bos99][2].
  2.   A description of the BCJR algorithm also follows in the section  "Hard Decision vs. Soft Decision"  of the fourth main chapter  "Iterative Decoding Methods".


Exercises for the chapter


Exercise 3.09: Basics of the Viterbi Algorithm

Exercise 3.09Z: Viterbi Algorithm again

Exercise 3.10: Metric Calculation

Exercise 3.10Z: Maximum Likelihood Decoding of Convolutional Codes

Exercise 3.11: Viterbi Path Finding

References

  1. Viterbi, A.J.:  Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm.  In: IEEE Transactions on Information Theory, vol. IT-13, pp. 260-269, April 1967.
  2. Bossert, M.:  Channel Coding for Telecommunications.  Wiley & Sons, 1999.
  3. Hagenauer, J.:  Soft Output Viterbi Decoder.  In: Technischer Report, Deutsche Forschungsanstalt für Luft- und Raumfahrt (DLR), 1990.
  4. Bahl, L.R.; Cocke, J.; Jelinek, F.; Raviv, J.:  Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate.  In: IEEE Transactions on Information Theory, Vol. IT-20, pp. 284-287, 1974.