
Two-Dimensional Random Variables


# OVERVIEW OF THE FOURTH MAIN CHAPTER #


This chapter treats random variables with statistical bindings and illustrates them by typical examples.

After the general description of two-dimensional random variables,  we turn to

  1. the  "auto-correlation function",
  2. the  "cross-correlation function",
  3. and the associated spectral functions  ("power-spectral density",  "cross power-spectral density").


Specifically,  this chapter covers:

  • the statistical description of  »two-dimensional random variables«  using the  »joint PDF«,
  • the difference between  »statistical dependence«  and  »correlation«,
  • the classification features  »stationarity«  and  »ergodicity«  of stochastic processes,
  • the definitions of  »auto-correlation function«  (ACF)  and  »power-spectral density«  (PSD),
  • the definitions of  »cross-correlation function«  (CCF)  and  »cross power-spectral density«  (CPSD),
  • the numerical determination of all these variables in the two- and multi-dimensional case.



Properties and examples


As a transition to the  "correlation functions"  we now consider two random variables  $x$  and  $y$,  between which statistical dependencies exist.

Each of these two random variables can be described on its own with the characteristic variables introduced in the previous chapters.


Definition:  To describe the statistical dependencies between two variables  $x$  and  $y$,  it is convenient to combine the two components
      into one   "two-dimensional random variable"   or   "2D random variable"   $(x, y)$.

  • The individual components can be signals,  such as the real and imaginary parts of a phase-modulated signal.
  • But there are a variety of two-dimensional random variables in other domains as well,  as the following example will show.


Example 1:  The left diagram is from the random experiment  "Throwing two dice". 

Two examples of statistically dependent random variables
  • Plotted to the right is the number of the first die  $(W_1)$,
  • plotted to the top is the sum  $S$  of both dice.


The two components here are each discrete random variables between which there are statistical dependencies:

  • If  $W_1 = 1$,  then the sum  $S$  can only take values between  2  and  7,  each with equal probability.
  • In contrast,  for  $W_1 = 6$  all values between  7  and  12  are possible,  also with equal probability.


In the right diagram,  the maximum temperatures of the  31  days of May 2002 in Munich  (plotted to the top)  and on the mountain  "Zugspitze"  (plotted to the right)  are contrasted.  Both random variables are continuous in value:

  • Although the measurement points are about  100 km  apart,  and it is on average about  20  degrees colder on the Zugspitze than in Munich due to the difference in altitude  (nearly  3000  versus  520  meters),  one nevertheless recognizes a certain statistical dependence between the two random variables  $\Theta_{\rm M}$  and  $\Theta_{\rm Z}$.
  • If it is warm in Munich,  then pleasant temperatures are also more likely to be expected on the Zugspitze.  However,  the relationship is not deterministic:  The coldest day in May 2002 was a different day in Munich than the coldest day on the Zugspitze.

Joint probability density function


We restrict ourselves here mostly to continuous valued random variables.

  • However,  sometimes the peculiarities of two-dimensional discrete random variables are discussed in more detail. 
  • Most of the characteristics previously defined for one-dimensional random variables can be easily extended to two-dimensional variables.


Definition:  The  probability density function  (PDF)  of the two-dimensional random variable at the location  $(x_\mu, y_\mu)$   ⇒   "joint PDF"   or   "2D–PDF"
is an extension of the one-dimensional PDF  ($\cap$  denotes the logical  "and"  operation):

$$f_{xy}(x_\mu, y_\mu) = \lim_{\Delta x \to 0,\ \Delta y \to 0} \frac{{\rm Pr}\big[(x_\mu - \Delta x/2 \le x \le x_\mu + \Delta x/2) \cap (y_\mu - \Delta y/2 \le y \le y_\mu + \Delta y/2)\big]}{\Delta x \cdot \Delta y}.$$

Note:

  • If the two-dimensional random variable is discrete,  the definition must be slightly modified:
  • For the lower range limits,  the  "less than or equal"  sign must then be replaced by the  "less than"  sign,  according to the section  "CDF for discrete-valued random variables".
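
The limit definition above also suggests a direct numerical estimate:  count relative frequencies in small  $\Delta x \cdot \Delta y$  rectangles.  A minimal Python sketch,  where the correlated Gaussian data source and all names are our own choices for illustration:

```python
import numpy as np

# Hypothetical data source: a correlated Gaussian pair (illustration only).
rng = np.random.default_rng(seed=1)
xy = rng.multivariate_normal(mean=[0.0, 0.0],
                             cov=[[1.0, 0.6], [0.6, 0.5]],
                             size=200_000)

# Estimate the joint PDF: relative frequency per rectangle, divided by the
# rectangle area Delta_x * Delta_y, i.e. the limit definition with finite steps.
f_xy, x_edges, y_edges = np.histogram2d(xy[:, 0], xy[:, 1],
                                        bins=80, density=True)

# f_xy[i, j] approximates the joint PDF at the center of rectangle (i, j).
```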


Using this joint PDF  $f_{xy}(x, y)$,  statistical dependencies within the two-dimensional random variable  $(x, y)$  are also fully captured,  in contrast to the two one-dimensional density functions   ⇒   marginal probability density functions   (or   "edge probability density functions"):

$$f_x(x) = \int_{-\infty}^{+\infty} f_{xy}(x, y) \,{\rm d}y,$$

$$f_y(y) = \int_{-\infty}^{+\infty} f_{xy}(x, y) \,{\rm d}x.$$

These two marginal probability density functions  $f_x(x)$  and  $f_y(y)$

  • provide only statistical information about the individual components  $x$  and  $y$,  respectively,
  • but not about the statistical bindings between them  (see the numerical sketch below).
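
Given the joint PDF on a grid,  both marginal densities follow by numerical integration over the respective other variable.  A minimal sketch,  assuming a hypothetical independent Gaussian pair with  $\sigma_x = 1.5$  and  $\sigma_y = 0.8$:

```python
import numpy as np

# Joint PDF on a grid: hypothetical independent Gaussian components.
x = np.linspace(-6.0, 6.0, 401)
y = np.linspace(-6.0, 6.0, 401)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")
sx, sy = 1.5, 0.8
f_xy = np.exp(-0.5 * (X / sx) ** 2 - 0.5 * (Y / sy) ** 2) / (2 * np.pi * sx * sy)

# Marginal PDFs: integrate the joint PDF over the other variable.
f_x = f_xy.sum(axis=1) * dy   # f_x(x) = integral of f_xy over y
f_y = f_xy.sum(axis=0) * dx   # f_y(y) = integral of f_xy over x

print(f_x.sum() * dx, f_y.sum() * dy)   # both close to 1
```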


Two-dimensional cumulative distribution function


Definition:  Like the  "2D–PDF",  the  2D cumulative distribution function  is merely a useful extension of the  one-dimensional distribution function  (CDF):

$$F_{xy}(r_x, r_y) = {\rm Pr}\big[(x \le r_x) \cap (y \le r_y)\big].$$


The following similarities and differences between the  "1D–CDF"  and the  "2D–CDF"  emerge:

  • The functional relationship between two-dimensional PDF and two-dimensional CDF is given by integration as in the one-dimensional case,  but now in two dimensions.  For continuous valued random variables:
$$F_{xy}(r_x, r_y) = \int_{-\infty}^{r_y} \int_{-\infty}^{r_x} f_{xy}(x, y) \,{\rm d}x \,{\rm d}y.$$
  • Inversely,  the probability density function can be obtained from the cumulative distribution function by partial differentiation with respect to  $r_x$  and  $r_y$:
$$f_{xy}(x, y) = \frac{\partial^2 F_{xy}(r_x, r_y)}{\partial r_x \,\partial r_y} \bigg|_{\,r_x = x,\; r_y = y}.$$
  • Relative to the two-dimensional cumulative distribution function  $F_{xy}(r_x, r_y)$,  the following limits apply:
$$F_{xy}(-\infty, -\infty) = 0,$$
$$F_{xy}(r_x, +\infty) = F_x(r_x),$$
$$F_{xy}(+\infty, r_y) = F_y(r_y),$$
$$F_{xy}(+\infty, +\infty) = 1.$$
  • From the last equation  (infinitely large  $r_x$  and  $r_y$)  we obtain the  normalization condition  for the  "2D–PDF":
$$\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f_{xy}(x, y) \,{\rm d}x \,{\rm d}y = 1.$$

Conclusion:  Note the significant difference between one-dimensional and two-dimensional random variables:

  • For one-dimensional random variables,  the area under the PDF always equals  $1$.
  • For two-dimensional random variables,  the PDF volume always equals  $1$.
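
Both the stated limits and the unit volume can be checked numerically by cumulative integration of a gridded PDF.  A minimal sketch,  again with a hypothetical Gaussian grid:

```python
import numpy as np

# Joint PDF on a grid (hypothetical unit-variance Gaussian pair).
x = np.linspace(-6.0, 6.0, 401)
y = np.linspace(-6.0, 6.0, 401)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")
f_xy = np.exp(-0.5 * (X ** 2 + Y ** 2)) / (2 * np.pi)

# 2D CDF via cumulative integration in both directions.
F_xy = np.cumsum(np.cumsum(f_xy, axis=0), axis=1) * dx * dy

print(F_xy[0, 0])     # ~0 :  F_xy(-inf, -inf)
print(F_xy[-1, -1])   # ~1 :  PDF volume (normalization condition)

# F_xy(r_x, +inf) reproduces the one-dimensional CDF F_x(r_x):
print(np.allclose(F_xy[:, -1], np.cumsum(f_xy.sum(axis=1) * dy) * dx))
```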

PDF for statistically independent components


For statistically independent components  $x$  and  $y$,  the following holds for the joint probability according to the elementary laws of statistics,  if  $x$  and  $y$  are continuous in value:

$${\rm Pr}\big[(x_1 \le x \le x_2) \cap (y_1 \le y \le y_2)\big] = {\rm Pr}(x_1 \le x \le x_2) \cdot {\rm Pr}(y_1 \le y \le y_2).$$

For independent components,  this can also be written as:

$${\rm Pr}\big[(x_1 \le x \le x_2) \cap (y_1 \le y \le y_2)\big] = \int_{x_1}^{x_2} f_x(x) \,{\rm d}x \cdot \int_{y_1}^{y_2} f_y(y) \,{\rm d}y.$$

Definition:  It follows that for  statistical independence  the following condition must be satisfied with respect to the  two-dimensional probability density function:

$$f_{xy}(x, y) = f_x(x) \cdot f_y(y).$$
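
On a grid,  this product condition can be tested directly against the numerically obtained marginals.  A minimal sketch  (the independent Gaussian grid is again our own example):

```python
import numpy as np

# Hypothetical joint PDF with independent Gaussian components.
x = np.linspace(-6.0, 6.0, 401)
y = np.linspace(-6.0, 6.0, 401)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")
sx, sy = 1.5, 0.8
f_xy = np.exp(-0.5 * (X / sx) ** 2 - 0.5 * (Y / sy) ** 2) / (2 * np.pi * sx * sy)

# Marginals and their outer product f_x(x) * f_y(y).
f_x = f_xy.sum(axis=1) * dy
f_y = f_xy.sum(axis=0) * dx
product = np.outer(f_x, f_y)

# For statistically independent components the deviation vanishes
# (up to the discretization error of the grid).
print(np.max(np.abs(f_xy - product)))
```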


Example 2:  In the graph,  the instantaneous values of a two-dimensional random variable are plotted as points in the  $(x, y)$-plane.

  • Ranges with many points,  which accordingly appear dark,  indicate large values of the two-dimensional PDF  $f_{xy}(x, y)$.
  • In contrast,  the random variable  $(x, y)$  has relatively few components in rather bright areas.

Statistically independent components:  $f_{xy}(x, y)$,  $f_x(x)$  and  $f_y(y)$


The graph can be interpreted as follows:

  • The marginal probability densities  $f_x(x)$  and  $f_y(y)$  already indicate that both  $x$  and  $y$  are Gaussian and zero mean,  and that the random variable  $x$  has a larger standard deviation than  $y$.
  • $f_x(x)$  and  $f_y(y)$  do not provide information on whether or not statistical bindings exist for the random variable  $(x, y)$.
  • However,  using the  "2D-PDF"  $f_{xy}(x, y)$  one can see that there are no statistical bindings between the two components  $x$  and  $y$  here.
  • With statistical independence,  any cut through  $f_{xy}(x, y)$  parallel to the  $y$-axis yields a function equal in shape to the marginal PDF  $f_y(y)$.  Similarly,  all cuts parallel to the  $x$-axis are equal in shape to  $f_x(x)$.
  • This fact is equivalent to saying that in this example  $f_{xy}(x, y)$  can be represented as the product of the two marginal probability densities:
$$f_{xy}(x, y) = f_x(x) \cdot f_y(y).$$

PDF for statistically dependent components


If there are statistical bindings between  $x$  and  $y$,  then different cuts parallel to the  $x$-axis and  $y$-axis,  respectively,  yield different  (non-shape-equivalent)  functions.  In this case,  of course,  the joint PDF cannot be described as a product of the two  (one-dimensional)  marginal probability density functions either.

Example 3:  The graph shows the instantaneous values of a two-dimensional random variable in the  $(x, y)$-plane.

Statistically dependent components:  $f_{xy}(x, y)$,  $f_x(x)$  and  $f_y(y)$


Now,  unlike  Example 2,  there are statistical bindings between  $x$  and  $y$.

  • The two-dimensional random variable takes all  "2D" values with equal probability in the parallelogram drawn in blue.
  • No values are possible outside the parallelogram.



One recognizes from this representation:

  1. Integration over  $f_{xy}(x, y)$  parallel to the  $x$-axis leads to the triangular marginal PDF  $f_y(y)$,  integration parallel to the  $y$-axis to the trapezoidal PDF  $f_x(x)$  (see the simulation sketch after this list).
  2. From the joint PDF  $f_{xy}(x, y)$  it can already be guessed that for each  $x$-value,  on statistical average,  a different  $y$-value is to be expected.
  3. This means that the components  $x$  and  $y$  are statistically dependent on each other.
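
The shapes of the two marginal densities can be reproduced by simulation.  A minimal sketch,  assuming  (purely for illustration,  not the parallelogram of the graph)  the parallelogram spanned by the vectors  $u = (1, 1)$  and  $v = (3, 1)$,  for which  $f_y(y)$  is triangular and  $f_x(x)$  is trapezoidal:

```python
import numpy as np

# Uniform points on the parallelogram {s*u + t*v : 0 <= s, t <= 1}
# with hypothetical spanning vectors u = (1, 1) and v = (3, 1).
rng = np.random.default_rng(seed=2)
s = rng.uniform(size=500_000)
t = rng.uniform(size=500_000)
x = 1.0 * s + 3.0 * t   # x-components of u and v
y = 1.0 * s + 1.0 * t   # y-components of u and v

# Histograms of the two components estimate the marginal PDFs.
f_x, _ = np.histogram(x, bins=100, density=True)  # trapezoidal on [0, 4]
f_y, _ = np.histogram(y, bins=100, density=True)  # triangular on [0, 2]
print(f_y.max())   # peak of the triangle, close to 1 (at y = 1)
```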

Expected values of two-dimensional random variables


A special case of statistical dependence is  "correlation".

Definition:  By  correlation  one understands a  "linear dependence"  between the individual components  $x$  and  $y$.

  • Correlated random variables are thus always also statistically dependent.
  • But not every statistical dependence implies correlation at the same time.


To quantitatively capture correlation,  one uses various expected values of the two-dimensional random variable  (x,y).

These are defined analogously to the one-dimensional case,

  • according to  Chapter 2  (for discrete valued random variables),
  • and  Chapter 3  (for continuous valued random variables):


Definition:  For the  (non-centered)  moments  the following relation holds:

$$m_{kl} = {\rm E}\big[x^k \cdot y^l\big] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x^k \cdot y^l \cdot f_{xy}(x, y) \,{\rm d}x \,{\rm d}y.$$

Thus,  the two linear means are  $m_x = m_{10}$  and  $m_y = m_{01}$.


Definition:  The  central moments  (related to  mx  and  my)  are:

$$\mu_{kl} = {\rm E}\big[(x - m_x)^k \cdot (y - m_y)^l\big].$$

In this general definition equation,  the variances  $\sigma_x^2$  and  $\sigma_y^2$  of the two individual components are included as  $\mu_{20}$  and  $\mu_{02}$,  respectively.


Definition:  Of particular importance is the  covariance  $(k = l = 1)$,  which is a measure of the  "linear statistical dependence"  between the variables  $x$  and  $y$:

$$\mu_{11} = {\rm E}\big[(x - m_x) \cdot (y - m_y)\big] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x - m_x)(y - m_y) \cdot f_{xy}(x, y) \,{\rm d}x \,{\rm d}y.$$

In the following,  we also denote the covariance  $\mu_{11}$  in part by  $\mu_{xy}$,  if the covariance refers to the random variables  $x$  and  $y$.


Notes:

  • The covariance  $\mu_{11} = \mu_{xy}$  is related to the non-centered moment  $m_{11} = m_{xy} = {\rm E}[x \cdot y]$  as follows:
$$\mu_{xy} = m_{xy} - m_x \cdot m_y.$$
  • This equation is enormously advantageous for numerical evaluations,  since  $m_{xy}$,  $m_x$  and  $m_y$  can be found from the sequences  $\langle x_\nu \rangle$  and  $\langle y_\nu \rangle$  in a single run.
  • On the other hand,  if one were to calculate the covariance  $\mu_{xy}$  according to the above definition equation,  one would have to find the mean values  $m_x$  and  $m_y$  in a first run and could then only calculate the expected value  ${\rm E}\big[(x - m_x) \cdot (y - m_y)\big]$  in a second run.


Example 4:  In the first two rows of the table,  the first elements of two random sequences  $\langle x_\nu \rangle$  and  $\langle y_\nu \rangle$  are entered.  In the last row,  the respective products  $x_\nu \cdot y_\nu$  are given.

Example for two-dimensional expected values
  • By averaging over the ten sequence elements in each case,  one obtains
$$m_x = 0.5, \quad m_y = 1, \quad m_{xy} = 0.69.$$
  • This directly results in the value for the covariance:
$$\mu_{xy} = 0.69 - 0.5 \cdot 1 = 0.19.$$

Without knowledge of the equation  $\mu_{xy} = m_{xy} - m_x \cdot m_y$,  one would have had to first determine the means  $m_x$  and  $m_y$  in a first run,  and then determine the covariance  $\mu_{xy}$  as the expected value of the product of the zero-mean variables in a second run.
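
The single-run advantage is easy to express in code.  A minimal Python sketch  (the function name is our own),  accumulating  $m_x$,  $m_y$  and  $m_{xy}$  in one pass and then applying  $\mu_{xy} = m_{xy} - m_x \cdot m_y$:

```python
def covariance_single_pass(xs, ys):
    """Covariance mu_xy = m_xy - m_x * m_y, accumulated in a single run."""
    n = 0
    sum_x = sum_y = sum_xy = 0.0
    for x, y in zip(xs, ys):
        n += 1
        sum_x += x
        sum_y += y
        sum_xy += x * y
    m_x, m_y, m_xy = sum_x / n, sum_y / n, sum_xy / n
    return m_xy - m_x * m_y
```

Applied to the ten value pairs of the table,  this returns  $0.69 - 0.5 \cdot 1 = 0.19$,  in agreement with the calculation above.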

Correlation coefficient


With statistical independence of the two components  $x$  and  $y$,  the covariance  $\mu_{xy} \equiv 0$.  This case has already been considered in  Example 2  in the section  PDF for statistically independent components.

  • But the result  $\mu_{xy} = 0$  is also possible for statistically dependent components  $x$  and  $y$,  namely when they are uncorrelated,  i.e.  "linearly independent".
  • The statistical dependence is then not of first order,  but of higher order,  for example corresponding to the equation  $y = x^2$  (see the sketch below).
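
This  "statistically dependent but uncorrelated"  case is easy to demonstrate numerically.  A minimal sketch with a zero-mean,  symmetrically distributed  $x$  (the Gaussian choice is ours):

```python
import numpy as np

# y = x^2 depends deterministically on x, yet is uncorrelated with it:
# for zero-mean symmetric x, E[x * x^2] = E[x^3] = 0, and m_x * m_y = 0.
rng = np.random.default_rng(seed=3)
x = rng.standard_normal(1_000_000)
y = x ** 2

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)   # close to 0 despite the higher-order dependence
```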


One speaks of  complete correlation  when the  (deterministic)  dependence between  $x$  and  $y$  is expressed by the equation  $y = K \cdot x$.  Then the covariance is:

  • $\mu_{xy} = +\sigma_x \cdot \sigma_y$  for a positive  $K$  value,
  • $\mu_{xy} = -\sigma_x \cdot \sigma_y$  for a negative  $K$  value.


Therefore,  instead of the  "covariance",  one often uses the so-called  "correlation coefficient"  as a descriptive quantity.

Definition:  The  correlation coefficient  is the quotient of the covariance  μxy  and the product of the standard deviations  σx  and  σy  of the two components:

$$\rho_{xy} = \frac{\mu_{xy}}{\sigma_x \cdot \sigma_y}.$$


The correlation coefficient  ρxy  has the following properties:

  • Because of the normalization,   $-1 \le \rho_{xy} \le +1$  always holds.
  • If the two random variables  $x$  and  $y$  are uncorrelated,  then  $\rho_{xy} = 0$.
  • For a strictly linear dependence between  $x$  and  $y$:   $\rho_{xy} = \pm 1$   ⇒   complete correlation.
  • A positive correlation coefficient means that when  x  is larger,  on statistical average  y  is also larger than when  x  is smaller.
  • In contrast,  a negative correlation coefficient expresses that  y  becomes smaller on average as  x  increases.


Two-dimensional Gaussian PDF with correlation

Example 5:  The following conditions apply:

  1. The considered components  $x$  and  $y$  each have a Gaussian PDF.
  2. The two standard deviations are different  $(\sigma_y < \sigma_x)$.
  3. The correlation coefficient is  $\rho_{xy} = 0.8$.


Unlike  Example 2  with statistically independent components   ⇒   $\rho_{xy} = 0$  (even though  $\sigma_y < \sigma_x$  there as well),  one recognizes that here

  • with a larger  $x$-value,  on statistical average,  $y$  is also larger
  • than with a smaller  $x$-value.


Regression line


Definition:  The  regression line  – sometimes called  "correlation line" –  is the straight line  $y = K(x)$  in the  $(x, y)$-plane through the  "midpoint"  $(m_x, m_y)$.

Two-dimensional Gaussian PDF with regression line

The regression line has the following properties:

  • The mean square deviation from this straight line  – viewed in  $y$-direction and averaged over all  $N$  points –  is minimal:
$$\overline{\varepsilon_y^2} = \frac{1}{N} \sum_{\nu=1}^{N} \big[y_\nu - K(x_\nu)\big]^2 = {\rm minimum}.$$
  • The regression line can be interpreted as a kind of  "statistical symmetry axis".  The equation of the straight line is:
$$y = K(x) = \frac{\sigma_y}{\sigma_x} \cdot \rho_{xy} \cdot (x - m_x) + m_y.$$
  • The angle that the regression line makes with the  $x$-axis is:
$$\theta_{y \to x} = \arctan\Big(\frac{\sigma_y}{\sigma_x} \cdot \rho_{xy}\Big).$$


By this nomenclature it should be made clear that we are dealing here with the regression of  $y$  on  $x$.

  • The regression in the opposite direction  – that is,  of  $x$  on  $y$ –  means,  in contrast,  the minimization of the mean square deviation in  $x$-direction.
  • The  (German language)  applet  "Korrelation und Regressionsgerade"   ⇒   "Correlation Coefficient and Regression Line"  illustrates that in general  (if  $\sigma_y \ne \sigma_x$)  the regression of  $x$  on  $y$  results in a different angle and thus a different regression line  (numerically checked in the sketch below):
$$\theta_{x \to y} = \arctan\Big(\frac{\sigma_x}{\sigma_y} \cdot \rho_{xy}\Big).$$
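
A quick numerical cross-check of the two angles;  the sample construction  ($\sigma_x = 2$,  $\sigma_y = 1$,  $\rho_{xy} = 0.8$)  is a hypothetical example:

```python
import numpy as np

# Construct samples with sigma_x = 2, sigma_y = 1, rho_xy = 0.8 (hypothetical):
# y = 0.4*x + noise gives var(y) = 0.16*4 + 0.36 = 1 and cov(x, y) = 1.6.
rng = np.random.default_rng(seed=4)
x = 2.0 * rng.standard_normal(500_000)
y = 0.4 * x + 0.6 * rng.standard_normal(500_000)

sigma_x, sigma_y = x.std(), y.std()
rho_xy = np.mean((x - x.mean()) * (y - y.mean())) / (sigma_x * sigma_y)

theta_y_x = np.degrees(np.arctan(sigma_y / sigma_x * rho_xy))  # y on x: ~21.8
theta_x_y = np.degrees(np.arctan(sigma_x / sigma_y * rho_xy))  # x on y: ~58.0
print(theta_y_x, theta_x_y)  # different angles whenever sigma_y != sigma_x
```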


Exercises for the chapter


Exercise 4.1: Triangular (x, y) Area

Exercise 4.1Z: Appointment to Breakfast

Exercise 4.2: Triangle Area again

Exercise 4.2Z: Correlation between "x" and "e to the Power of x"

Exercise 4.3: Algebraic and Modulo Sum

Exercise 4.3Z: Dirac-shaped 2D PDF