Cumulative Distribution Function

From LNTwww

Relationship between PDF and CDF


To describe random variables,  in addition to the  »probability density function«  $\rm (PDF)$,  we use the  »cumulative distribution function«  $\rm (CDF)$,  which is defined as follows:

$\text{Definition:}$  The  »cumulative distribution function«  $F_{x}(r)$  corresponds to the probability that the random variable  $x$  is less than or equal to a real number  $r$:

$$F_{x}(r) = {\rm Pr}( x \le r).$$


For a value-continuous random variable,  the following statements hold for the CDF:

  • The CDF is computable from the probability density function  $f_{x}(x)$  by integration.  It holds:
$$F_{x}(r) = \int_{-\infty}^{r}f_x(x)\,{\rm d}x.$$
  • Since the PDF is never negative,  $F_{x}(r)$  increases at least weakly monotonically,  and the function always lies between the following limits:
$$F_{x}(r \to -\infty) = 0, \hspace{0.5cm}F_{x}(r \to +\infty) = 1.$$
  • Inversely,  the probability density function can be determined from the CDF by differentiation:
$$f_{x}(x)=\frac{{\rm d} F_{x}(r)}{{\rm d} r}\Bigg |_{\hspace{0.1cm}r=x}.$$
The addition  »$r = x$«  makes it clear that in our nomenclature the PDF argument is the random variable  $x$  itself, while the CDF argument specifies any real variable  $r$ .
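The two directions (integration and differentiation) can be checked numerically. The following is only a minimal sketch: the standard Gaussian PDF, the grid limits and the step size are assumptions chosen for illustration and are not taken from the text.

```python
import numpy as np

# Assumed example PDF (not from the text): standard Gaussian density.
def pdf(x):
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / np.sqrt(2.0 * np.pi)

# Grid for the real variable r; +/-8 stands in for +/-infinity.
r = np.linspace(-8.0, 8.0, 16001)
dr = r[1] - r[0]
f = pdf(r)

# CDF by cumulative trapezoidal integration of the PDF up to each grid point r.
F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dr)))

# PDF recovered from the CDF by numerical differentiation.
f_rec = np.gradient(F, dr)

mid = len(r) // 2  # index of the grid point r = 0
print("F at left end, right end:", F[0], F[-1])
print("F(0) and recovered f(0):", F[mid], f_rec[mid])
```

Because the PDF is non-negative, every increment added in the cumulative sum is non-negative, which is the numerical counterpart of the weak monotonicity stated above.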


$\text{Notes on nomenclature:}$  If in the definitions of  $\rm PDF$  and  $\rm CDF$  we had distinguished

  • between the random variable  $X$ 
  • and the realizations  $x ∈ X$    ⇒   $f_{X}(x), F_{X}(x)$,


we would have the following nomenclature:

$$F_{X}(x) = {\rm Pr}(X \le x) = \int_{-\infty}^{x}f_{X}(\xi)\,{\rm d}\xi.$$

At the beginning of our  $\rm LNTwww$ project  $(2001)$  we chose our own nomenclature for quite legitimate reasons.  Unfortunately,  it can no longer be changed now  $(2017)$,  not least because of the learning videos that have already been produced.   So we stick with  $f_{x}(x)$  instead of  $f_{X}(x)$  as well as  $F_{x}(r)$  instead of  $F_{X}(x).$

CDF for value-continuous random variables


The equations given in the last section apply only to value-continuous random variables and will be illustrated here by an example.  In the next section it will be shown that for  »value-discrete random variables«  the equations must be modified somewhat.

$\text{Example 1:}$  The left image shows the photo  »Lena«,  which is often used as a test template for image coding procedures.

PDF and CDF of a value-continuous image
  • If this image is divided into  $256 × 256$  pixels,  and the brightness is determined for each pixel,  a sequence  $〈x_ν〉$  of gray values of length  $N = 256^2 = 65\hspace{0.06cm}536$  is obtained.
  • The gray value  $x$  is a value-continuous random variable,  where the assignment to numerical values is arbitrary.  For example,  let  »black«  be characterized by  $x = 0$  and  »white«  by  $x = 1$:  The value  $x =0.5$  then characterizes a medium gray coloration.


The middle diagram shows the PDF  $f_{x}(x)$  which is also often referred to in the literature as  »gray value statistics«.

  • In the original image some gray values are preferred,  and the two extreme values  $x =0$  (»deep black«)  and  $x =1$  (»pure white«)  occur very rarely.
  • The cumulative distribution function  $F_{x}(r)$  is continuous in value and increases monotonically from  $0$  to  $1$  as the right figure shows. 
  • For  $r \approx 0$  and  $r \approx 1$  the CDF is horizontal due to the lack of PDF components.
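The way such gray value statistics are obtained from a pixel sequence can be sketched as follows. Since the image data are not available here, the gray values are replaced by synthetic samples; the Beta distribution, the seed and the number of histogram bins are assumptions made only for this illustration.

```python
import numpy as np

N = 256 * 256  # sequence length as in the example

# Synthetic stand-in for the gray values (assumption, not the real image):
# samples in [0, 1] that rarely come close to 0 (black) or 1 (white).
rng = np.random.default_rng(0)
x = rng.beta(2.0, 2.0, size=N)

# Histogram-based PDF estimate, normalized so that its area equals 1.
counts, edges = np.histogram(x, bins=64, range=(0.0, 1.0))
width = edges[1] - edges[0]
pdf_est = counts / (N * width)

# CDF estimate at the right edge of each bin (running sum of relative frequencies).
cdf_est = np.cumsum(counts) / N

def empirical_cdf(r):
    """Relative frequency of samples with x <= r, an estimate of F_x(r)."""
    return np.count_nonzero(x <= r) / N

print("area under PDF estimate:", np.sum(pdf_est) * width)
print("CDF estimate at r = 0.5:", empirical_cdf(0.5))
```

With only few samples near  $0$  and  $1$,  the first and last entries of `cdf_est` change very little, which corresponds to the nearly horizontal CDF sections described above.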


$\text{Note:}$   Strictly speaking,  for an image that can be displayed on a computer  $($in contrast to an analog photograph$)$:

  1.   The gray value is always discrete in value. 
  2.   However,  with large resolution of the color information  $($»color depth«$)$,  this random variable can be approximated to be continuous in value.


»   The topic of this chapter is illustrated with examples in the  (German language)  learning video 
        »Zusammenhang zwischen WDF und VTF«  $\Rightarrow$ »Relationship between PDF and CDF«.


CDF for value-discrete random variables


For the CDF calculation of a value-discrete random variable  $x$  from its PDF,  a more general equation must always be assumed.  Here,  with the auxiliary variable  $\varepsilon > 0$:

$$F_{x}(r)=\lim_{\varepsilon\hspace{0.05cm}\to \hspace{0.05cm}0}\int_{-\infty}^{r+\varepsilon}f_x(x)\,{\rm d}x.$$
  • Due to the  »less than/equal«  sign in the  »general definition«,  a limit value must be formed for the CDF calculation.  If we also take into account that,  for a value-discrete random variable,  the PDF consists of a sum of weighted  »Dirac delta functions«,  we obtain:
$$F_{x}(r)=\lim_{\varepsilon\hspace{0.05cm}\to \hspace{0.05cm} 0}\int_{-\infty}^{r+\varepsilon}\sum\limits_{\mu= 1}^{ M}p_\mu\cdot \delta(x-x_\mu)\,{\rm d}x.$$
  • If we interchange integration and summation in this equation,  and consider that an integration over the Dirac delta function yields the step function,  we obtain:
$$F_{x}(r)=\sum\limits_{\mu= 1}^{M}p_\mu\cdot \gamma_0 (r-x_\mu),\hspace{0.4cm}{\rm with} \hspace{0.4cm}\gamma_0(x)=\lim_{\varepsilon\hspace{0.05cm}\to \hspace{0.05cm} 0}\int_{-\infty}^{x+\varepsilon}\delta (u)\,{\rm d} u = \left\{ \begin{array}{*{2}{c}} 0 \hspace{0.4cm} {\rm if}\hspace{0.1cm} x< 0,\\ 1 \hspace{0.4cm} {\rm if}\hspace{0.1cm}x\ge 0. \\ \end{array} \right.$$
The function  $γ_0(x)$  differs from the  »unit step function«  $γ(x)$  often used in systems theory in that at the jump point  $x = 0$  the right-hand side limit  $1$  is valid  $($instead of the mean value  $0.5$  between the left-hand and right-hand side limits$)$.
  • With the above CDF definition,  the following probability equation holds for value-continuous and value-discrete random variables equally,  and of course also for  mixed random variables  with discrete and continuous parts:
$${\rm Pr}(x_{\rm u}<x \le x_{\rm o})=F_x(x_{\rm o})-F_x(x_{\rm u}).$$
  • For purely value-continuous random variables,  the  »less than«  sign and the  »less than/equal to«  sign can be interchanged here:
$${\rm Pr}(x_{\rm u}<x \le x_{\rm o}) ={\rm Pr}(x_{\rm u}\le x \le x_{\rm o}) ={\rm Pr}(x_{\rm u}\le x < x_{\rm o}) ={\rm Pr}(x_{\rm u}<x < x_{\rm o}).$$
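The step-function form of the CDF and the interval probability can be tried out on a small value-discrete example. This is a sketch under assumptions: the three values and their probabilities (number of heads in two fair coin tosses) and the function names `gamma0` and `discrete_cdf` are introduced here only for illustration.

```python
import numpy as np

def gamma0(x):
    """Step function with the right-hand limit at the jump: 0 for x < 0, 1 for x >= 0."""
    return np.where(np.asarray(x, dtype=float) >= 0.0, 1.0, 0.0)

def discrete_cdf(r, values, probs):
    """F_x(r) as the sum over mu of p_mu * gamma0(r - x_mu), for scalar or array r."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    return np.sum(probs * gamma0(r[:, None] - values), axis=1)

# Assumed example (not from the text): number of heads in two fair coin tosses.
values = np.array([0.0, 1.0, 2.0])
probs = np.array([0.25, 0.5, 0.25])

# CDF just before, exactly at, and after the jump points.
F = discrete_cdf([-0.1, 0.0, 0.5, 1.0, 2.0], values, probs)
print(F)  # [0.   0.25 0.25 0.75 1.  ]

# Pr(x_u < x <= x_o) = F(x_o) - F(x_u), here with x_u = 0 and x_o = 2.
p_interval = discrete_cdf(2.0, values, probs)[0] - discrete_cdf(0.0, values, probs)[0]
print(p_interval)  # 0.75
```

The interval probability excludes the value at the lower limit  $x_{\rm u} = 0$  and includes the value at the upper limit  $x_{\rm o} = 2$,  so for a value-discrete variable the two inequality signs are not interchangeable.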

$\text{Example 2:}$  If the gray value of the  »original Lena photo«  is quantized to eight levels,  so that each pixel can be represented by three bits and transmitted digitally,  the discrete random variable  $q$  is obtained.   However,  due to the quantization,  part of the image information is lost,  which is reflected in the quantized image by clearly recognizable  »contours«.

PDF and CDF of a value-discrete image
  • The associated PDF  $f_{q}(q)$  is composed of  $M = 8$  Dirac delta functions,  where,  in the quantization chosen here,  the possible gray levels are assigned the values  $q_\mu = (\mu - 1)/7$  with  $\mu = 1, 2, \ldots, 8$.
  • The weights of the Dirac delta functions can be calculated from the PDF  $f_{x}(x)$  of the original image.  One obtains
$$p_\mu={\rm Pr}(q = q_\mu ) = {\rm Pr}\left(\frac{2\mu-3}{14}< x \le\frac{2\mu-1}{14}\right) $$
$$\Rightarrow \hspace{0.3cm} p_\mu={\rm Pr}(q = q_\mu ) = \int_{(2\mu-3)/14}^{(2\mu-1)/14} f_{x}(x)\,{\rm d}x.$$
  • Outside the defined range  $(x<0$,   $x>1)$,  $f_{x}(x) = 0$  must be set.  Since in the original image the gray levels  $x ≈0$  $($»very deep black«$)$  and  $x ≈1$  $($»almost pure white«$)$  are largely missing,  the result is  $p_1 ≈ p_8 ≈ 0$.
  • Thus,  only six Dirac delta functions are visible in the PDF.  The two missing Diracs at  $q = 0$  and  $q =1$  are only indicated by dots.
  • The step-shaped CDF  $F_{q}(r)$  sketched on the right thus has six points of discontinuity,  where in each case the right-hand side limit is valid.
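The calculation of the Dirac weights can be sketched numerically. The PDF of the original image is not given in closed form, so the code below assumes a hypothetical stand-in  $f_x(x) = 6x(1-x)$  on  $[0, 1]$  with CDF  $3r^2 - 2r^3$;  only the level values  $q_\mu$  and the integration limits are taken from the text. With this stand-in,  $p_1$  and  $p_8$  are small but not as close to zero as in the example.

```python
import numpy as np

M = 8
mu = np.arange(1, M + 1)

# Quantization levels and integration limits as given in the text.
q = (mu - 1) / 7
lower = (2 * mu - 3) / 14
upper = (2 * mu - 1) / 14

# Hypothetical CDF of the original gray value (assumption): F_x(r) = 3r^2 - 2r^3
# on [0, 1]. Clipping r implements f_x(x) = 0 for x < 0 and x > 1.
def cdf_x(r):
    r = np.clip(np.asarray(r, dtype=float), 0.0, 1.0)
    return 3.0 * r**2 - 2.0 * r**3

# Weights p_mu: integral of f_x over each interval equals a difference of F_x values.
p = cdf_x(upper) - cdf_x(lower)

# Step-shaped CDF of q: sum of all weights p_mu with q_mu <= r (right-hand limit).
def cdf_q(r):
    r = np.atleast_1d(np.asarray(r, dtype=float))
    return np.sum(p * (r[:, None] >= q), axis=1)

print("weights p_mu:", np.round(p, 4))
print("sum of weights:", p.sum())
print("F_q at r = 0.5:", cdf_q(0.5)[0])
```

Expressing each integral as a difference of CDF values avoids numerical integration and makes it obvious that the weights sum to one.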




Exercises for the chapter


Exercise 3.2: CDF for Exercise 3.1

Exercise 3.2Z: Relationship between PDF and CDF