# From Random Experiment to Random Variable

## OVERVIEW OF THE SECOND MAIN CHAPTER

This chapter is intended to familiarize you with  »discrete random variables«  assuming them to be statistically independent.  Such random variables are needed in Communications Engineering,  for example,  for the simulation of a binary or multi-level digital signal,  but equally for the emulation of a channel with statistically independent errors by a digital model,  for example,  the BSC model.

In detail, it covers:

1. the  »relationship between probability and relative frequency«,
2. the  »expected values and moments«,
3. the  »binomial and Poisson distributions«  as special cases of discrete distributions,
4. the  »generation of pseudo-random binary symbols«  using  »PN generators«,  and
5. the  »generation of multi-level random variables«  on a digital computer.

## On the concept of random variables

The term  »$\text{random experiment}$«  was already explained in the first chapter of this book.  It denotes an experiment

• that can be repeated any number of times under the same conditions with an uncertain outcome  $E$,
• but in which the set  $\{E_μ \}$  of possible outcomes is specifiable.

Often the experimental outcomes are numerical values,  as for example in the random experiment  »throwing a die«.  In contrast,  the experiment  »coin toss«  yields the possible outcomes  »heads«  and  »tails«.

To describe different kinds of experiments uniformly,  and because it is easier to handle,  one uses the term  »random variable«.

Definition of a random variable

$\text{Definition:}$

• A  »random variable«  $z$  is an unambiguous mapping of the outcome set  $\{E_μ \}$  into the set of real numbers:  each outcome  $E_μ$  is assigned exactly one numerical value  $z_μ$.

• In addition to this definition,  the random variable may also carry a unit besides its numerical value.

Some examples of random variables are given below:

1. In the random experiment  »throwing a roulette ball«,  a distinction between  $E$  and  $z$  has no practical implications,  but may be useful for formal reasons.  Thus,  $E_μ = 8$  denotes that the ball has come to rest in the pocket of the roulette wheel marked  »$8$«.  Arithmetic operations  $($e.g. forming an expected value$)$  are not possible on the basis of the outcomes.  In contrast,  the random variable  $z$  actually denotes a numerical value  $($here an integer between  $0$  and  $36)$,  from which the expected value of the random variable  $($here  $18)$  can be determined.  Quite possible but not useful would be,  for example,  the assignment  $E_μ = 8$   ⇔   $z_μ ≠ 8$.
2. In the  »coin toss«  experiment,  the possible outcomes are  »heads«  and  »tails«,  to which no arithmetic operations can be applied per se.  Only the arbitrary but unambiguous assignment between the outcome set  $\{E_μ\} = \{$»heads«, »tails«$\}$  and the number set  $\{z_μ\} = \{0,\ 1\}$  makes it possible to specify a characteristic value at all.  Equally well,  however,  one could specify the assignment  »heads«   ⇔   $1$   and   »tails«   ⇔   $0$.
3. In circuit technology,  the two possible logical states of a memory cell  $($»flip-flop«$)$  are designated according to the possible voltage levels by  $\rm L$  $($Low$)$  and  $\rm H$  $($High$)$.  We adopt these designations here also for binary symbols.  For practical work,  one usually maps these symbols to numerical random variables;  this mapping is again arbitrary,  but should be chosen sensibly.
4. In coding theory,  it is useful to map  $\{ \text{L, H}\}$  to  $\{0,\ 1\}$  in order to be able to use the possibilities of modulo algebra.  On the other hand,  to describe modulation with bipolar  $($»antipodal«$)$  signals,  it is better to choose the mapping  $\{ \text{L, H}\}$ ⇔ $\{-1, +1\}$.
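The mappings in these examples can be written down as simple lookup tables. The following minimal Python sketch is only an illustration:  the dictionary names and the helper `expected_value` are hypothetical,  and equally probable outcomes are assumed throughout.

```python
from fractions import Fraction

# Roulette: the outcome labels "0".."36" are mapped to the integers 0..36.
roulette_map = {str(n): n for n in range(37)}

# Coin toss: two equally valid, arbitrary assignments.
coin_map_a = {"heads": 0, "tails": 1}
coin_map_b = {"heads": 1, "tails": 0}

# Binary symbols L/H: {0, 1} for coding theory, {-1, +1} for bipolar signals.
unipolar = {"L": 0, "H": 1}
bipolar = {"L": -1, "H": +1}


def expected_value(mapping, probabilities):
    """Sum of z_mu * p_mu over all outcomes E_mu (hypothetical helper)."""
    return sum(mapping[e] * probabilities[e] for e in mapping)


# Assumption of this sketch: all outcomes are equally probable.
p_roulette = {e: Fraction(1, 37) for e in roulette_map}
p_symbol = {"L": Fraction(1, 2), "H": Fraction(1, 2)}

roulette_mean = expected_value(roulette_map, p_roulette)  # 18, as stated in the text
unipolar_mean = expected_value(unipolar, p_symbol)        # 1/2
bipolar_mean = expected_value(bipolar, p_symbol)          # 0
```

The same symbols  $\rm L$  and  $\rm H$  thus lead to different expected values depending on the chosen mapping,  which is why the mapping should match the application.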

## Continuous and discrete random variables

$\text{Definition:}$  According to the possible numerical values  $z_μ = z(E_μ)$  we distinguish between value–continuous and value–discrete random variables:

• A  »continuous random variable«  $z$  can  – at least in certain ranges –  assume infinitely many different values.  More precisely:   For such variables the set of admissible values is uncountable.
• Examples of continuous random variables are the speed of a car  $($with appropriate driving,  between  $v = 0$  and  $v = 120 \ \rm km/h)$  or the noise voltage in a communication system.  Both random variables have a unit in addition to a numerical value.
• If the set  $\{z_μ\}$  is countable,  it is a  »discrete random variable«.  Usually,  the number of possible values of  $z$  is limited to  $M$.  In Communications Engineering,  $M$  is called the  »symbol set size«  $($in the sense of coding theory$)$  or the  »level number«  $($from the viewpoint of transmission engineering$)$.

First,  we restrict ourselves to discrete,  $M$–level random variables without internal statistical dependencies,  which are fully characterized by the  $M$  probabilities

$$p_μ ={\rm Pr}(z = z_μ)$$

according to the section  »Some Basic Definitions«.  By definition,  the sum over all  $M$  probabilities is equal to  $1$.

In contrast,  the probability  ${\rm Pr}(z = z_μ)$  that a continuous random variable  $z$  takes on one specific value  $z_μ$  is identically zero.   Here,  as will be described in the following chapter  »Continuous Random Variables«,  we must move to the  probability density function  $\rm (PDF)$.

## Random process and random sequence

$\text{Definition:}$  A  »random process«  differs from the  »random experiment«  considered so far

• in that it yields not just one outcome,
• but a  »temporal sequence of outcomes«.

This brings us to the  »random sequence«  $\langle z_ν\rangle$  with the following properties established for our representation:

• The index  $ν$   describes the temporal process sequence and can take values between  $1$  and  $N$.  Often such a sequence is also represented as an  $N$–dimensional vector.
• At any time  $ν$   the random variable  $z_ν$  can take one of a total of  $M$  different values.  We use the following nomenclature for this:
$$z_\nu \in \{z_\mu\} \hspace{0.3cm} {\rm with} \hspace{0.3cm} \nu = 1,\hspace{0.1cm}\text{...} \hspace{0.1cm}, N \hspace{0.3cm} {\rm and} \hspace{0.3cm} \mu = 1,\hspace{0.1cm}\text{...} \hspace{0.1cm} , M.$$
• If the process is  »$\text{ergodic}$«,  then each random sequence  $\langle z_ν\rangle$  has the same statistical properties and can be used as a representative for the entire random process.
• We further assume here that there are no statistical dependencies between the individual sequence elements,  that is,  the conditional probabilities satisfy
$${\rm Pr}(z_\nu | z_{\nu \rm{ -1}} \hspace{0.1cm}\text{...} \hspace{0.1cm}z_{\rm 1})={\rm Pr}(z_\nu).$$

Further and more detailed information about the characterization of random processes can be found in the later chapter  »Autocorrelation Function«.

$\text{Example 1:}$  If we repeat the random experiment  »throwing a roulette ball«  ten times,  we get,  for example,  the following random sequence:

$$\langle z_ν\rangle = \langle \ 8; \ 23; \ 0; \ 17; \ 36; \ 0; \ 33; \ 11; \ 25 ; \ 5 \ \rangle.$$

At any point in time,  all values between  $0$  and  $36$  are possible – regardless of the past – and also equally probable,  but this cannot be read from such a short sequence.
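Such a sequence is easy to emulate on a computer. The following Python sketch is an illustration under stated assumptions:  the function name `random_sequence` is hypothetical,  all  $M$  values are taken to be equally probable,  and the pseudo-random generator of the standard library stands in for a truly random source.

```python
import random


def random_sequence(values, n, seed=None):
    """Return n draws from `values`, each drawn independently of the past.

    Assumption of this sketch: all values are equally probable.
    """
    rng = random.Random(seed)
    return [rng.choice(values) for _ in range(n)]


# Roulette: M = 37 possible values, sequence length N = 10.
roulette_values = list(range(37))
sequence = random_sequence(roulette_values, n=10, seed=1)
print(sequence)
```

Because every element is drawn without regard to the previous ones,  the condition  ${\rm Pr}(z_\nu | z_{\nu -1} \hspace{0.1cm}\text{...} \hspace{0.1cm}z_{1})={\rm Pr}(z_\nu)$  holds by construction,  up to the quality of the pseudo-random generator.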

## Bernoulli's law of large numbers

$\text{Definitions:}$  To describe an  $M$–level discrete random variable,  one uses the following descriptive variables whose sums over  $μ = 1,\hspace{0.1cm}\text{...} \hspace{0.1cm} , M$  each yield the value  $1$:

• The  »probabilities«  $p_μ = {\rm Pr}(z = z_μ)$  provide predictions about the expected outcome of a statistical experiment and are thus so-called  »a-priori characteristics«.
• The  »relative frequencies«  $h_μ^{(N)}$  are  »a-posteriori characteristics«  and allow statistical statements to be made with respect to a previously conducted experiment.  They are determined as follows:
$$h_{\mu}^{(N)} = \frac{n_{\mu} }{N}= \frac{ {\rm number \hspace{0.15cm}of \hspace{0.15cm}experiments \hspace{0.15cm}with \hspace{0.15cm}outcome\hspace{0.15cm} }z_{\mu} } { {\rm number \hspace{0.15cm}of \hspace{0.15cm}all \hspace{0.15cm}attempts } } \hspace{1cm}(\mu=1,\hspace{0.1cm}\text{...} \hspace{0.1cm},M).$$
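The relative frequencies can be computed directly from an observed sequence. The sketch below applies the formula to the roulette sequence of Example 1;  the helper name `relative_frequencies` is a hypothetical choice for this illustration.

```python
from collections import Counter


def relative_frequencies(sequence, values):
    """Return h_mu = n_mu / N for every possible value z_mu."""
    n = len(sequence)
    counts = Counter(sequence)  # n_mu: how often each value occurred
    return {z: counts[z] / n for z in values}


# The ten roulette outcomes of Example 1.
sequence = [8, 23, 0, 17, 36, 0, 33, 11, 25, 5]
h = relative_frequencies(sequence, range(37))

print(h[0])   # the value 0 occurred twice in ten attempts
print(h[8])   # the value 8 occurred once
print(h[1])   # the value 1 never occurred
```

With only  $N = 10$  attempts,  most of the  $37$  relative frequencies are zero,  although all probabilities are equal to  $1/37$.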

Only in the limiting case  $N → ∞$   do the relative frequencies coincide  exactly  with the corresponding probabilities,  at least in the statistical sense.  For finite values of  $N$,  in contrast,  the  »law of large numbers«  formulated by  »$\text{Jacob Bernoulli}$«  applies:

$$\rm Pr \left( \it \mid h_\mu^{(N)} - p_\mu\mid \hspace{0.1cm} \ge \varepsilon \hspace{0.1cm} \right) \le \frac{1}{\rm 4\cdot \it N\cdot \varepsilon^{\rm 2}}.$$

It also follows that for infinite random sequences  $(N → ∞)$  the relative frequencies  $h_μ^{(N)}$  and the probabilities  $p_μ$  are identical with probability  $1$.
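The bound can be traced back to Chebyshev's inequality. The following sketch assumes  $N$  statistically independent attempts,  as presumed throughout this chapter;  the symbols  ${\rm E}[\,\cdot\,]$  and  ${\rm Var}[\,\cdot\,]$  for expected value and variance are notation introduced only for this derivation.  The count  $n_μ$  is then binomially distributed,  so that

$${\rm E}\big[h_\mu^{(N)}\big] = p_\mu, \hspace{0.5cm} {\rm Var}\big[h_\mu^{(N)}\big] = \frac{p_\mu \cdot (1-p_\mu)}{N} \le \frac{1}{4 \cdot N},$$

since the product  $p_μ \cdot (1-p_μ)$  reaches its maximum  $1/4$  at  $p_μ = 1/2$.  Chebyshev's inequality then yields

$${\rm Pr} \left( \mid h_\mu^{(N)} - p_\mu\mid \hspace{0.1cm} \ge \varepsilon \right) \le \frac{{\rm Var}\big[h_\mu^{(N)}\big]}{\varepsilon^2} \le \frac{1}{4\cdot N\cdot \varepsilon^2}.$$

The bound is therefore tightest for  $p_μ = 1/2$  and rather pessimistic for probabilities close to  $0$  or  $1$.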

$\text{Example 2:}$  A binary data file consists of  $N = 10^6$  binary symbols  $($»bits«$)$, where the  »zeros«  and  »ones«  are equally probable:   $p_0 = p_1 = 0.5$.

Bernoulli's law of large numbers  $($with  $\varepsilon = 0.01)$  now states that the probability of the event  »the number of ones in the file is between  $490 \hspace{0.05cm}000$  and  $510\hspace{0.05cm}000$«   is greater than or equal to
$$1 - 1/400 = 99.75\%.$$

According to the above equation,  the probability that the relative frequency  $h_μ^{(N)}$  of an event  $E_μ$  and the associated probability  $p_μ$  differ in magnitude by more than a value  $\varepsilon$  is not greater than  $1/(4 \cdot N \cdot ε^2)$.  For a given  $\varepsilon$  and a probability to be guaranteed,  the minimum required value of  $N$  can be calculated from this.
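This calculation can be sketched in a few lines of Python. The function names `bernoulli_bound` and `min_trials` are hypothetical;  exact fractions are used here only to avoid rounding effects when the result is an integer.

```python
import math
from fractions import Fraction


def bernoulli_bound(n, eps):
    """Upper bound 1 / (4 * N * eps^2) on Pr(|h - p| >= eps)."""
    eps = Fraction(str(eps))
    return 1 / (4 * n * eps**2)


def min_trials(eps, guaranteed):
    """Smallest N for which 1 - 1 / (4 * N * eps^2) >= guaranteed."""
    eps = Fraction(str(eps))
    guaranteed = Fraction(str(guaranteed))
    if not 0 < guaranteed < 1:
        raise ValueError("guaranteed probability must lie strictly between 0 and 1")
    return math.ceil(1 / (4 * eps**2 * (1 - guaranteed)))


# Values of Example 2: N = 10^6 and eps = 0.01 give the bound 1/400.
print(bernoulli_bound(10**6, 0.01))

# Conversely: which N guarantees 99.75 % for eps = 0.01?
print(min_trials(0.01, 0.9975))
```

Since the bound depends on  $1/ε^2$,  halving the tolerated deviation  $ε$  quadruples the required number of attempts.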

Further, it should be noted:

1. The monotonic decay with  $N$  is valid only in the statistical sense and not for each individual realization.
2. Thus,  in the  »coin toss«  experiment,  after  $N = 1000$  tosses,  the relative frequencies of  »heads«  and  »tails«  may well be exactly equal to  $50\%$  $($if  $n_{\rm heads} = n_{\rm tails} = 500$$)$  and after  $N = 2000$  tosses deviate from this value again more or less strongly.
3. If several subjects perform the experiment  »coin toss«  in parallel and the deviation of the relative frequency from the probability is plotted as a function of  $N$,  each resulting curve tends to decrease,  but not monotonically.
4. If,  however,  one calculates the mean value over an infinite number of such curves,  one obtains the course predicted by Bernoulli's law,  which decreases monotonically with  $N$.
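These remarks can be checked with a small simulation. The Python sketch below is only an illustration under stated assumptions:  the function names are hypothetical,  a fair coin  $(p = 0.5)$  is assumed,  and a finite ensemble of  $200$  curves stands in for the infinite number of curves mentioned above,  so the averaged curve is only approximately smooth.

```python
import random


def deviation_curve(n_tosses, rng, p=0.5):
    """|h^(N) - p| after each toss N = 1..n_tosses for one realization."""
    heads = 0
    curve = []
    for n in range(1, n_tosses + 1):
        if rng.random() < p:
            heads += 1
        curve.append(abs(heads / n - p))
    return curve


def mean_curve(n_curves, n_tosses, seed=0):
    """Average the deviation curves of n_curves independent realizations."""
    rng = random.Random(seed)
    total = [0.0] * n_tosses
    for _ in range(n_curves):
        for i, d in enumerate(deviation_curve(n_tosses, rng)):
            total[i] += d
    return [t / n_curves for t in total]


single = deviation_curve(2000, random.Random(1))
average = mean_curve(n_curves=200, n_tosses=2000)

# Compare one realization with the ensemble average at a few values of N.
for n in (10, 100, 1000, 2000):
    print(n, round(single[n - 1], 4), round(average[n - 1], 4))
```

A single realization may show a larger deviation at  $N = 2000$  than at  $N = 1000$,  whereas the ensemble average falls much more regularly as  $N$  grows.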

⇒   This topic,  specifically the experiment of  »$\text{Karl Pearson}$«,  is dealt with in the  $($German language$)$  learning video

»Bernoullisches Gesetz der großen Zahlen«   $\Rightarrow$   Bernoulli's Law of Large Numbers.