Statistical Dependence and Independence
General definition of statistical dependence
So far we have paid little attention to »statistical dependence« between events, although we have already used it implicitly, for example in the case of two »disjoint sets«:
- If an element belongs to A,
- it certainly cannot also be contained in the disjoint set B.
Such a »deterministic dependence« between two sets or two events is the strongest form of dependence. Statistical dependence is less pronounced. Let us start with its complement:
Definitions:
(1) Two events A and B are called »statistically independent«, if the probability of the intersection A∩B is equal to the product of the individual probabilities:
- Pr(A∩B)=Pr(A)⋅Pr(B).
(2) If this condition is not satisfied, then the events A and B are »statistically dependent«:
- Pr(A∩B)≠Pr(A)⋅Pr(B).
- In some applications, statistical independence is obvious, for example, in the »coin toss« experiment. The probability for »heads« or »tails« is independent of whether »heads« or »tails« occurred in the last toss.
- Likewise, the individual results in the random experiment »throwing a roulette ball« are always statistically independent of each other under fair conditions, even if individual system players do not want to admit this.
- In other applications, however, the question of whether two events are statistically independent is difficult or impossible to answer intuitively. Here one can only arrive at the correct answer by checking the formal independence criterion given above, as the following example shows.
Example 1: We consider the experiment »throwing two dice«, where the two dice (called "cubes" in the graphic) can be distinguished by their colors red (R) and blue (B). The graph illustrates this, with the sum S=R+B entered in each cell of the two-dimensional field (R,B).
For the description we define the following events:
- A1: The outcome of the red cube is R<4 (red background) ⇒ Pr(A1)=1/2,
- A2: The outcome of the blue cube is B>4 (blue font) ⇒ Pr(A2)=1/3,
- A3: The sum of the two cubes is S=7 (green outline) ⇒ Pr(A3)=1/6,
- A4: The sum of the two cubes is S=8 ⇒ Pr(A4)=5/36,
- A5: The sum of the two cubes is S=10 ⇒ Pr(A5)=3/36.
The graph can be interpreted as follows:
- The two events A1 and A2 are statistically independent because the probability Pr(A1∩A2)=1/6 of the intersection is equal to the product of the two individual probabilities Pr(A1)=1/2 and Pr(A2)=1/3. Given the problem definition, any other result would have been very surprising.
- The events A1 and A3 are also statistically independent, since Pr(A1)=1/2, Pr(A3)=1/6 and Pr(A1∩A3)=1/12. The probability of the intersection (1/12) arises because three of the 36 squares are both highlighted in red and outlined in green.
- In contrast, A1 and A4 are statistically dependent because the probability of the intersection, Pr(A1∩A4)=1/18=4/72, is not equal to the product Pr(A1)⋅Pr(A4)=1/2⋅5/36=5/72.
- The two events A1 and A5 are even disjoint ⇒ Pr(A1∩A5)=0: none of the squares with red background is labeled S=10.
This example shows that disjointness is a particularly pronounced form of statistical dependence.
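The independence checks of Example 1 can be reproduced by enumerating all 36 outcomes. The following is a minimal sketch; the helper names `prob`, `joint` and `independent` are illustrative assumptions and do not come from the text above.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (r, b) of the red and the blue die.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate on (r, b)."""
    hits = sum(1 for r, b in OUTCOMES if event(r, b))
    return Fraction(hits, len(OUTCOMES))


def joint(ev1, ev2):
    """Probability of the intersection of two events."""
    return prob(lambda r, b: ev1(r, b) and ev2(r, b))


def independent(ev1, ev2):
    """Check the criterion Pr(A∩B) = Pr(A)·Pr(B) with exact fractions."""
    return joint(ev1, ev2) == prob(ev1) * prob(ev2)


def a1(r, b): return r < 4        # red die shows less than 4
def a2(r, b): return b > 4        # blue die shows more than 4
def a3(r, b): return r + b == 7   # sum equals 7
def a4(r, b): return r + b == 8   # sum equals 8
def a5(r, b): return r + b == 10  # sum equals 10


for name, ev in [("A2", a2), ("A3", a3), ("A4", a4), ("A5", a5)]:
    print(f"A1 and {name}: Pr(joint) = {joint(a1, ev)}, "
          f"independent = {independent(a1, ev)}")
```

Exact fractions are used instead of floating-point numbers so that the equality test in `independent` is not affected by rounding.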
Conditional probability
If the two events A and B are statistically dependent, the (unconditional) probabilities Pr(A) and Pr(B) alone do not describe the situation completely. So-called »conditional probabilities« are then required.
Definitions:
(1) The »conditional probability« of A under condition B can be calculated as follows:
- Pr(A|B) = Pr(A∩B) / Pr(B).
(2) Similarly, the conditional probability of B under condition A is:
- Pr(B|A) = Pr(A∩B) / Pr(A).
(3) Combining these two equations, we get Bayes' theorem:
- Pr(B|A) = Pr(A|B)⋅Pr(B) / Pr(A).
Below are some properties of conditional probabilities:
- A conditional probability also always lies between 0 and 1, including these two limits: 0≤Pr(A|B)≤1.
- With constant condition B, all calculation rules given in the chapter »Set Theory Basics« for the unconditional probabilities Pr(A) and Pr(B) still apply.
- If the existing events A and B are disjoint, then Pr(A|B)=Pr(B|A)=0 (convention: an event A »exists« if Pr(A)>0).
- If B is a proper or improper subset of A, then Pr(A|B)=1.
- If two events A and B are statistically independent, their conditional probabilities are equal to the unconditional ones, as the following calculation shows:
- Pr(A|B) = Pr(A∩B) / Pr(B) = Pr(A)⋅Pr(B) / Pr(B) = Pr(A).
Example 2: We again consider the experiment »throwing two dice«, where S=R+B denotes the sum of the red and the blue die (cube).
Here we consider the dependence between the two events
- A1: »The outcome of the red cube is R<4 « (red background) ⇒ Pr(A1)=1/2,
- A4: »The sum of the two cubes is S=8 « (green outline) ⇒ Pr(A4)=5/36,
and refer again to the event of Example 1:
- A3: »The sum of the two cubes is S=7 « ⇒ Pr(A3)=1/6.
Regarding this graph, note:
- The two events A1 and A4 are statistically dependent, since the probability of the intersection, Pr(A1∩A4)=2/36=4/72, is not equal to the product Pr(A1)⋅Pr(A4)=1/2⋅5/36=5/72.
- The conditional probability Pr(A1|A4)=2/5 can be calculated from the quotient of the »joint probability« Pr(A1∩A4)=2/36 and the absolute probability Pr(A4)=5/36.
- Since the events A1 and A4 are statistically dependent, the conditional probability Pr(A1|A4)=2/5 (two of the five squares outlined in green are highlighted in red) is not equal to the absolute probability Pr(A1)=1/2 (half of all squares are highlighted in red).
- Similarly, the conditional probability Pr(A4|A1)=2/18=1/9 (two of the 18 fields with a red background are outlined in green) differs from the absolute probability Pr(A4)=5/36 (a total of five of the 36 fields are outlined in green).
- This last result can also be derived using »Bayes' theorem«, for example:
- Pr(A4|A1) = Pr(A1|A4)⋅Pr(A4) / Pr(A1) = (2/5⋅5/36) / (1/2) = 1/9.
- In contrast, the following conditional probabilities hold for A1 and the statistically independent event A3, see Example 1:
- Pr(A1|A3)=Pr(A1)=1/2 and Pr(A3|A1)=Pr(A3)=1/6.
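The conditional probabilities of Example 2 follow directly from the definition Pr(A|B) = Pr(A∩B) / Pr(B). The sketch below recomputes them by enumeration and cross-checks one value with Bayes' theorem; the function names `prob` and `cond_prob` are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (r, b) of the red and the blue die.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate on (r, b)."""
    return Fraction(sum(1 for r, b in OUTCOMES if event(r, b)), len(OUTCOMES))


def cond_prob(ev_a, ev_b):
    """Pr(A|B) = Pr(A∩B) / Pr(B); only defined for Pr(B) > 0."""
    p_b = prob(ev_b)
    if p_b == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda r, b: ev_a(r, b) and ev_b(r, b)) / p_b


def a1(r, b): return r < 4       # red die shows less than 4
def a3(r, b): return r + b == 7  # sum equals 7
def a4(r, b): return r + b == 8  # sum equals 8


p_a1_given_a4 = cond_prob(a1, a4)
p_a4_given_a1 = cond_prob(a4, a1)
# Bayes' theorem: Pr(A4|A1) = Pr(A1|A4)·Pr(A4) / Pr(A1)
p_a4_given_a1_bayes = p_a1_given_a4 * prob(a4) / prob(a1)

print(p_a1_given_a4, p_a4_given_a1, p_a4_given_a1_bayes)
print(cond_prob(a1, a3), cond_prob(a3, a1))
```

For the independent pair A1 and A3, `cond_prob` returns the unconditional probabilities, in line with the property stated above.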
General multiplication theorem
We now consider several events Ai with 1≤i≤I. In contrast to before, these events Ai no longer need to form a »complete system«, that is:
- They are not necessarily pairwise disjoint.
- There may also be statistical dependence between the individual events.
Definition:
(1) In this case, the so-called »joint probability«, i.e. the probability of the intersection of all I events Ai, is:
- Pr(A1∩ ... ∩A_I) = Pr(A_I)⋅Pr(A_{I−1}|A_I)⋅Pr(A_{I−2}|A_{I−1}∩A_I)⋅ ... ⋅Pr(A1|A2∩ ... ∩A_I).
(2) In the same way, the following holds:
- Pr(A1∩ ... ∩A_I) = Pr(A1)⋅Pr(A2|A1)⋅Pr(A3|A1∩A2)⋅ ... ⋅Pr(A_I|A1∩ ... ∩A_{I−1}).
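As a short worked case, the second form of the theorem for I=3 follows by applying the definition of the conditional probability twice. This sketch assumes Pr(A1∩A2)>0 so that all conditional probabilities are defined:

```latex
\begin{align*}
\Pr(A_1 \cap A_2 \cap A_3)
  &= \Pr(A_1 \cap A_2) \cdot \Pr(A_3 \mid A_1 \cap A_2) \\
  &= \Pr(A_1) \cdot \Pr(A_2 \mid A_1) \cdot \Pr(A_3 \mid A_1 \cap A_2).
\end{align*}
```

Repeating this step for each additional event gives the general product for I events.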
Example 3: A lottery drum contains ten tickets, three of which are hits. Event T1 denotes drawing a hit in the first draw.
- Then the probability of drawing two hits with two tickets is:
- Pr(T1∩T2)=Pr(T1)⋅Pr(T2|T1)=3/10⋅2/9=1/15≈6.7%.
- This takes into account that in the second draw (event T2) there would be only nine tickets and two hits in the drum if one hit had been drawn in the first run:
- Pr(T2|T1)=2/9≈22.2%.
- However, if the tickets were returned to the drum after the draw, the events T1 and T2 would be statistically independent and it would hold:
- Pr(T1∩T2)=(3/10)^2=9%.
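Both results of Example 3 can be checked by listing all equally likely ordered pairs of draws. The following is a minimal sketch; the list `TICKETS` and the function `prob_two_hits` are illustrative names, not part of the original example.

```python
from fractions import Fraction
from itertools import permutations, product

# Ten tickets in the drum: three hits and seven blanks.
TICKETS = ["hit"] * 3 + ["blank"] * 7


def prob_two_hits(draws):
    """Fraction of equally likely ordered pairs in which both draws are hits."""
    draws = list(draws)
    both = sum(1 for first, second in draws
               if first == "hit" and second == "hit")
    return Fraction(both, len(draws))


# Without replacement: 10 * 9 ordered pairs of distinct tickets.
without_replacement = prob_two_hits(permutations(TICKETS, 2))
# With replacement: 10 * 10 ordered pairs, draws are independent.
with_replacement = prob_two_hits(product(TICKETS, repeat=2))

print(without_replacement, float(without_replacement))
print(with_replacement, float(with_replacement))
```

`itertools.permutations` distinguishes tickets by position, so the three hits count as three different tickets, which is what the counting argument requires.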
Inference probability
Consider again events Ai with 1≤i≤I that form a »complete system«. That is:
- All events are pairwise disjoint (Ai∩Aj=∅ for all i≠j).
- The union gives the universal set:
- A1∪A2∪ ... ∪A_I = G.
In addition, we consider an event B for which all conditional probabilities Pr(B|Ai) with indices 1≤i≤I are known.
Theorem of total probability: Under the above conditions, the »unconditional probability« of event B is:
- Pr(B) = ∑_{i=1}^{I} Pr(B∩Ai) = ∑_{i=1}^{I} Pr(B|Ai)⋅Pr(Ai).
Definition: Combining this equation with Bayes' theorem yields the »inference probability«:
- Pr(Ai|B) = Pr(B|Ai)⋅Pr(Ai) / Pr(B) = Pr(B|Ai)⋅Pr(Ai) / [∑_{k=1}^{I} Pr(B|Ak)⋅Pr(Ak)].
Example 4: Munich's student hostels are occupied by students from
- Ludwig Maximilian University of Munich (event L ⇒ Pr(L)=70%) and
- Technical University of Munich (event T ⇒ Pr(T)=30%).
It is further known that at LMU 60% of all students are female, whereas at TUM only 10% are female.
- The proportion of all female students in the hostel (event F) can then be determined using the total probability theorem:
- Pr(F)=Pr(F|L)⋅Pr(L)+Pr(F|T)⋅Pr(T)=0.6⋅0.7+0.1⋅0.3=45%.
- If we meet a female student, we can use the inference probability
- Pr(L|F) = Pr(F|L)⋅Pr(L) / [Pr(F|L)⋅Pr(L)+Pr(F|T)⋅Pr(T)] = 0.6⋅0.7 / (0.6⋅0.7+0.1⋅0.3) = 14/15 ≈ 93.3%
- to predict that she studies at LMU. A quite realistic result (at least in the past).
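The two steps of Example 4, total probability followed by the inference probability, can be sketched as two small functions. The names `total_probability` and `inference_probabilities` and the dictionary layout are assumptions made for this illustration.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Pr(B) = sum over i of Pr(B|Ai)·Pr(Ai) for a complete system {Ai}."""
    return sum(likelihoods[k] * priors[k] for k in priors)


def inference_probabilities(priors, likelihoods):
    """Pr(Ai|B) = Pr(B|Ai)·Pr(Ai) / Pr(B) for every event Ai."""
    p_b = total_probability(priors, likelihoods)
    return {k: likelihoods[k] * priors[k] / p_b for k in priors}


# Example 4: Pr(L), Pr(T) and the conditional probabilities Pr(F|L), Pr(F|T).
priors = {"L": Fraction(7, 10), "T": Fraction(3, 10)}
likelihoods = {"L": Fraction(6, 10), "T": Fraction(1, 10)}

p_f = total_probability(priors, likelihoods)
posterior = inference_probabilities(priors, likelihoods)

print(p_f, float(p_f))
print(posterior["L"], float(posterior["L"]))
```

Because the events form a complete system, the inference probabilities returned for all Ai sum to 1, which is a quick sanity check on the input values.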
⇒ The topic of this chapter is illustrated with examples in the (German language) learning video
- »Statistische Abhängigkeit und Unabhängigkeit« ⇒ »Statistical Dependence and Independence«.
Exercises for the chapter
Exercise 1.4: 2S/3E Channel Model
Exercise 1.4Z: Sum of Ternary Quantities
Exercise 1.5Z: Probabilities of Default