Axiomatic Probability and Conditional Probability

New Videos: Videos #5 and #6 on Probability for Actuarial Exam 1/P

Mathematics is built on an axiomatic foundation. What does this mean? In a nutshell, it means that every topic in mathematics is based on both undefined terms and on statements that are assumed to be true.

Upon hearing this, many people wonder, “How can this be so?” Isn’t it necessary to prove all mathematical facts with rigorous logic? Shouldn’t all terms be precisely defined?

A few people might try to answer these questions by embarking on a quest to rid mathematics of undefined terms and axioms. However, it is a hopeless quest. There is no way to complete the task without using circular reasoning.

Ideally, undefined terms and axioms should be kept intuitive, simple, and minimal in number. Probability, the main topic of interest in this post, largely meets this ideal. One caveat: the reader should be comfortable with sets, functions, and the basic motivations for, interpretations of, and examples in, probability.

The problem-solving content of Videos #5 and #6 in my series on Probability for Actuarial Exam 1/P uses some fundamental theorems that can be proved using the axioms of probability. The videos are directly below. I will discuss the axioms of probability, as well as foundational theorems and their proofs, further below.

Actuarial Exam 1/P Prep: Tricky Venn Diagram Problem, Conditional Probability, General Multiplication Rule. Probability for Actuarial Exam 1/P, Video #5. Sample Exam P Questions, Problem #9.
Actuarial Exam 1/P Prep: Conditional Probability From Two Perspectives with a Venn Diagram. Probability for Actuarial Exam 1/P, Video #6. Sample Exam P Questions, Problem #6.

An Axiomatic Approach to Basic Probability

Description of the Axioms

I will not be as abstract as possible in the axiomatic approach to probability that I am about to describe. First, I will assume the reader is familiar with the (informal) ideas of sets and functions. Essentially I am taking these to be undefined terms, though functions are informally described as “rules of assignment” and sets as “collections of objects”.

Second, I am also going to assume the reader is familiar with the idea of a sample space and of events from a random experiment, both as intuitive ideas and as sets. These are yet more undefined terms.

Finally, I am assuming that the events in this sample space S are related to each other in simple ways. To be more precise, I am assuming that the (set) complement A'=S-A of an event is another event and that a “countable” union A_{1}\cup A_{2}\cup A_{3}\cup \cdots of events A_{1},A_{2},A_{3},\ldots is another event. You will be happy to know that I will not get into an abstract discussion of “sigma-algebras”.

The axioms for basic probability can now be described as follows. We start by assuming there is a “probability set function” P. The domain of P is the set (collection) of all possible events. The codomain of P is initially taken to be the interval [0,\infty) (later we will prove that the codomain of P can actually be taken to be the interval [0,1]). The output of P for an arbitrary event A will be denoted by P[A]. Furthermore, P is assumed to satisfy the following properties (axioms):

  1. P[S]=1=100\%, i.e., something in the sample space S must occur.
  2. (Non-negativity) P[A]\geq 0 for any event A, i.e., negative probabilities do not make sense.
  3. (Additivity) For any finite or “countably infinite” collection of events \{A_{i}|i=1,2,3,\ldots\} such that A_{i}\cap A_{j}=\emptyset whenever i\not=j (so the events are pairwise “disjoint” or “mutually exclusive”), we have P[A_{1}\cup A_{2}\cup A_{3}\cup \cdots]=P[A_{1}]+P[A_{2}]+P[A_{3}]+\cdots. Note that we are implicitly assuming any such infinite sum (series) converges.
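
For a concrete feel for these axioms, here is a minimal sketch in Python that checks them numerically on a finite, equally likely sample space. The fair six-sided die model, and the names `S`, `P`, `A`, and `B`, are my own illustrative choices, not part of the axioms themselves.

```python
from fractions import Fraction

# Hypothetical illustration: a fair six-sided die. Events are subsets
# of the sample space S, and P assigns each event the fraction of
# outcomes it contains (the classical equally-likely model).
S = frozenset(range(1, 7))

def P(event):
    """Probability of an event (a subset of S) under equal likelihood."""
    return Fraction(len(event), len(S))

A = frozenset({1, 2})   # roll a 1 or 2
B = frozenset({4, 6})   # roll a 4 or 6 (disjoint from A)

# Axiom 1: the whole sample space has probability 1.
assert P(S) == 1

# Axiom 2: probabilities are non-negative.
assert all(P(frozenset({s})) >= 0 for s in S)

# Axiom 3 (finite case): additivity for disjoint events.
assert A & B == frozenset() and P(A | B) == P(A) + P(B)

print(P(A), P(B), P(A | B))  # 1/3 1/3 2/3
```

Using `Fraction` rather than floating-point numbers keeps every probability exact, so the additivity check is a true equality rather than an approximation.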

From these assumed truths, we can now prove some basic properties (proved truths).

Basic Consequences of the Axioms

The first fact to prove says there is no chance that nothing will happen. Seriously!

Theorem 1: P[\emptyset]=0.

Proof: Note that S=S\cup \emptyset and S\cap \emptyset=\emptyset. Therefore, by axioms (1) and (3), 1=P[S]=P[S\cup \emptyset]=P[S]+P[\emptyset]=1+P[\emptyset]. Canceling the “1” from both sides allows us to conclude that P[\emptyset]=0. Q.E.D.

The next fact is the complement law. It says, for instance, that if the chance of rain tomorrow is 30%, then the chance of no rain tomorrow is 70%.

Theorem 2: For any event A, we have P[A']=1-P[A].

Proof: Note that S=A\cup A' and that A\cap A'=\emptyset. By axioms (1) and (3), 1=P[S]=P[A\cup A']=P[A]+P[A']. Subtracting P[A] from both sides leads to the final result that P[A']=1-P[A]. Q.E.D.
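
The complement law is easy to check numerically in the same equally likely setup. The "30% chance of rain" model below is my own illustration: 3 of 10 equally likely outcomes count as rain.

```python
from fractions import Fraction

# Hypothetical check of Theorem 2 (the complement law) on a finite
# sample space: "30% chance of rain" modeled as 3 of 10 equally
# likely outcomes.
S = frozenset(range(10))
rain = frozenset({0, 1, 2})   # P[rain] = 3/10
no_rain = S - rain            # the set complement rain' = S - rain

P = lambda event: Fraction(len(event), len(S))

assert P(no_rain) == 1 - P(rain)   # Theorem 2
print(P(rain), P(no_rain))         # 3/10 7/10
```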

The next fact shows that the codomain of P can be taken to be the interval [0,1].

Theorem 3: For any event A, we can say that 0\leq P[A]\leq 1.

Proof: First, by axiom (2), for any event A, we directly have 0\leq P[A]. Next, by Theorem 2 that we just proved, since (A')'=A, we can say that P[A]=P[(A')']=1-P[A']. But since A' is another event, we know by axiom (2) that P[A']\geq 0, which means that -P[A']\leq 0. But this helps us see that P[A]=1-P[A']\leq 1-0=1. Hence, 0\leq P[A]\leq 1, and the result follows because A is an arbitrary event. Q.E.D.

The following theorem is sometimes described as saying that P is “monotone increasing”: as the events get “bigger” (under a certain stipulation), the probabilities get bigger.

Theorem 4: For any two events A and B with A\subseteq B, it follows that P[A]\leq P[B].

Proof: First note that, since A\subseteq B, we can say that B=A\cup (B-A), where B-A=\{x\in B|x\not\in A\} (you should try verifying this on your own). We also know that A\cap (B-A)=\emptyset. Therefore, axiom (3) implies that P[B]=P[A]+P[B-A]. But P[B-A]\geq 0, by axiom (2). Therefore, P[B]\geq P[A]+0=P[A]. This is what we wanted to prove. Q.E.D.

Also note that the proof of Theorem 4 really leads to the truth of another theorem.

Theorem 5: For any two events A and B with A\subseteq B, we can conclude that P[B-A]=P[B]-P[A].

In the general case, where neither A nor B is necessarily a subset of the other, the following theorem can be stated and proved.

Theorem 6: If A and B are any two events, then P[B-A]=P[B]-P[A\cap B] and P[A-B]=P[A]-P[A\cap B].

Proof: We prove only the first equation; the second follows by symmetry. Note that B=(B-A)\cup (A\cap B), where (B-A)\cap (A\cap B)=\emptyset. Then, from axiom (3), P[B]=P[B-A]+P[A\cap B]. Hence, P[B-A]=P[B]-P[A\cap B]. Q.E.D.
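
Theorem 6 can also be verified numerically on a small example. The fair-die events below (`A` = roll at most 3, `B` = roll an even number) are my own illustration; note that neither is a subset of the other.

```python
from fractions import Fraction

# Hypothetical check of Theorem 6 on a fair six-sided die.
S = frozenset(range(1, 7))
P = lambda E: Fraction(len(E), len(S))

A = frozenset({1, 2, 3})   # roll at most 3
B = frozenset({2, 4, 6})   # roll an even number

assert P(B - A) == P(B) - P(A & B)   # first equation of Theorem 6
assert P(A - B) == P(A) - P(A & B)   # second equation
print(P(B - A))  # 1/3 : the outcomes {4, 6}
```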

Finally, we prove the general addition rule.

Theorem 7: For any two events A and B, the following formula is true: P[A\cup B]=P[A]+P[B]-P[A\cap B].

Proof: Start by noting that A\cup B=(A-B)\cup (A\cap B)\cup (B-A), where these three sets are all pairwise disjoint (mutually exclusive). Therefore, by axiom (3), P[A\cup B]=P[A-B]+P[A\cap B]+P[B-A]. But now Theorem 6 above gives

P[A\cup B]=P[A]-P[A\cap B]+P[A\cap B]+P[B]-P[A\cap B]=P[A]+P[B]-P[A\cap B].

We are done. Q.E.D.
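
As a sanity check on the general addition rule, here is a sketch using a standard 52-card deck. The tuple representation of cards and the events "heart" and "face card" are my own illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sketch: verify the general addition rule (Theorem 7)
# on a standard 52-card deck with events "heart" and "face card".
ranks = list(range(2, 11)) + ["J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = frozenset(product(ranks, suits))   # 52 (rank, suit) pairs

def P(event):
    return Fraction(len(event), len(deck))

hearts = frozenset(c for c in deck if c[1] == "hearts")       # 13 cards
faces = frozenset(c for c in deck if c[0] in ("J", "Q", "K"))  # 12 cards

lhs = P(hearts | faces)
rhs = P(hearts) + P(faces) - P(hearts & faces)
assert lhs == rhs
print(lhs)  # 11/26 : 13 hearts + 12 faces - 3 heart face cards = 22 of 52
```

Subtracting P[A\cap B] corrects for the three heart face cards that would otherwise be counted twice, which is exactly the content of Theorem 7.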

When A\cap B=\emptyset, Theorems 1 and 7 imply the truth of the “special addition rule”: P[A\cup B]=P[A]+P[B].

Conditional Probability and the General Multiplication Rule

There are many situations where knowing more information will cause a person to change their estimate of the likelihood of an event.

For example, if you live in Minnesota and are wondering about the chances of rain tomorrow, a look at the current radar in South Dakota can help you revise an initial probability estimate.

As another example, consider taking one card at random from a well-shuffled standard 52-card deck. The probability that it is a “heart” is \frac{13}{52}=\frac{1}{4}.

But if someone you trust tells you that the card is “red”, that information will cause you to change your estimate of the previous probability to \frac{13}{26}=\frac{1}{2}. This last probability is “conditioned” on the fact that you know the card is red.

Let us consider this same example in the context of axiomatic probability. The sample space S can be taken to be the 52-element set of all the distinct cards. The event H that the card is a “heart” is a 13-element set consisting of the distinct cards that are hearts. And the event R that the card is “red” is a 26-element set consisting of the distinct cards that are red (hearts and diamonds).

The last calculation we did for the conditional probability of the card being a “heart” when it is known to be “red” could be rewritten as \frac{P[H]}{P[R]}=\frac{13/52}{26/52}=\frac{13}{26}=\frac{1}{2}.

Since H\subseteq R, we can say that P[H]=P[H\cap R], so the last calculation can also be represented as \frac{P[H\cap R]}{P[R]}. It is this last expression that is the “true” formula for conditional probability; it works even when neither event is a subset of the other. The only restriction on this formula is that we cannot divide by zero, so the “known” event cannot have probability zero.

Notationally, the symbol P[H|R] represents the conditional probability of the event H occurring if it is known, or given, that R has occurred. Think of the vertical line as being shorthand for “given that”.

With these conventions, our definitional formula for conditional probability becomes P[H|R]=\frac{P[H\cap R]}{P[R]}.

As long as P[R]\not=0, we can multiply both sides of this formula by P[R] to get P[H\cap R]=P[R]\cdot P[H|R]. The roles of the two events are symmetric, so when P[H]\not=0 we can also say that P[H\cap R]=P[H]\cdot P[R|H].
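
The card example can be carried out explicitly in code. This is a minimal sketch: the deck representation and the helper `cond` are my own, and `cond(A, B)` is only defined when P[B] is nonzero, matching the restriction above.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sketch of P[H|R] = P[H ∩ R] / P[R] on a 52-card deck.
ranks = list(range(2, 11)) + ["J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = frozenset(product(ranks, suits))

P = lambda event: Fraction(len(event), len(deck))

H = frozenset(c for c in deck if c[1] == "hearts")                 # heart
R = frozenset(c for c in deck if c[1] in ("hearts", "diamonds"))   # red

def cond(A, B):
    """P[A|B] = P[A ∩ B] / P[B]; only defined when P[B] != 0."""
    return P(A & B) / P(B)

assert cond(H, R) == Fraction(1, 2)      # knowing "red" updates 1/4 to 1/2
assert P(H & R) == P(R) * cond(H, R)     # multiplication rule, one order
assert P(H & R) == P(H) * cond(R, H)     # multiplication rule, other order
print(cond(H, R))  # 1/2
```

Since H\subseteq R here, the intersection H ∩ R is just H, so both multiplication-rule checks reduce to P[H] = 13/52 = 1/4.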

With the convention that each product is taken to be 0 when the conditioning event has probability zero, these equations are true in general. The general statement of their truth is called the general multiplication rule.

We will see that all these equations are very useful in future problems. They also lead to a definition of the very important idea of independent events and the corresponding (special) multiplication rule (for independent events).