Preliminary definitions
Probability distribution
A probability distribution is the function that assigns a probability to each event of the sample space. It can be discrete (the possible values are finite or countably infinite) or continuous (the values fill an interval).
Random variable
A random variable $X$ is the function that assigns a real number to each element of the sample space. If the values it can take are finite or countably infinite, $X$ is discrete; if there are uncountably many (filling an interval), it is continuous.
Discrete examples: number of heads when flipping 5 coins, number of aces when drawing 3 cards. Continuous examples: height of a student, lifetime of a light bulb.
Probability mass function
When $X$ is discrete, the probability mass function $f$ assigns to each value $X$ can take its probability: $f(x) = P(X = x)$.
For $f$ to be a valid probability mass function, two conditions must hold:
$$0 \le f(x_i) \le 1 \ \text{ for every } i, \qquad \sum_i f(x_i) = 1.$$
Example — Rolling a die
We roll a fair die. The sample space is $E = \{1, 2, 3, 4, 5, 6\}$ and $X$ is the variable "face shown".
The graph of the probability mass function is uniform: 6 bars of the same height $\tfrac{1}{6}$.
This is the so-called discrete uniform distribution: every value of $X$ has the same probability.
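As a minimal sketch, the two requirements on a probability mass function (every probability between 0 and 1, and all of them adding up to 1) can be checked mechanically for the die. The function name `is_valid_pmf` and the dictionary layout are illustrative choices, not part of the course material; exact fractions are used so that the sum is not affected by rounding.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Check the two conditions: each value in [0, 1] and total equal to 1."""
    probabilities = list(pmf.values())
    in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = sum(probabilities) == 1
    return in_range and sums_to_one

# Fair die: each face 1..6 has probability 1/6 (exact fractions avoid rounding).
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
print(is_valid_pmf(die_pmf))
```

Any other finite table of values and probabilities can be passed in the same way.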
Continuous random variables also have associated distributions — the most famous is the normal distribution (Gauss's bell curve) — but in this course we focus on discrete distributions, and in particular the binomial.
Bernoulli trial
Bernoulli experiment
A Bernoulli experiment is a random experiment with only two possible outcomes, which we label success and failure.
We denote:
$$p = P(\text{success}), \qquad q = P(\text{failure}).$$
By construction, $p + q = 1$.
Example — Penalty kick
A footballer has probability $0{,}7$ of scoring a penalty. Taking the penalty is a Bernoulli experiment with:
$$p = 0{,}7, \qquad q = 1 - 0{,}7 = 0{,}3.$$
"Scoring" is the success and "missing" the failure (which outcome is success and which failure is a labelling we choose to match the question being asked).
Binomial distribution
Definition
The binomial distribution is the distribution followed by a random variable $X$ that counts the number of successes in $n$ independent repetitions of a Bernoulli experiment with success probability $p$.
We write:
$$X \sim B(n, p)$$
where $n$ is the number of trials and $p$ the individual success probability.
When to use the binomial
A phenomenon is well modelled by $B(n,p)$ when it satisfies these four conditions:
1. The same experiment is repeated $n$ times.
2. Each experiment has only two possible outcomes (success / failure).
3. The success probability $p$ is the same in every repetition.
4. The repetitions are independent of each other.
Example — Five penalties
The same footballer as in the previous example takes 5 penalties. Let $X = $ "number of penalties scored". Then:
$$X \sim B(5,\;0{,}7)$$
a) What is the probability of scoring exactly 3 penalties?
$$P(X = 3) = 0{,}3087$$
This value comes out either from the formula we'll derive next, or from a binomial table.
b) What is the probability of scoring at least 3 penalties?
"At least 3" means $X \ge 3$. It can be obtained via the complement:
$$P(X \ge 3) = 1 - P(X \le 2),$$
where $P(X \le 2) = P(X{=}0) + P(X{=}1) + P(X{=}2)$ is the sum of the first three values.
Binomial formula
Probability mass function of $B(n,p)$
If $X \sim B(n,p)$, the probability of getting exactly $k$ successes in $n$ trials is:
$$P(X = k) = \binom{n}{k}\, p^{k}\, q^{\,n-k}, \qquad k = 0, 1, \dots, n,$$
where $\binom{n}{k}$ is the binomial coefficient (defined below) and $q = 1 - p$.
The formula has three ingredients:
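· $\binom{n}{k}$: the number of ways the $k$ successes can be placed among the $n$ positions.
· $p^{k}$: the probability that the $k$ chosen trials are all successes.
· $q^{n-k}$: the probability that the remaining $n - k$ trials are all failures.
Each specific sequence with $k$ successes therefore has probability $p^{k} q^{n-k}$, and there are $\binom{n}{k}$ such sequences.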
Penalty kicks — detailed computation
With $X \sim B(5,\;0{,}7)$, let's compute $P(X = 3)$ using the formula:
$$P(X = 3) = \binom{5}{3}\cdot 0{,}7^{3}\cdot 0{,}3^{2} = 10\cdot 0{,}343\cdot 0{,}09 = 0{,}3087$$
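The same computation can be sketched in a few lines of Python. The function name `binomial_pmf` is an illustrative choice; `math.comb` is the standard-library binomial coefficient (Python 3.8 or later). Note that Python writes decimals with a point, so $0{,}7$ becomes `0.7`.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): comb(n, k) * p**k * q**(n - k)."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Penalty example: n = 5 kicks, p = 0.7, exactly k = 3 goals.
print(round(binomial_pmf(3, 5, 0.7), 4))
```

Because the result is a floating-point number, it is rounded before printing; the last digits of the raw value may differ slightly from the hand computation.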
Binomial coefficients
Definition
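The binomial coefficient "$m$ choose $n$" is the number of $n$-element subsets that can be formed from a set with $m$ elements. We write and compute it as:
$$\binom{m}{n} = C_{m,n} = \frac{m!}{n!\,(m-n)!}$$
where $m!$ is the factorial of $m$:
$$m! = m\cdot(m-1)\cdot(m-2)\cdots 2\cdot 1, \qquad \text{with } 0! = 1 \text{ by convention.}$$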
Examples
· $5! = 5\cdot 4\cdot 3\cdot 2\cdot 1 = 120$.
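· Compute $C_{8,3} = \binom{8}{3}$: $\binom{8}{3} = \dfrac{8!}{3!\,5!} = \dfrac{8\cdot 7\cdot 6}{3\cdot 2\cdot 1} = 56$.
· Compute $\binom{5}{3}$ (the one in the penalty example): $\binom{5}{3} = \dfrac{5!}{3!\,2!} = \dfrac{5\cdot 4\cdot 3}{3\cdot 2\cdot 1} = 10$.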
Calculation trick
To compute $\binom{m}{n}$ quickly, write the $n$ decreasing factors from $m$ in the numerator and $n!$ in the denominator:
$$\binom{m}{n} = \frac{m\cdot(m-1)\cdots(m-n+1)}{n!}$$
That way you don't have to compute the whole $m!$.
On a scientific calculator, use the nCr key (combinations) and the ! key (factorial). In an exam, always double-check your binomial coefficients on the calculator.
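The calculation trick translates directly into code. The sketch below multiplies the $n$ decreasing factors starting at $m$ and divides by $n!$, then compares the result with the standard library's `math.comb`. The function name `choose` is a hypothetical helper introduced for illustration.

```python
from math import comb, factorial

def choose(m, n):
    """Binomial coefficient via the shortcut: n decreasing factors over n!."""
    numerator = 1
    for i in range(n):
        numerator *= m - i          # m, m - 1, ..., m - n + 1
    # The product of n consecutive integers is always divisible by n!.
    return numerator // factorial(n)

for m, n in [(8, 3), (5, 3)]:
    print(m, n, choose(m, n), choose(m, n) == comb(m, n))
```

In practice `math.comb(m, n)` plays the role of the calculator's nCr key; the hand-written version only mirrors the pencil-and-paper method.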
Cumulative probabilities
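With $X \sim B(5,\;0{,}7)$ we can compute any cumulative probability. Applying the formula above to each value of $k$ gives:
| $k$ | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| $P(X{=}k)$ | 0,00243 | 0,02835 | 0,13230 | 0,30870 | 0,36015 | 0,16807 |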
Sanity check: the sum of all of them must equal exactly $1$. $0{,}00243 + 0{,}02835 + 0{,}13230 + 0{,}30870 + 0{,}36015 + 0{,}16807 = 1$ ✓.
Calculations with $X \sim B(5,\;0{,}7)$
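a) $P(X < 3) = P(X{=}0) + P(X{=}1) + P(X{=}2)$:
$$P(X < 3) = 0{,}00243 + 0{,}02835 + 0{,}13230 = 0{,}16308$$
b) $P(X \ge 4) = P(X{=}4) + P(X{=}5)$:
$$P(X \ge 4) = 0{,}36015 + 0{,}16807 = 0{,}52822$$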
Or, via the complement: $P(X\ge 4) = 1 - P(X\le 3) = 1 - 0{,}47178 = 0{,}52822$.
c) $P(X \le 2) = P(X{=}0) + P(X{=}1) + P(X{=}2) = 0{,}16308$ (same as a)).
Watch out for the inequalities!
$P(X < 3)$ does not include the 3: it is $P(X{=}0) + P(X{=}1) + P(X{=}2)$.
$P(X \le 3)$ does include the 3: it is $P(X{=}0) + \cdots + P(X{=}3)$.
Always read the statement carefully to tell whether the inequality is strict or not.
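The difference between strict and non-strict inequalities is easy to see once the probabilities are accumulated in a list. The following is a small sketch for $B(5,\;0{,}7)$; the variable names are illustrative, and `itertools.accumulate` produces the running sums, so that `cdf[k]` holds $P(X \le k)$.

```python
from itertools import accumulate
from math import comb

n, p = 5, 0.7
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
cdf = list(accumulate(pmf))   # cdf[k] = P(X <= k)

p_less_than_3 = cdf[2]        # strict: k = 0, 1, 2
p_at_most_3 = cdf[3]          # non-strict: k = 0, 1, 2, 3
p_at_least_4 = 1 - cdf[3]     # complement of P(X <= 3)

print(round(p_less_than_3, 5), round(p_at_most_3, 5), round(p_at_least_4, 5))
```

Reading $P(X < 3)$ as `cdf[2]` rather than `cdf[3]` is exactly the off-by-one distinction the warning is about.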
Expectation and standard deviation
Parameters of a binomial
For a variable $X \sim B(n,p)$, the expectation (mean) and the variance have very simple formulas:
$$E(X) = np, \qquad \mathrm{Var}(X) = npq$$
And therefore the standard deviation:
$$\sigma = \sqrt{npq}$$
Example — The 5 penalties
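With $X\sim B(5,\,0{,}7)$:
$$E(X) = 5\cdot 0{,}7 = 3{,}5, \qquad \mathrm{Var}(X) = 5\cdot 0{,}7\cdot 0{,}3 = 1{,}05, \qquad \sigma = \sqrt{1{,}05} \approx 1{,}02$$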
Interpretation: if the footballer took many series of 5 penalties, on average he would score 3,5 penalties, with a typical spread of about 1 penalty around that value.
Intuition
· $E(X) = np$: makes sense. If you run $n$ trials and each has probability $p$ of success, you expect $np$ successes in total.
· $\mathrm{Var}(X) = npq$: depends on the product $pq$, which is maximal when $p = 0{,}5$ (and $0$ when $p = 0$ or $p = 1$). In other words, the binomial is more predictable when $p$ is close to 0 or 1.
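As a quick check, the shortcuts $np$ and $npq$ can be compared with the general definitions $E(X) = \sum k\,P(X{=}k)$ and $\mathrm{Var}(X) = \sum (k - E(X))^2\,P(X{=}k)$. The sketch below does this for the penalty example; it is only a numerical illustration, not a proof, and the variable names are arbitrary.

```python
from math import comb, isclose, sqrt

n, p = 5, 0.7
q = 1 - p
pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

# Mean and variance straight from their definitions.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

# Compare with the binomial shortcuts np and npq.
print(isclose(mean, n * p), isclose(variance, n * p * q))
print(round(sqrt(variance), 3))
```

Changing `n` and `p` at the top lets you repeat the comparison for any other binomial.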
Exercises
Consider the function $f(x) = \dfrac{2x + 1}{6}$ with $x \in \{1, 2, 3\}$.
a) Compute $f(1)$, $f(2)$ and $f(3)$.
$$f(1) = \tfrac{3}{6} = \tfrac{1}{2}, \qquad f(2) = \tfrac{5}{6}, \qquad f(3) = \tfrac{7}{6}$$
b) Can $f$ be a probability mass function? Why?
No. A probability mass function must satisfy two conditions: $0 \le f(x_i) \le 1$ for all $i$, and $\sum f(x_i) = 1$.
· $f(3) = \tfrac{7}{6} > 1$ → violates the first condition.
· The sum is $f(1) + f(2) + f(3) = \tfrac{3 + 5 + 7}{6} = \tfrac{15}{6} = 2{,}5 \ne 1$ → violates the second condition as well.
The following table gives the probability mass function of a discrete random variable $X$. The value $P(X{=}3)$ is unknown.
| $x_i$ | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| $P_i$ | 0,2 | 0,2 | 0,1 | ? | 0,1 | 0,1 |
a) $P(X = 3)$.
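Since all probabilities must sum to 1:
$$P(X = 3) = 1 - (0{,}2 + 0{,}2 + 0{,}1 + 0{,}1 + 0{,}1) = 1 - 0{,}7 = 0{,}3$$
b) $P(X > 3)$.
$$P(X > 3) = P(X{=}4) + P(X{=}5) = 0{,}1 + 0{,}1 = 0{,}2$$
c) $P(X < 2)$.
$$P(X < 2) = P(X{=}0) + P(X{=}1) = 0{,}2 + 0{,}2 = 0{,}4$$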
Cross-check via the complement: $P(X < 2) = 1 - P(X\ge 2) = 1 - (0{,}1 + 0{,}3 + 0{,}1 + 0{,}1) = 1 - 0{,}6 = 0{,}4$. ✓
d) $P(1 < X \le 5)$.
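Careful: the lower bound is strict ($X > 1$), but the upper one includes the 5.
$$P(1 < X \le 5) = P(X{=}2) + P(X{=}3) + P(X{=}4) + P(X{=}5) = 0{,}1 + 0{,}3 + 0{,}1 + 0{,}1 = 0{,}6$$
e) The mean, $E(X) = \sum x_i \cdot P_i$.
$$E(X) = 0\cdot 0{,}2 + 1\cdot 0{,}2 + 2\cdot 0{,}1 + 3\cdot 0{,}3 + 4\cdot 0{,}1 + 5\cdot 0{,}1 = 2{,}2$$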
f) The standard deviation.
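First the variance $\mathrm{Var}(X) = \sum (x_i - E(X))^2 \cdot P_i$:
$$\mathrm{Var}(X) = 4{,}84\cdot 0{,}2 + 1{,}44\cdot 0{,}2 + 0{,}04\cdot 0{,}1 + 0{,}64\cdot 0{,}3 + 3{,}24\cdot 0{,}1 + 7{,}84\cdot 0{,}1 = 2{,}56$$
Then the standard deviation:
$$\sigma = \sqrt{2{,}56} = 1{,}6$$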
Shortcut: $\mathrm{Var}(X) = E(X^2) - E(X)^2$. Here $E(X^2) = 0 + 0{,}2 + 0{,}4 + 2{,}7 + 1{,}6 + 2{,}5 = 7{,}4$, so $\mathrm{Var}(X) = 7{,}4 - 2{,}2^2 = 7{,}4 - 4{,}84 = 2{,}56$. ✓
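Both routes to the variance can be reproduced with exact fractions, which avoids any rounding noise in the comparison. This is only a sketch: the lists below copy the table of the exercise, with $P(X{=}3) = 0{,}3$ filled in from part a), and the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

values = [0, 1, 2, 3, 4, 5]
# Probabilities from the table, with P(X = 3) = 0.3 as found in part a).
probs = [Fraction(s) for s in ("0.2", "0.2", "0.1", "0.3", "0.1", "0.1")]

mean = sum(x * p for x, p in zip(values, probs))
var_definition = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
var_shortcut = sum(x * x * p for x, p in zip(values, probs)) - mean**2

print(float(mean), float(var_definition), float(var_shortcut))
print(sqrt(var_definition))
```

The two variance expressions are algebraically identical, so with exact arithmetic they agree exactly rather than approximately.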
Let $X \sim B(6,\;0{,}4)$. Compute:
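a) $P(X = 2)$.
$$P(X = 2) = \binom{6}{2}\cdot 0{,}4^{2}\cdot 0{,}6^{4} = 15\cdot 0{,}16\cdot 0{,}1296 = 0{,}31104$$
b) $P(X = 0)$.
$$P(X = 0) = \binom{6}{0}\cdot 0{,}4^{0}\cdot 0{,}6^{6} = 0{,}6^{6} = 0{,}046656$$
c) $P(X \ge 1)$ (at least 1 success).
Via the complement, faster:
$$P(X \ge 1) = 1 - P(X = 0) = 1 - 0{,}046656 = 0{,}953344$$
d) $E(X)$ and $\sigma$.
$$E(X) = 6\cdot 0{,}4 = 2{,}4, \qquad \sigma = \sqrt{6\cdot 0{,}4\cdot 0{,}6} = \sqrt{1{,}44} = 1{,}2$$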
80 % of the students of a secondary school passed Philosophy last year. From a group of 8 students chosen at random, what is the probability that only two have failed?
a) Justify why this is a binomial distribution.
Each student: passed or failed → it's a Bernoulli trial. We repeat the same trial 8 times (one per student), with constant success probability $p = 0{,}8$ (assuming independence) → it's a binomial $B(8,\,0{,}8)$.
b) Identify the success and failure probabilities.
If "passing" is success: $p = 0{,}8$, $q = 0{,}2$.
But the question is about the number of failures; it is convenient to swap:
If "failing" is success: $p = 0{,}2$, $q = 0{,}8$.
c) Write down the probability mass function.
We have two equivalent formulations:
· $X_1 = $ "number of passes" $\sim B(8,\,0{,}8)$.
· $X_2 = $ "number of failures" $\sim B(8,\,0{,}2)$.
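If two students fail, six pass. Hence $X_2 = 2$ is equivalent to $X_1 = 6$. We'll work with $X_2$, which matches the question directly. Its probability mass function is $P(X_2 = k) = \binom{8}{k}\, 0{,}2^{k}\, 0{,}8^{8-k}$, for $k = 0, 1, \dots, 8$.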
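d) Compute $P(\text{only two fail})$.
$$P(X_2 = 2) = \binom{8}{2}\cdot 0{,}2^{2}\cdot 0{,}8^{6} = 28\cdot 0{,}04\cdot 0{,}262144 \approx 0{,}2936$$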
Cross-check: $P(X_1 = 6) = \binom{8}{6}\, 0{,}8^{6}\, 0{,}2^{2} = 28\cdot 0{,}262144\cdot 0{,}04 \approx 0{,}2936$. ✓ They match, as expected.
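The equivalence between counting failures and counting passes holds for every value, not only for two failures. A short sketch, assuming the same group of 8 students and using a hypothetical helper `binomial_pmf`, checks that $P(X_2 = k) = P(X_1 = 8 - k)$ for all $k$:

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 8
for failures in range(n + 1):
    passes = n - failures
    # "failures" successes in B(8, 0.2) is the same event as "passes" in B(8, 0.8).
    assert isclose(binomial_pmf(failures, n, 0.2), binomial_pmf(passes, n, 0.8))

print(round(binomial_pmf(2, n, 0.2), 4))
```

This is why the choice of which outcome to call "success" never changes the answer, only how directly the question maps onto the variable.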
A production line makes parts that have a 5 % probability of being defective. We pick 10 parts at random and inspect them.
a) Identify the distribution and its parameters.
Let $X = $ "number of defective parts". Then $X \sim B(10,\,0{,}05)$.
b) $P(\text{no defective parts})$.
$$P(X = 0) = 0{,}95^{10} \approx 0{,}5987$$
Nearly 60 % of the lots of 10 parts have no defective part at all.
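c) $P(\text{exactly 2 defective parts})$.
$$P(X = 2) = \binom{10}{2}\cdot 0{,}05^{2}\cdot 0{,}95^{8} \approx 45\cdot 0{,}0025\cdot 0{,}6634 \approx 0{,}0746$$
d) $P(\text{at least 1 defective})$.
Via the complement:
$$P(X \ge 1) = 1 - P(X = 0) \approx 1 - 0{,}5987 = 0{,}4013$$
e) $E(X)$ and $\sigma$.
$$E(X) = 10\cdot 0{,}05 = 0{,}5, \qquad \sigma = \sqrt{10\cdot 0{,}05\cdot 0{,}95} = \sqrt{0{,}475} \approx 0{,}689$$
On average there is half a defective part per lot of 10. The standard deviation is larger than the mean, which is typical when $p$ is small.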
A test has 20 questions with 4 options each, only one correct. A student answers at random, without having studied.
a) Identify the distribution and its parameters.
Let $X = $ "number of correct answers". On each question, $p = \tfrac{1}{4} = 0{,}25$ (guessing) and $q = 0{,}75$. Hence $X \sim B(20,\,0{,}25)$.
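b) How many correct answers will the student get on average?
$$E(X) = 20\cdot 0{,}25 = 5$$
c) What is the standard deviation?
$$\sigma = \sqrt{20\cdot 0{,}25\cdot 0{,}75} = \sqrt{3{,}75} \approx 1{,}94$$
d) $P(X = 10)$: exactly 10 correct (half).
$$P(X = 10) = \binom{20}{10}\cdot 0{,}25^{10}\cdot 0{,}75^{10} = 184756\cdot 0{,}25^{10}\cdot 0{,}75^{10} \approx 0{,}0099$$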
Under 1 %: the probability of getting half the questions right by pure guessing is almost negligible — that's where the intuition "if I answer everything at random I'll definitely fail" comes from.
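These figures can also be approximated by simulation instead of the formula. The sketch below lets many imaginary students guess all 20 questions and records their scores. The function name `simulate_guessing`, the number of simulated students and the fixed seed are all arbitrary choices made for illustration, and the printed values are estimates that will be close to, but not exactly, the theoretical $5$ and $0{,}0099$.

```python
import random

def simulate_guessing(n_questions=20, p_correct=0.25, n_students=20_000, seed=0):
    """Simulate students who guess every question; return their scores."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    scores = []
    for _ in range(n_students):
        # Each question is a Bernoulli trial with success probability p_correct.
        correct = sum(rng.random() < p_correct for _ in range(n_questions))
        scores.append(correct)
    return scores

scores = simulate_guessing()
average = sum(scores) / len(scores)
share_with_10 = scores.count(10) / len(scores)
print(average, share_with_10)
```

With more simulated students the estimates settle closer to the exact binomial values, at the cost of a longer run.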