Written by Oleksandr Gavenko (AKA gavenkoa), compiled on 2023-03-19 from rev c18d218b854e.

Probability

PMF

PMF, or probability mass function (also called the probability law or probability distribution) of a discrete random variable, is a function that for a given value gives the probability of that value.

The following notations are used for the PMF:

PMF(X = x) = P(X = x) = pX(x) = P(ω ∈ Ω : X(ω) = x)
PMF(a ≤ X ≤ b) = P(a ≤ X ≤ b) = Σ_{a ≤ x ≤ b} P(X = x)
pX(x) ≥ 0
Σ_{x} pX(x) = 1

where X is a random variable on the space Ω of outcomes, each outcome ω being mapped to a real number via X(ω).
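
To make the definition concrete, here is a minimal Python sketch (not part of the original notes) that builds the PMF of X = sum of two fair dice directly from the outcome space Ω and checks both properties above:

# Build the PMF of X = sum of two fair dice from the outcome space Omega.
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))   # outcome space Omega
p = Fraction(1, len(omega))                    # each outcome equally likely

pmf = {}                                       # pX(x) = P(omega : X(omega) = x)
for w in omega:
    x = w[0] + w[1]                            # the random variable X(omega)
    pmf[x] = pmf.get(x, 0) + p

assert all(px >= 0 for px in pmf.values())     # pX(x) >= 0
assert sum(pmf.values()) == 1                  # sum over x of pX(x) = 1
print(pmf[7])                                  # 1/6, the most likely sum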

Expected value

The expected value of a discrete random variable is:

E[X] = Σ_{ω ∈ Ω} X(ω)·p(ω) = Σ_{x} x·pX(x)

We write a ≤ X ≤ b for the event {ω ∈ Ω : a ≤ X(ω) ≤ b}.

If X ≥ 0 then E[X] ≥ 0.

If a ≤ X ≤ b then a ≤ E[X] ≤ b.

If Y = g(X) (that is, Y(ω) = g(X(ω)) for all ω ∈ Ω) then:

E[Y] = Σ_{x} g(x)·pX(x)

Proof:

E[Y] = Σ_{y} y·pY(y)
 = Σ_{y ∈ ℝ} y·Σ_{ω ∈ Ω : Y(ω) = y} p(ω)
 = Σ_{y ∈ ℝ} y·Σ_{ω ∈ Ω : g(X(ω)) = y} p(ω)
 = Σ_{y ∈ ℝ} y·Σ_{x ∈ ℝ : g(x) = y} Σ_{ω ∈ Ω : X(ω) = x} p(ω)
 = Σ_{y ∈ ℝ} y·Σ_{x ∈ ℝ : g(x) = y} pX(x)
 = Σ_{y ∈ ℝ} Σ_{x ∈ ℝ : g(x) = y} y·pX(x)
 = Σ_{x ∈ ℝ} Σ_{y ∈ ℝ : g(x) = y} y·pX(x)
 = Σ_{x} g(x)·pX(x)
In particular, taking g(x) = a·x + b:

E[a·X + b] = a·E[X] + b
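
A quick Python check of these formulas on an illustrative PMF: E[g(X)] computed directly over pX agrees with the definition via the PMF of Y = g(X), and expectation is linear:

# Check E[g(X)] = sum_x g(x)*pX(x) and E[a*X + b] = a*E[X] + b exactly.
from fractions import Fraction

pmf_x = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}

def g(x):
    return x * x

E_x = sum(x * p for x, p in pmf_x.items())

# E[g(X)] computed directly over pX ...
E_gx = sum(g(x) * p for x, p in pmf_x.items())
# ... and via the PMF of Y = g(X).
pmf_y = {}
for x, p in pmf_x.items():
    pmf_y[g(x)] = pmf_y.get(g(x), 0) + p
assert E_gx == sum(y * p for y, p in pmf_y.items())

a, b = 3, 5
E_ax_b = sum((a * x + b) * p for x, p in pmf_x.items())
assert E_ax_b == a * E_x + b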

Variance

Variance is defined as:

var[X] = E[(X − E[X])²] = E[X²] − (E[X])²

Standard deviation is:

σX = sqrt(var[X])

Property:

var(a·X + b) = a²·var[X]
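
A short Python sketch checking both variance identities on an arbitrary small PMF:

# Check var(X) = E[X^2] - (E[X])^2 and var(a*X + b) = a^2 * var(X).
from fractions import Fraction

pmf = {0: Fraction(1, 3), 1: Fraction(1, 3), 4: Fraction(1, 3)}

def expect(pmf, f=lambda x: x):
    return sum(f(x) * p for x, p in pmf.items())

def var(pmf):
    m = expect(pmf)
    return expect(pmf, lambda x: (x - m) ** 2)

assert var(pmf) == expect(pmf, lambda x: x * x) - expect(pmf) ** 2

a, b = 2, 7
shifted = {a * x + b: p for x, p in pmf.items()}   # PMF of a*X + b
assert var(shifted) == a ** 2 * var(pmf)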

Total probability theorem

Let Ai ∩ Aj = ∅ for i ≠ j and ∪_{i} Ai = Ω:

pX(x) = Σ_{i} P(Ai)·pX|Ai(x)

Conditional PMF on event

The conditional PMF on an event is:

pX|A(x) = P(X = x|A)
E[X|A] = Σ_{x} x·pX|A(x)

Total expectation theorem

E[X] = Σ_{i} P(Ai)·E[X|Ai]

To prove the theorem, multiply the total probability theorem by x and sum over x.
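
An illustrative Python sketch of both theorems for a fair die, with the partition A1 = even outcomes, A2 = odd outcomes:

# Total probability and total expectation theorems for a fair die.
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}
partition = [lambda x: x % 2 == 0, lambda x: x % 2 == 1]

def conditional_pmf(pmf, event):
    pa = sum(p for x, p in pmf.items() if event(x))   # P(A)
    return {x: p / pa for x, p in pmf.items() if event(x)}, pa

total_p = {}
total_e = Fraction(0)
for event in partition:
    cond, pa = conditional_pmf(pmf, event)
    for x, p in cond.items():                         # P(Ai) * pX|Ai(x)
        total_p[x] = total_p.get(x, 0) + pa * p
    total_e += pa * sum(x * p for x, p in cond.items())  # P(Ai) * E[X|Ai]

assert total_p == pmf                                 # total probability
assert total_e == sum(x * p for x, p in pmf.items())  # total expectation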

Joint PMF

Joint PMF of random variables X1, ..., Xn is:

pX1, ..., Xn(x1, ..., xn) = P(X1 = x1, ..., Xn = xn)

Properties:

E[X + Y] = E[X] + E[Y]
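
A small Python sketch (illustrative values): a joint PMF stored as a dict keyed by (x, y); the marginals are obtained by summing out the other variable, and E[X + Y] = E[X] + E[Y] holds without any independence assumption:

# Joint PMF as a dict keyed by (x, y); check marginals and linearity.
from fractions import Fraction

joint = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 4)}

px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p                   # marginal pX(x)
    py[y] = py.get(y, 0) + p                   # marginal pY(y)

E_sum = sum((x + y) * p for (x, y), p in joint.items())
E_x = sum(x * p for x, p in px.items())
E_y = sum(y * p for y, p in py.items())
assert E_sum == E_x + E_y                      # no independence needed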

Conditional joint PMF

Conditional joint PMF is:

pX|Y(x|y) = P(X = x|Y = y) = P(X = x, Y = y) ⁄ P(Y = y)

So:

pX, Y(x, y) = pY(y)·pX|Y(x|y) = pX(x)·pY|X(y|x)
pX, Y, Z(x, y, z) = pY(y)·pZ|Y(z|y)·pX|Y, Z(x|y, z)
Σ_{x, y} pX, Y|Z(x, y|z) = 1

Conditional expectation of joint PMF

Conditional expectation of joint PMF is:

E[X|Y = y] = Σ_{x} x·pX|Y(x|y)
E[g(X)|Y = y] = Σ_{x} g(x)·pX|Y(x|y)

Total probability theorem for joint PMF

pX(x) = Σ_{y} pY(y)·pX|Y(x|y)

Total expectation theorem for joint PMF

E[X] = Σ_{y} pY(y)·E[X|Y = y]

Proof:

Σ_{y} pY(y)·E[X|Y = y] = Σ_{y} pY(y)·Σ_{x} x·pX|Y(x|y)
 = Σ_{y} Σ_{x} x·pY(y)·pX|Y(x|y) = Σ_{x} Σ_{y} x·pY(y)·pX|Y(x|y)
 = Σ_{x} x·Σ_{y} pY(y)·pX|Y(x|y) = Σ_{x} x·pX(x) = E[X]

Conditional expectation as a random variable

The conditional expectation E[X|Y] is itself a random variable (a function of Y), defined as:

E[X|Y](y) = E[X|Y = y]

Property:

E[g(Y)·X|Y] = g(Y)·E[X|Y]

For an invertible function h:

E[X|h(Y)] = E[X|Y]

Proof:

E[X|Y = y] = E[X|h(Y) = h(y)]

Law of Iterated Expectations

E[E[X|Y]] = E[X]

Proof (using total expectation theorem):

E[E[X|Y]] = Σ_{y} pY(y)·E[X|Y](y) = Σ_{y} pY(y)·E[X|Y = y] = E[X]
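
A Python sketch of the law on a small joint PMF (same dict convention as above, illustrative values):

# Check E[E[X|Y]] = E[X] exactly with fractions.
from fractions import Fraction

joint = {(1, 0): Fraction(1, 4), (2, 0): Fraction(1, 4),
         (1, 1): Fraction(1, 8), (3, 1): Fraction(3, 8)}

py = {}
for (x, y), p in joint.items():
    py[y] = py.get(y, 0) + p                   # marginal pY(y)

def cond_exp_given(y):                         # E[X|Y = y]
    return sum(x * p / py[y] for (x, yy), p in joint.items() if yy == y)

lhs = sum(py[y] * cond_exp_given(y) for y in py)   # E[E[X|Y]]
rhs = sum(x * p for (x, y), p in joint.items())    # E[X]
assert lhs == rhs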

Generalisation of Law of Iterated Expectations:

E[E[X|Y, Z]|Y] = E[X|Y]

Proof, for each y ∈ Y:

E[X|Y = y] = Σ_{x} x·pX|Y(x|y) = Σ_{x} x·pX, Y(x, y) ⁄ pY(y)
 = Σ_{x} x·Σ_{z} pX, Y, Z(x, y, z) ⁄ pY(y)
 = Σ_{x} x·Σ_{z} pX|Y, Z(x|y, z)·pY, Z(y, z) ⁄ pY(y)
 = Σ_{x} x·Σ_{z} pX|Y, Z(x|y, z)·pZ|Y(z|y)
 = Σ_{x} Σ_{z} x·pX|Y, Z(x|y, z)·pZ|Y(z|y)
 = Σ_{z} Σ_{x} x·pX|Y, Z(x|y, z)·pZ|Y(z|y)
 = Σ_{z} pZ|Y(z|y)·Σ_{x} x·pX|Y, Z(x|y, z)
 = Σ_{z} pZ|Y(z|y)·E[X|Y = y, Z = z] = E[E[X|Y, Z]|Y = y]

Conditional variance

The conditional variance of X given Y is the r.v.:

var(X|Y)(y) = var(X|Y = y) = E[(X − E[X|Y = y])²|Y = y]

or in another notation:

var(X|Y) = E[X²|Y] − (E[X|Y])²

Law of total variance

Taking the expectation over Y on both sides of the last identity:

E[var(X|Y)] = E[E[X²|Y]] − E[(E[X|Y])²] = E[X²] − E[(E[X|Y])²]

On the other hand:

var(E[X|Y]) = E[(E[X|Y])²] − (E[E[X|Y]])² = E[(E[X|Y])²] − (E[X])²

Adding the last two expressions:

E[var(X|Y)] + var(E[X|Y]) = E[X²] − (E[X])² = var(X)

So:

var(X) = E[var(X|Y)] + var(E[X|Y])
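
A Python sketch checking the law of total variance exactly with fractions on a small illustrative joint PMF:

# Check var(X) = E[var(X|Y)] + var(E[X|Y]).
from fractions import Fraction

joint = {(1, 0): Fraction(1, 4), (2, 0): Fraction(1, 4),
         (1, 1): Fraction(1, 8), (3, 1): Fraction(3, 8)}

py = {}
for (x, y), p in joint.items():
    py[y] = py.get(y, 0) + p

def cond_moment(y, k):                         # E[X^k | Y = y]
    return sum(x ** k * p / py[y] for (x, yy), p in joint.items() if yy == y)

E_x = sum(x * p for (x, y), p in joint.items())
var_x = sum(x * x * p for (x, y), p in joint.items()) - E_x ** 2

E_var = sum(py[y] * (cond_moment(y, 2) - cond_moment(y, 1) ** 2) for y in py)
var_E = sum(py[y] * cond_moment(y, 1) ** 2 for y in py) - E_x ** 2
assert var_x == E_var + var_E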

Independence of r.v.

r.v. X and Y are independent if:

∀x, y : pX, Y(x, y) = pX(x)·pY(y)

So if two r.v. are independent:

E[X·Y] = E[X]·E[Y]
var(X + Y) = var(X) + var(Y)
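
A Python sketch: construct a joint PMF of independent r.v. as the product of the marginals (illustrative values) and check both consequences:

# Independence: E[X*Y] = E[X]*E[Y] and var(X + Y) = var(X) + var(Y).
from fractions import Fraction

px = {0: Fraction(1, 3), 2: Fraction(2, 3)}
py = {1: Fraction(1, 2), 5: Fraction(1, 2)}
joint = {(x, y): px[x] * py[y] for x in px for y in py}  # independence

def mean(pmf):
    return sum(v * p for v, p in pmf.items())

def var(pmf):
    return sum(v * v * p for v, p in pmf.items()) - mean(pmf) ** 2

E_xy = sum(x * y * p for (x, y), p in joint.items())
assert E_xy == mean(px) * mean(py)

p_sum = {}                                     # PMF of X + Y
for (x, y), p in joint.items():
    p_sum[x + y] = p_sum.get(x + y, 0) + p
assert var(p_sum) == var(px) + var(py)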

Convolution formula

If Z = X + Y where X and Y are independent r.v., then:

pZ(z) = Σ_{x} pX(x)·pY(z − x)

Proof:

pZ(z) = Σ_{x, y : x + y = z} P(X = x, Y = y) = Σ_{x} P(X = x, Y = z − x)
 = Σ_{x} P(X = x)·P(Y = z − x) = Σ_{x} pX(x)·pY(z − x)
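
A Python sketch of the convolution formula for the sum of two independent fair dice:

# PMF of Z = X + Y via the convolution sum_x pX(x)*pY(z - x).
from fractions import Fraction

px = {x: Fraction(1, 6) for x in range(1, 7)}
py = dict(px)

pz = {}
for z in range(2, 13):
    pz[z] = sum(px[x] * py.get(z - x, 0) for x in px)

assert sum(pz.values()) == 1
assert pz[7] == Fraction(1, 6)   # agrees with the direct enumeration of two dice earlier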

Sum of a random number of r.v.

Let Xi be independent identically distributed r.v. and let Y = Σ_{i = 1..N} Xi, where N is a r.v. independent of the Xi. Then:

E[Y|N = n] = n·E[X]
E[Y|N] = N·E[X]

Proof:

E[Y|N = n] = E[Σ_{i = 1..N} Xi|N = n] = E[Σ_{i = 1..n} Xi] = Σ_{i = 1..n} E[Xi] = n·E[X]

Variance of the sum of a random number of independent r.v.:

var(Σ_{i = 1..N} Xi) = E[N]·var(X) + (E[X])²·var(N)

Proof:

var(Y|N = n) = var[Σ_{i = 1..N} Xi|N = n] = var[Σ_{i = 1..n} Xi] = Σ_{i = 1..n} var[Xi] = n·var(X)
var(Y) = E[var(Y|N)] + var(E[Y|N]) = E[N·var(X)] + var(N·E[X]) = E[N]·var(X) + (E[X])²·var(N)
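
A Monte Carlo sketch in Python (illustrative parameters: Xi ~ bernoulli(0.3), N uniform on {1..6} and independent of the Xi); the sample mean and variance of Y should approach the formulas above up to sampling noise:

# Simulate Y = X1 + ... + XN and compare with E[N]*E[X] and
# E[N]*var(X) + (E[X])^2*var(N).
import random
from statistics import mean, pvariance

random.seed(1)
p, trials = 0.3, 200_000
ys = []
for _ in range(trials):
    n = random.randint(1, 6)                               # draw N
    ys.append(sum(random.random() < p for _ in range(n)))  # Y = X1 + ... + XN

E_x, var_x = p, p * (1 - p)               # moments of bernoulli(p)
E_n, var_n = 3.5, pvariance(range(1, 7))  # moments of unif{1..6}
print(mean(ys), E_n * E_x)                            # both ~1.05
print(pvariance(ys), E_n * var_x + E_x ** 2 * var_n)  # both ~0.9975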

Well-known discrete r.v.

Bernoulli random variable

A Bernoulli random variable with parameter p is a random variable that has 2 outcomes, denoted 0 and 1, with probabilities:

pX(0) = 1 − p
pX(1) = p

This random variable models a single trial of an experiment that results in success or failure.

The indicator of an event A is the function:

IA = 1 iff A occurs, else 0
pIA(1) = P(IA = 1) = P(A)
IA·IB = IA∩B
E[bernoulli(p)] = 0·(1 − p) + 1·p = p
var[bernoulli(p)] = E[(bernoulli(p) − E[bernoulli(p)])²]
 = (0 − p)²·(1 − p) + (1 − p)²·p = p²·(1 − p) + (1 − 2p + p²)·p
 = p² − p³ + p − 2·p² + p³ = p·(1 − p)

Discrete uniform random variable

A discrete uniform random variable with integer parameters a and b has sample space {x ∈ ℤ : a ≤ x ≤ b} and equal probability for each possible outcome:

punif(a, b)(x) = 1 ⁄ (b − a + 1)
E[unif(a, b)] = Σ_{a ≤ x ≤ b} x·1 ⁄ (b − a + 1) = 1 ⁄ (b − a + 1)·Σ_{a ≤ x ≤ b} x
 = 1 ⁄ (b − a + 1)·(Σ_{a ≤ x ≤ b} a + Σ_{0 ≤ x ≤ b − a} x)
 = 1 ⁄ (b − a + 1)·((b − a + 1)·a + (b − a)·(b − a + 1) ⁄ 2)
 = a + (b − a) ⁄ 2 = (b + a) ⁄ 2
var[unif(a, b)] = E[unif²(a, b)] − E²[unif(a, b)]
 = Σ_{a ≤ x ≤ b} x² ⁄ (b − a + 1) − (b + a)² ⁄ 4
 = 1 ⁄ (b − a + 1)·(Σ_{0 ≤ x ≤ b} x² − Σ_{0 ≤ x ≤ a − 1} x²) − (b + a)² ⁄ 4
 = 1 ⁄ (b − a + 1)·(b + 3·b² + 2·b³ − (a − 1) − 3·(a − 1)² − 2·(a − 1)³) ⁄ 6 − (b + a)² ⁄ 4
 = (2·b² + 2·a·b + b + 2·a² − a) ⁄ 6 − (b + a)² ⁄ 4
 = (b − a)·(b − a + 2) ⁄ 12

Note

From Maxima:

sum(i^2,i,0,n), simpsum=true;

         2      3
  n + 3 n  + 2 n
  ---------------
        6

factor(b+3*b^2+2*b^3 - (a-1)-3*(a-1)^2-2*(a-1)^3);

                  2                  2
  (b - a + 1) (2 b  + 2 a b + b + 2 a  - a)

factor((2*b^2 + 2*a*b + b + 2*a^2 - a)/6 - (b+a)^2/4), simp=true;

  (b - a) (2 - a + b)
  -------------------
          12
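
The closed forms can also be checked by brute force in Python over a range of parameters:

# Verify (a + b)/2 and (b - a)*(b - a + 2)/12 exactly for many (a, b).
from fractions import Fraction

for a in range(-3, 4):
    for b in range(a, a + 8):
        n = b - a + 1
        values = range(a, b + 1)
        mean = Fraction(sum(values), n)
        var = Fraction(sum(x * x for x in values), n) - mean ** 2
        assert mean == Fraction(a + b, 2)
        assert var == Fraction((b - a) * (b - a + 2), 12)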

Binomial random variable

A binomial random variable is a r.v. with parameters n (a positive integer) and p from the interval (0, 1), with sample space the integers from the inclusive range [0, n]:

pbinom(n, p)(x) = n! ⁄ (x!·(n − x)!)·p^x·(1 − p)^(n − x)

A binomial random variable models the number of successes in n independent Bernoulli trials.

E[binom(n, p)] = E[Σ_{1 ≤ i ≤ n} bernoulli(p)] = Σ_{1 ≤ i ≤ n} E[bernoulli(p)] = n·p
var[binom(n, p)] = var[Σ_{1 ≤ i ≤ n} bernoulli(p)] = Σ_{1 ≤ i ≤ n} var[bernoulli(p)] = n·p·(1 − p)
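
A Python check of the PMF formula (against math.comb) and of the mean and variance, exact with fractions:

# Binomial PMF from the factorial formula; check normalization and moments.
import math
from fractions import Fraction

n, p = 10, Fraction(1, 4)
pmf = {x: Fraction(math.factorial(n),
                   math.factorial(x) * math.factorial(n - x))
          * p ** x * (1 - p) ** (n - x)
       for x in range(n + 1)}

assert all(pmf[x] == math.comb(n, x) * p ** x * (1 - p) ** (n - x)
           for x in pmf)
assert sum(pmf.values()) == 1
E = sum(x * q for x, q in pmf.items())
V = sum(x * x * q for x, q in pmf.items()) - E ** 2
assert E == n * p and V == n * p * (1 - p)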

Geometric random variable

A geometric random variable is a r.v. with parameter p from the half-open interval (0, 1]; its sample space is all positive integers:

pgeom(p)(x) = p·(1 − p)^(x − 1)

This random variable models the number of tosses of a biased coin until the first success.

E[geom(p)] = Σ_{x = 1..∞} x·p·(1 − p)^(x − 1)
 = p·Σ_{x = 1..∞} x·(1 − p)^(x − 1)
 = p ⁄ (1 − p)·Σ_{x = 0..∞} x·(1 − p)^x
 = p ⁄ (1 − p)·(1 − p) ⁄ (1 − (1 − p))² = p ⁄ p² = 1 ⁄ p

Note

Maxima calculation:

load("simplify_sum");
simplify_sum(sum(k * x^k, k, 0, inf));
  Is abs(x) - 1 positive, negative or zero?
  negative;
  Is x positive, negative or zero?
  positive;
  Is x - 1 positive, negative or zero?
  negative;
       x
  ------------
   2
  x  - 2 x + 1

E[(geom(p))²] = Σ_{x = 1..∞} x²·p·(1 − p)^(x − 1)
 = p·Σ_{x = 1..∞} x²·(1 − p)^(x − 1)
 = p ⁄ (1 − p)·Σ_{x = 0..∞} x²·(1 − p)^x
 = p ⁄ (1 − p)·(1 − p)·(2 − p) ⁄ (1 − (1 − p))³ = p·(2 − p) ⁄ p³ = (2 − p) ⁄ p²

Note

Maxima calculation:

load("simplify_sum");
(%i3) assume(x>0);
(%o3)                               [x > 0]
(%i4) assume(x<1);
(%o4)                               [x < 1]

(%i8) simplify_sum(sum(k^2 * x^k, k, 0, inf));
                                          2
                                     x + x
(%o8)                        - -------------------
                                3      2
                               x  - 3 x  + 3 x - 1

So:

var(geom(p)) = E[(geom(p))²] − (E[geom(p)])² = (2 − p) ⁄ p² − 1 ⁄ p² = (1 − p) ⁄ p²
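
A numeric Python sketch: truncating the infinite sums at a large cutoff reproduces 1 ⁄ p and (1 − p) ⁄ p²:

# Truncated sums for the geometric mean and variance (p = 0.3 illustrative).
p, cutoff = 0.3, 10_000
pmf = {x: p * (1 - p) ** (x - 1) for x in range(1, cutoff)}

E = sum(x * q for x, q in pmf.items())
E2 = sum(x * x * q for x, q in pmf.items())
print(E, 1 / p)                       # ~3.3333
print(E2 - E ** 2, (1 - p) / p**2)    # ~7.7778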