Recommendation for online resources: Wolfram MathWorld, of course.
As usual, for an event $E$ in a sigma algebra over a sample space $S$:
$0 \le P(E) \le 1$
$P(S) = 1$
$P(E_1 \cup E_2) = P(E_1) + P(E_2)$ if $E_1, E_2$ are mutually exclusive.
Monty hall problem:
Let $S = \{0,1,2\} \times \{\dots\}$. Assume we pick a door uniformly at random. The prize is behind door 0.
Strategy 1: $X = \{0\}$, then obviously $P(X) = \tfrac{1}{3}$. Formalize later.
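A quick Monte Carlo check of the $\tfrac{1}{3}$ claim (a sketch in Python/numpy, assuming the standard rules where the host always opens a goat door):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
prize = rng.integers(0, 3, size=n)  # door hiding the prize
pick = rng.integers(0, 3, size=n)   # our uniformly random initial pick

# "Stay" (Strategy 1): we win iff the initial pick was right.
p_stay = (pick == prize).mean()
# "Switch": the host opens a goat door, so switching wins iff the
# initial pick was wrong.
p_switch = (pick != prize).mean()
print(f"stay ~ {p_stay:.3f}, switch ~ {p_switch:.3f}")  # ~ 1/3 vs ~ 2/3
```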
Random variables.
PDF: $p(x) \ge 0$ for $x \in S$ (if it exists!).
CDF: $F(x) \equiv P(X < x) = \int_{-\infty}^{x} p(x')\,dx'$
Therefore, if we define these this way, $p(x) = \frac{dF(x)}{dx}$ and
$P(a \le X \le b) = F(b) - F(a)$.
General properties of $F(x)$:
monotonic non-decreasing,
$0 \le F(x) \le 1$.
Example:
Uniform distribution $U[-1,1]$
$p_U(x) = \tfrac{1}{2}\,\mathbf{1}_{[-1,1]}(x)$
$F_U(x) = \int_{-1}^{x} p_U\,dx' = \tfrac{1}{2}\left(\max(\min(x,1),-1) + 1\right)$
Sampling in a computer:
The uniform distribution is quite easy to generate in a computer. Since $F(x)$ is monotonic (strictly increasing wherever $p > 0$), it can be inverted.
Then if $X \sim U[0,1]$, $Y = F^{-1}(X)$ has PDF $p$.
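A minimal sketch of this inverse-transform trick (Python/numpy assumed), using the $U[-1,1]$ CDF above and an exponential for variety:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=100_000)  # X ~ U[0, 1]

# U[-1, 1]: F_U(x) = (x + 1)/2 on [-1, 1], so F_U^{-1}(u) = 2u - 1.
y_unif = 2.0 * u - 1.0
# Exponential(1): F(x) = 1 - exp(-x), so F^{-1}(u) = -log(1 - u).
y_exp = -np.log(1.0 - u)

print(y_unif.min(), y_unif.max(), y_unif.mean())  # ~ -1, ~ 1, ~ 0
print(y_exp.mean())                               # ~ 1, the mean of Exp(1)
```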
Bayes' theorem.
$P(A \mid B)\,P(B) \equiv P(B \cap A)$
$\implies P(A \mid B) \equiv \dfrac{P(B \cap A)}{P(B)}$
$\implies P(A \mid B) \equiv \dfrac{P(B \mid A)\,P(A)}{P(B)}$
or more generally
$P(A_i \mid B) = \dfrac{P(B \mid A_i)\,P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\,P(A_j)}$ subject to $\bigcup_i A_i = \bigsqcup_i A_i = S$.
Note that this follows from the generalization $P(B) = \sum_i P(B \mid A_i)\,P(A_i)$, subject to the same partition constraint on $\{A_i\}$.
Useful notation: P(A) is a priori and P(A∣B) is a posteriori.
E.g. Suppose you pick a car out of three brands. What is the probability that it is made by a particular manufacturer?
Well, suppose we know the probability of a defect within the population from each manufacturer, and overall. Then we can use Bayes' theorem to get
$P(\text{car made by Ford} \mid \text{car is a lemon}) = \dfrac{P(\text{car is a lemon} \mid \text{car made by Ford})\,P(\text{car made by Ford})}{\sum_{X \text{ is a manufacturer}} P(\text{car is a lemon} \mid \text{car made by } X)\,P(\text{car made by } X)}$
indicating that data on whether our car is a lemon tells us something about the make of our car. A bit contrived, but it makes the point.
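A tiny numeric sketch of this (plain Python; the market-share priors and lemon rates below are made-up numbers, purely for illustration):

```python
# Made-up numbers for illustration: market-share priors and
# per-manufacturer lemon rates.
priors = {"ford": 0.3, "gm": 0.5, "toyota": 0.2}
p_lemon_given = {"ford": 0.10, "gm": 0.05, "toyota": 0.02}

# Law of total probability: P(lemon) = sum_i P(lemon | A_i) P(A_i).
p_lemon = sum(p_lemon_given[m] * priors[m] for m in priors)
# Bayes' theorem: P(ford | lemon).
posterior_ford = p_lemon_given["ford"] * priors["ford"] / p_lemon
print(f"P(ford | lemon) = {posterior_ford:.3f}")  # ~ 0.508 with these numbers
```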
Note: it is not always clear that $P(A)$ exists; see e.g. Judea Pearl's The Book of Why.
Quantities in statistics
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ and $E[X] = \int x f(x)\,dx$. For future reference: calculating the mean removes a degree of freedom, which explains the bias correction below.
$\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$ and $\mathrm{Var}(X) = E[(X - E[X])^2]$
Higher moments: $E[(X - E[X])^n]$. Apparently not encountered too frequently in the climate sciences.
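A sketch of these sample quantities (Python/numpy assumed; ddof=1 is the $\frac{1}{N-1}$ bias correction mentioned above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)  # synthetic sample

xbar = x.mean()                # sample mean (uses up one degree of freedom)
s2 = x.var(ddof=1)             # ddof=1 -> 1/(N-1), the bias-corrected variance
m3 = ((x - xbar) ** 3).mean()  # third central moment, no bias correction here
print(xbar, s2, m3)            # ~ 2, ~ 9, ~ 0
```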
Useful distributions:
Normal, log normal distribution.
Normal distribution
$f(x; \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2\right]$
Note, for the standard normal $X \sim N(0,1)$:
$P(X \in [-1,1]) \approx 68\%$
$P(X \in [-2,2]) \approx 95\%$
$P(X \in [-3,3]) \approx 99.7\%$
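A quick check of these numbers (sketch; assumes scipy is available):

```python
from scipy import stats

# Check the 68/95/99.7 rule for the standard normal.
for k in (1, 2, 3):
    p = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"P(|X| <= {k}) = {p:.4f}")  # 0.6827, 0.9545, 0.9973
```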
Central limit theorem: under relatively benign assumptions (finite variance),
the mean of a large number of i.i.d. random variables is approximately normally distributed. E.g. coin flips.
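The coin-flip example as a short simulation (a sketch, Python/numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n_flips, n_trials = 1_000, 10_000
# Row i = mean of 1000 fair coin flips (0/1); by the CLT these means are
# approximately N(0.5, 0.25 / n_flips).
means = rng.integers(0, 2, size=(n_trials, n_flips)).mean(axis=1)
print(means.mean(), means.std())  # ~ 0.5 and ~ sqrt(0.25/1000) ~ 0.0158
```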
Note: suppose velocity $v \sim N(\mu, \sigma^2)$; then $KE = \tfrac{1}{2} m v^2$ cannot follow a normal distribution (it is non-negative, for one).
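A quick empirical illustration (sketch; the mass and the velocity parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1.0  # hypothetical mass
v = rng.normal(loc=5.0, scale=2.0, size=100_000)
ke = 0.5 * m * v**2

print(ke.min() >= 0.0)                     # True: KE is bounded below by 0
print(((ke - ke.mean()) ** 3).mean() > 0)  # True: right-skewed, so not normal
```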
Multivariate stats:
$\mathrm{Cov}(x,y) = \frac{1}{N}\sum_i (x_i - \bar{x})(y_i - \bar{y})$.
Linear correlation coefficient: $r = \dfrac{\mathrm{Cov}(x,y)}{\sigma_x \sigma_y}$ (there are biased and unbiased versions).
Rank (Spearman's) correlation coefficient: $r' = 1 - \dfrac{6 \sum_i d_i^2}{N(N^2 - 1)}$, where $d_i$ is obtained by ranking the entries of the vectors $x, y$ and taking the difference of the rank vectors.
This is less sensitive to outliers. E.g., under the assumption that $X, Y$ are independent and normally distributed the two coefficients should agree, so a large disagreement between the linear and Spearman correlations indicates potential outliers.
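A sketch comparing the two coefficients on synthetic data with one injected outlier (Python with numpy/scipy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + 0.5 * rng.normal(size=200)  # strongly correlated pair
x[0], y[0] = 10.0, -10.0            # inject one gross outlier

r_pearson = stats.pearsonr(x, y)[0]    # linear correlation, dragged by the outlier
r_spearman = stats.spearmanr(x, y)[0]  # rank correlation, barely affected
print(f"pearson = {r_pearson:.2f}, spearman = {r_spearman:.2f}")
```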