Notes 2023.09.01

Review of probability and statistics.

Recommendation for online resources: Wolfram MathWorld, of course. As usual, for an event $E$ in a sigma algebra over a sample space $S$:

  • $0 \leq \probp(E) \leq 1$
  • $\probp(S) = 1$
  • $\probp(E_1 \cup E_2) = \probp(E_1) + \probp(E_2)$ if $E_1, E_2$ are mutually exclusive.

Monty Hall problem:

Let $S = \{0, 1, 2\} \times \{\}$. Assume we pick a door uniformly at random; the prize is behind door 0. Strategy 1 (stay with our pick): $X = \{0\}$, then obviously $\probp(X) = \frac{1}{3}$. Formalize later.
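A quick sanity check by simulation (a minimal sketch in Python; fixing the prize behind door 0 and the deterministic host tie-break are illustrative simplifications that don't affect these two strategies):

```python
import random

def monty_hall(switch: bool, trials: int = 100_000) -> float:
    """Estimate the win probability of the stay or switch strategy."""
    wins = 0
    for _ in range(trials):
        prize = 0                   # as in the notes: prize behind door 0
        pick = random.randrange(3)  # we pick a door uniformly at random
        # Host opens a door that is neither our pick nor the prize
        # (deterministic tie-break; irrelevant for these two strategies).
        opened = next(d for d in (0, 1, 2) if d != pick and d != prize)
        if switch:
            pick = next(d for d in (0, 1, 2) if d != pick and d != opened)
        wins += pick == prize
    return wins / trials

print(monty_hall(switch=False))  # ~ 1/3 (stay)
print(monty_hall(switch=True))   # ~ 2/3 (switch)
```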

Random variables.

  • PDF: $p(x) \geq 0$ for $x \in S$ (if it exists!).
  • CDF: $F(x) \equiv \probp(X < x) = \int_{-\infty}^{x} p(x') \intd{x'}$

Therefore, if we define these this way, $p(x) = \der{}{x}F(x)$ and $\probp(a \leq X \leq b) = F(b) - F(a)$.

General properties of $F(x)$:

  • monotonic non-decreasing,
  • $0 \leq F(x) \leq 1$.

Example:

  • Uniform distribution $\mathcal{U}_{[-1, 1]}$
  • $p_{\mathcal{U}}(x) = \frac{1}{2} \boldsymbol{1}_{[-1, 1]}(x)$
  • $F_{\mathcal{U}}(x) = \int_{-1}^x p_{\mathcal{U}}(x') \intd{x'} = \frac{1}{2}(\max(\min(x, 1), -1) + 1)$

Sampling in a computer:

The uniform distribution is quite easy to generate on a computer. Since $F(x)$ is monotonic (strictly increasing wherever $p > 0$), it is invertible there. Then if $X \sim \mathcal{U}_{[0, 1]}$, $Y = F^{-1}(X)$ has PDF $p$.
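A minimal sketch of this inverse transform trick. The uniform case is trivial, so this uses the exponential distribution ($F(y) = 1 - e^{-\lambda y}$), whose inverse CDF has a closed form; $\lambda = 2$ is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0  # rate parameter; an arbitrary choice for illustration

# Exponential CDF: F(y) = 1 - exp(-lam * y)  =>  F^{-1}(x) = -ln(1 - x) / lam
x = rng.uniform(0.0, 1.0, size=100_000)  # X ~ U_{[0, 1]}
y = -np.log(1.0 - x) / lam               # Y = F^{-1}(X) has PDF lam * exp(-lam * y)

print(y.mean())  # ~ 1/lam = 0.5, the mean of Exp(lam)
```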

Bayes' theorem.

\begin{align*} \probp(A \mid B)\probp(B) &\equiv \probp(B \cap A) \\ \implies \probp(A \mid B) &= \frac{\probp(B \cap A)}{\probp(B)} \\ \implies \probp(A \mid B) &= \frac{\probp(B \mid A)\probp(A)}{\probp(B)} \end{align*} or more generally $$\probp(A_i \mid B) = \frac{\probp(B \mid A_i)\probp(A_i)}{\sum_{j=1}^n \probp(B \mid A_j)\probp(A_j)}$$ subject to $\bigcup_i A_i = \bigsqcup_i A_i = S$. Note that the denominator follows from the law of total probability, $\probp(B) = \sum_i \probp(B \mid A_i)\probp(A_i)$, subject to the same partition constraint on $\{A_i\}$.

Useful notation: $\probp(A)$ is a priori and $\probp(A \mid B)$ is a posteriori.

E.g. suppose you pick a car out of three brands. What is the probability that it is made by a particular manufacturer? Well, suppose we know the probability of defect within the population from each manufacturer, and the prior probability of each make. Then we can use Bayes' theorem to get $$\probp(\textrm{car made by Ford} \mid \textrm{car is a lemon}) = \frac{\probp(\textrm{car is a lemon} \mid \textrm{car made by Ford})\,\probp(\textrm{car made by Ford})}{\sum_{X \textrm{ a manufacturer}} \probp(\textrm{car is a lemon} \mid \textrm{car made by } X)\,\probp(\textrm{car made by } X)}$$ indicating that knowing whether our car is a lemon helps us infer its make. A bit contrived, but it shows the mechanics.
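A numerical version of this calculation. The market shares and defect rates below are made-up numbers, purely to illustrate the mechanics:

```python
# Made-up market shares P(make) and defect rates P(lemon | make),
# purely to illustrate the mechanics.
p_make = {"ford": 0.3, "gm": 0.5, "toyota": 0.2}
p_lemon_given = {"ford": 0.10, "gm": 0.05, "toyota": 0.02}

# Law of total probability: P(lemon) = sum_X P(lemon | X) P(X)
p_lemon = sum(p_lemon_given[m] * p_make[m] for m in p_make)

# Bayes' theorem: P(made by ford | lemon)
p_ford_given_lemon = p_lemon_given["ford"] * p_make["ford"] / p_lemon
print(p_ford_given_lemon)  # ~ 0.51, up from the prior of 0.3
```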

Note: it is not always clear that $\probp(A)$ exists; see e.g. Judea Pearl's The Book of Why.

Quantities in statistics

  • Sample mean $\bar{x} = \frac{1}{N} \sum_{i=1}^N x_i$ and expectation $\probe[X] = \int x f(x) \intd{x}$. For future reference: computing the mean removes a degree of freedom, which explains the bias correction below.
  • Sample variance $\sigma^2 = \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2$ and $\textrm{Var}(X) = \probe[(X - \probe[X])^2]$ (see the sketch after this list).
  • Higher moments: $\probe[(X - \probe[X])^n]$. Apparently not encountered too frequently in the climate sciences.
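A minimal sketch of these estimators, checked against numpy (where ddof=1 selects the $N-1$, bias-corrected variance):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=1_000)

N = x.size
xbar = x.sum() / N                       # sample mean
var = ((x - xbar) ** 2).sum() / (N - 1)  # 1/(N-1): Bessel's correction

# numpy equivalents; ddof=1 selects the unbiased (N-1) estimator.
assert np.isclose(xbar, x.mean())
assert np.isclose(var, x.var(ddof=1))
print(xbar, var)  # ~ 1.0 and ~ 4.0 for N(1, 2^2)
```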

Useful distributions:

  • Normal and log-normal distributions.

Normal distribution

$$f(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2\right]$$

Note, for $X \sim \mathcal{N}(0, 1)$ (checked numerically after this list):

  • $\probp(X \in [-1, 1]) \approx 68\%$
  • $\probp(X \in [-2, 2]) \approx 95\%$
  • $\probp(X \in [-3, 3]) \approx 99.7\%$
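These can be verified directly from the CDF (using scipy here):

```python
from scipy.stats import norm

# P(X in [-k, k]) = F(k) - F(-k) for X ~ N(0, 1)
for k in (1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))
# 1 0.6826...   2 0.9544...   3 0.9973...
```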

Central limit theorem: under relatively benign assumptions (finite variance), the mean of a large number of i.i.d. random variables is approximately normally distributed. E.g. the mean of many coin flips.
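A coin-flip illustration (a minimal sketch: the mean of $n$ fair Bernoulli flips should be approximately $\mathcal{N}(0.5, 0.25/n)$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1_000, 10_000

# Means of n fair coin flips (Bernoulli(1/2)), repeated `trials` times.
means = rng.integers(0, 2, size=(trials, n)).mean(axis=1)

# CLT: the means should be approximately N(0.5, 0.25 / n).
print(means.mean(), means.std())  # ~ 0.5 and ~ sqrt(0.25 / n) ~ 0.0158
```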

Note: suppose velocity $v \sim \mathcal{N}(\mu, \sigma^2)$; then $\textrm{KE} = \frac{1}{2} m v^2$ cannot follow a normal distribution (it is non-negative, while a normal variable takes negative values with positive probability).

Multivariate stats:

  • $\textrm{Cov}(x, y) = \frac{1}{N} \sum_{i} (x_i - \bar{x})(y_i - \bar{y})$.

  • Linear (Pearson) correlation coefficient: $r = \frac{\textrm{Cov}(x, y)}{\sigma_x \sigma_y}$ (there are biased and unbiased versions).
  • Rank (Spearman's) correlation coefficient: $r' = 1 - \frac{6 \sum_i d_i^2}{N(N^2 - 1)}$, where $d_i$ is the difference between the ranks of $x_i$ and $y_i$ (rank the entries of the vectors $x, y$, then take the difference of the rank vectors). This is less sensitive to outliers than the linear coefficient, which is most meaningful when, e.g., $X, Y$ are normally distributed. A large disagreement between the linear and Spearman correlations indicates potential outliers (see the sketch below).
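A sketch of the outlier sensitivity just described: a single injected outlier distorts the linear coefficient much more than the rank coefficient (the data here are synthetic):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = x + 0.1 * rng.normal(size=50)  # strongly linearly correlated
y[0] = 100.0                       # inject a single large outlier

r, _ = pearsonr(x, y)    # linear coefficient: badly distorted by the outlier
rs, _ = spearmanr(x, y)  # rank coefficient: barely affected
print(r, rs)
```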
