Recommendation for online resources: Wolfram MathWorld, of course.
As usual, for an event $E$ in a sigma algebra over a sample space $S$:
$0 \le P(E) \le 1$
$P(S) = 1$
$P(E_1 \cup E_2) = P(E_1) + P(E_2)$ if $E_1, E_2$ are mutually exclusive.
Monty hall problem:
Let $S = \{0,1,2\} \times \{\dots\}$. Assume we pick a door uniformly at random. The prize is behind door 0.
Strategy 1: $X = \{0\}$, then obviously $P(X) = \tfrac{1}{3}$. Formalize later.
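A quick Monte Carlo check of the $\tfrac{1}{3}$ claim (a sketch in Python/numpy, assuming the standard rules where the host always opens a goat door):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
prize = rng.integers(0, 3, size=n)  # door hiding the prize
pick = rng.integers(0, 3, size=n)   # our uniformly random initial pick

# "Stay" (Strategy 1): we win iff the initial pick was right.
p_stay = (pick == prize).mean()
# "Switch": the host opens a goat door, so switching wins iff the
# initial pick was wrong.
p_switch = (pick != prize).mean()
print(f"stay ~ {p_stay:.3f}, switch ~ {p_switch:.3f}")  # ~ 1/3 vs ~ 2/3
```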
Random variables.
PDF: $p(x) \ge 0$ for $x \in S$ (if it exists!).
CDF: $F(x) \equiv P(X < x) = \int_{-\infty}^{x} p(x')\,dx'$
Therefore, if we define these this way, $p(x) = \frac{dF(x)}{dx}$ and
$P(a \le X \le b) = F(b) - F(a)$.
General properties of $F(x)$:
monotonic non-decreasing,
$0 \le F(x) \le 1$.
Example:
Uniform distribution $U[-1,1]$
$p_U(x) = \tfrac{1}{2}\,\mathbf{1}_{[-1,1]}(x)$
$F_U(x) = \int_{-1}^{x} p_U\,dx' = \tfrac{1}{2}\left(\max(\min(x,1),-1) + 1\right)$
Sampling in a computer:
The uniform distribution is quite easy to generate in a computer. Since $F(x)$ is monotonic (strictly increasing wherever $p > 0$), it can be inverted.
Then if $X \sim U[0,1]$, $Y = F^{-1}(X)$ has PDF $p$.
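A minimal sketch of this inverse-transform trick (Python/numpy assumed), using the $U[-1,1]$ CDF above and an exponential for variety:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=100_000)  # X ~ U[0, 1]

# U[-1, 1]: F_U(x) = (x + 1)/2 on [-1, 1], so F_U^{-1}(u) = 2u - 1.
y_unif = 2.0 * u - 1.0
# Exponential(1): F(x) = 1 - exp(-x), so F^{-1}(u) = -log(1 - u).
y_exp = -np.log(1.0 - u)

print(y_unif.min(), y_unif.max(), y_unif.mean())  # ~ -1, ~ 1, ~ 0
print(y_exp.mean())                               # ~ 1, the mean of Exp(1)
```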
Bayes' theorem.
$P(A \mid B)\,P(B) \equiv P(B \cap A)$
$\implies P(A \mid B) \equiv \dfrac{P(B \cap A)}{P(B)}$
$\implies P(A \mid B) \equiv \dfrac{P(B \mid A)\,P(A)}{P(B)}$
or more generally
$P(A_i \mid B) = \dfrac{P(B \mid A_i)\,P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\,P(A_j)}$ subject to $\bigcup_i A_i = \bigsqcup_i A_i = S$.
Note that this follows from the generalization $P(B) = \sum_i P(B \mid A_i)\,P(A_i)$, subject to the same partition constraint on $\{A_i\}$.
Useful notation: P(A) is a priori and P(A∣B) is a posteriori.
E.g. Suppose you pick a car out of three brands. What is the probability that it is made by a particular manufacturer?
Well, suppose we know the probability of a defect within the population from each manufacturer, and overall. Then we can use Bayes' theorem to get
$P(\text{car made by Ford} \mid \text{car is a lemon}) = \dfrac{P(\text{car is a lemon} \mid \text{car made by Ford})\,P(\text{car made by Ford})}{\sum_{X \text{ is a manufacturer}} P(\text{car is a lemon} \mid \text{car made by } X)\,P(\text{car made by } X)}$
indicating that data on whether our car is a lemon tells us something about the make of our car. A bit contrived, but it makes the point.
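A tiny numeric sketch of this (plain Python; the market-share priors and lemon rates below are made-up numbers, purely for illustration):

```python
# Made-up numbers for illustration: market-share priors and
# per-manufacturer lemon rates.
priors = {"ford": 0.3, "gm": 0.5, "toyota": 0.2}
p_lemon_given = {"ford": 0.10, "gm": 0.05, "toyota": 0.02}

# Law of total probability: P(lemon) = sum_i P(lemon | A_i) P(A_i).
p_lemon = sum(p_lemon_given[m] * priors[m] for m in priors)
# Bayes' theorem: P(ford | lemon).
posterior_ford = p_lemon_given["ford"] * priors["ford"] / p_lemon
print(f"P(ford | lemon) = {posterior_ford:.3f}")  # ~ 0.508 with these numbers
```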
Note: it is not always clear that $P(A)$ exists; see e.g. Judea Pearl's The Book of Why.
Quantities in statistics
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ and $E[X] = \int x f(x)\,dx$. For future reference: calculating the mean removes a degree of freedom, which explains the bias correction below.
$\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$ and $\mathrm{Var}(X) = E[(X - E[X])^2]$
Higher moments: $E[(X - E[X])^n]$. Apparently not encountered too frequently in the climate sciences.
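A sketch of these sample quantities (Python/numpy assumed; ddof=1 is the $\frac{1}{N-1}$ bias correction mentioned above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)  # synthetic sample

xbar = x.mean()                # sample mean (uses up one degree of freedom)
s2 = x.var(ddof=1)             # ddof=1 -> 1/(N-1), the bias-corrected variance
m3 = ((x - xbar) ** 3).mean()  # third central moment, no bias correction here
print(xbar, s2, m3)            # ~ 2, ~ 9, ~ 0
```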
Useful distributions:
Normal, log normal distribution.
Normal distribution
$f(x; \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2\right]$
Note, for the standard normal $X \sim N(0,1)$:
$P(X \in [-1,1]) \approx 68\%$
$P(X \in [-2,2]) \approx 95\%$
$P(X \in [-3,3]) \approx 99.7\%$
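A quick check of these numbers (sketch; assumes scipy is available):

```python
from scipy import stats

# Check the 68/95/99.7 rule for the standard normal.
for k in (1, 2, 3):
    p = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"P(|X| <= {k}) = {p:.4f}")  # 0.6827, 0.9545, 0.9973
```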
Central limit theorem: under relatively benign assumptions (finite variance),
the mean of a large number of i.i.d. random variables is approximately normally distributed. E.g. coin flips.
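The coin-flip example as a short simulation (a sketch, Python/numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n_flips, n_trials = 1_000, 10_000
# Row i = mean of 1000 fair coin flips (0/1); by the CLT these means are
# approximately N(0.5, 0.25 / n_flips).
means = rng.integers(0, 2, size=(n_trials, n_flips)).mean(axis=1)
print(means.mean(), means.std())  # ~ 0.5 and ~ sqrt(0.25/1000) ~ 0.0158
```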
Note: suppose velocity $v \sim N(\mu, \sigma^2)$; then $KE = \tfrac{1}{2} m v^2$ cannot follow a normal distribution (it is non-negative, for one).
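A quick empirical illustration (sketch; the mass and the velocity parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1.0  # hypothetical mass
v = rng.normal(loc=5.0, scale=2.0, size=100_000)
ke = 0.5 * m * v**2

print(ke.min() >= 0.0)                     # True: KE is bounded below by 0
print(((ke - ke.mean()) ** 3).mean() > 0)  # True: right-skewed, so not normal
```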
Multivariate stats:
$\mathrm{Cov}(x,y) = \frac{1}{N}\sum_i (x_i - \bar{x})(y_i - \bar{y})$.
Linear correlation coefficient: $r = \dfrac{\mathrm{Cov}(x,y)}{\sigma_x \sigma_y}$ (there are biased and unbiased versions).
Rank (Spearman's) correlation coefficient: $r' = 1 - \dfrac{6 \sum_i d_i^2}{N(N^2 - 1)}$, where $d_i$ is obtained by ranking the entries of the vectors $x, y$ and taking the difference of the rank vectors.
This is less sensitive to outliers. E.g., under the assumption that $X, Y$ are independent and normally distributed the two coefficients should agree, so a large disagreement between the linear and Spearman correlations indicates potential outliers.
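A sketch comparing the two coefficients on synthetic data with one injected outlier (Python with numpy/scipy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + 0.5 * rng.normal(size=200)  # strongly correlated pair
x[0], y[0] = 10.0, -10.0            # inject one gross outlier

r_pearson = stats.pearsonr(x, y)[0]    # linear correlation, dragged by the outlier
r_spearman = stats.spearmanr(x, y)[0]  # rank correlation, barely affected
print(f"pearson = {r_pearson:.2f}, spearman = {r_spearman:.2f}")
```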