Statistics for Atmospheric Science
Deck Details
Cards | 88 |
---|---|
Language | English |
Category | Mathematics |
Level | University |
Created / Updated | 21.07.2018 / 27.08.2018 |
Web link | https://card2brain.ch/box/20180721_statistics_for_atmospheric_science |
Axioms of probability (Axioms of Kolmogorov)
Probability \(P:\Omega\rightarrow\mathbb{R}\) (the probability P is a mapping from the event space to the real numbers)
Given events A in an event space \(\Omega\), i.e., \(A\subset \Omega\) (A is a subset of Omega; Omega is a superset of A)
- \(0 \leq P(A) \leq 1\)
- \(P(\Omega)=1\)
- given \(A_i\cap A_j =\emptyset\) for \(i \neq j\), then \(P(\bigcup_iA_i)=\sum_i P(A_i)\) (if events are pairwise disjoint, i.e., their intersections are empty, the probability of their union is the sum of the individual probabilities)
Consequences of the axioms of Kolmogorov
- \(P(\bar{A})=1-P(A)\)
- \(P(\emptyset)=0\)
- if A and B are exclusive, then \(P(A\cup B)=P(A)+P(B)\)
- in general \(P(A\cup B)=P(A)+P(B)-P(A\cap B)\) (additive law of probability)
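A quick numerical check of the additive law with a fair die (a minimal sketch assuming numpy is available; the events A and B are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)   # fair six-sided die

A = (rolls <= 3)          # event A: roll is 1, 2 or 3
B = (rolls % 2 == 0)      # event B: roll is even

p_union = np.mean(A | B)
p_sum = np.mean(A) + np.mean(B) - np.mean(A & B)
print(p_union, p_sum)     # both close to 5/6: additive law holds
```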
Independent events
Two events are independent when the following is valid:
\(P(A\cap B)=P(A)*P(B)\)
Conditional probability of two events
The conditional probability of an event A, given an event B is:
\(P(A|B)=P(A\cap B)/P(B)\)
if A and B are independent, then:
\(P(A|B)=P(A)\)
Bayes' theorem
\(P(A_j|B)=\frac{P(B|A_j)P(A_j)}{P(B)}\)
where, for a partition \(A_1,A_2,...\) of \(\Omega\), the denominator can be expanded with the law of total probability: \(P(B)=\sum_i P(B|A_i)P(A_i)\)
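A small worked example (the numbers are purely illustrative, not from the card): suppose rain occurs on 30% of days, a forecast says "rain" on 80% of rainy days and on 10% of dry days. Bayes' theorem then gives the probability of rain given a "rain" forecast:

```python
# Hypothetical numbers for illustration
p_rain = 0.3                 # P(A1): rain
p_dry = 0.7                  # P(A2): no rain
p_fc_given_rain = 0.8        # P(B | A1): forecast "rain" on rainy days
p_fc_given_dry = 0.1         # P(B | A2): forecast "rain" on dry days

# Law of total probability: P(B) = sum_i P(B | A_i) P(A_i)
p_fc = p_fc_given_rain * p_rain + p_fc_given_dry * p_dry

# Bayes' theorem: P(A1 | B)
p_rain_given_fc = p_fc_given_rain * p_rain / p_fc
print(round(p_rain_given_fc, 3))   # 0.774
```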
What types of random variables exist?
- discrete: number of wet days
- continuous (not really!): temperature
- categorical: head or tails?
Cumulative distribution function (CDF)
\(F_X(x)=P(X\leq x)\) continuous random variables
\(F_X(x)=\sum_{x_i\leq x}P(X=x_i)\) discrete random variables
- \(F_X\) monotonically increasing (\(0\leq F_X(x)\leq 1\))
- \(\lim_{x\rightarrow -\infty}F_X(x)=0,\;\;\lim_{x\rightarrow \infty}F_X(x)=1\)
- \(P(X \in [a,b])=P(a\leq X\leq b)=F_X(b)-F_X(a)\)
Probability distribution function
Probability mass function (only for discrete variables!):
\(f_X(x)=P(X=x)\)
Probability density function (PDF, for continuous random variables!):
\(f_X(x)=\frac{dF_X(x)}{dx}\)
Properties:
- \(f_X(x)\geq 0\)
- \(\int f_X(x)\,dx=1\) (cont.), \(\;\sum_{x\in \Omega}f_X(x)=1\) (discrete)
- \(P(X\in [a,b])=P(a\leq X\leq b)=F_X(b)-F_X(a)\)
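A numerical illustration of these relations using the standard normal distribution (a sketch; scipy is an assumption, not part of the card):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

a, b = -1.0, 1.0
X = stats.norm(loc=0, scale=1)

# P(a <= X <= b) = F(b) - F(a)
print(X.cdf(b) - X.cdf(a))              # ~0.683

# The PDF is the derivative of the CDF (finite-difference check)
x, h = 0.5, 1e-6
print(X.pdf(x), (X.cdf(x + h) - X.cdf(x - h)) / (2 * h))

# The PDF integrates to 1
print(quad(X.pdf, -np.inf, np.inf)[0])  # ~1.0
```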
Independent random variables
continuous random variables:
Random variables X and Y are independent if for any x and y:
\(P(X\leq x, Y\leq y)=P(X\leq x)P(Y\leq y)=F(x)G(y)\)
where F(x) and G(y) are the corresponding CDFs.
discrete random variables:
Random variables X and Y are independent if for any \(x_i\) and \(y_j\):
\(P(X\leq x_i,Y\leq y_j)=P(X\leq x_i)P(Y\leq y_j)\)
Define the expressions Quantile, Percentile, Median and Quartile
Quantile: the p-quantile \(x_p\) is the value for which \(F_X(x_p)=p\), i.e., a fraction p of the distribution lies below it
Percentile: quantiles expressed in percentages: the 0.2-quantile is the 20th percentile
Quartiles: the 25th, 50th and 75th percentiles (lower quartile, median, upper quartile)
Median: the 0.5-quantile
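These definitions map directly onto numpy's quantile/percentile functions (a minimal sketch with a synthetic sample):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)               # synthetic sample

median = np.quantile(x, 0.5)            # 0.5-quantile = median
q25, q75 = np.percentile(x, [25, 75])   # lower and upper quartiles
p20 = np.quantile(x, 0.2)               # 0.2-quantile = 20th percentile
print(median, q25, q75, p20)
```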
What is a moment?
The nth moment \(\mu_n\) of a probability density \(f_X(x)\) is defined as:
- (cont.): \(\mu_n=E(X^n)=\int x^n*f_X(x)dx\)
- (discr.): \(\mu_n=E(X^n)=\sum x^n_k * f_X(x_k)\)
The nth central moment \(\mu'_n\) of a probability density \(f_X(x)\) is defined with respect to the first moment (\(\mu\)) as
\(\mu_n'=E((X-\mu)^n)=\int (x-\mu)^n * f_X(x)dx \)
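Sample versions of these moments can be computed directly, or via scipy.stats.moment for central moments (a sketch with a synthetic exponential sample):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=100_000)   # Exp with mean 2

mu1 = np.mean(x)               # first moment (mean), ~2.0
mu2 = np.mean(x**2)            # second raw moment, ~8.0 here
central2 = stats.moment(x, 2)  # second central moment = variance, ~4.0
print(mu1, mu2, central2)
```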
How are the expected value and the variance defined?
The expected value, also called the mean, is defined as the first moment:
\(\mu=E(X)=\int x\,f_X(x)\,dx \)
Physically, the expected value corresponds to the centre of mass of the density.
The variance is defined as the second central moment:
\(\sigma^2=Var(X)=E((X-\mu)^2)=E(X^2)-\mu^2\)
The variance measures the spread around the expected value.
What is the fourth central moment?
Kurtosis (a measure of peakedness / heaviness of the tails)
The kurtosis of any univariate normal distribution is 3. It is common to compare the kurtosis of a distribution to this value. Distributions with kurtosis less than 3 are said to be platykurtic, although this does not imply the distribution is "flat-topped" as sometimes reported. Rather, it means the distribution produces fewer and less extreme outliers than does the normal distribution. An example of a platykurtic distribution is the uniform distribution, which does not produce outliers.
The excess kurtosis is the difference between the kurtosis of the distribution under consideration and the kurtosis of the density of a normally distributed random variable (i.e., kurtosis minus 3).
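For example, with scipy (a sketch; note that scipy.stats.kurtosis returns the excess kurtosis by default, so fisher=False is used to get the plain kurtosis, which is 3 for the normal distribution):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
normal = rng.normal(size=200_000)
uniform = rng.uniform(size=200_000)
laplace = rng.laplace(size=200_000)

for name, x in [("normal", normal), ("uniform", uniform), ("laplace", laplace)]:
    print(name, stats.kurtosis(x, fisher=False))
# normal ~3, uniform ~1.8 (platykurtic), laplace ~6 (leptokurtic)
```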
What is the Mode?
The mode is the value that appears most often in a set of data. For a continuous probability distribution it is the peak.
What are the probability density and the cumulative distribution function of the uniform distribution?
For the uniform distribution on \([a,b]\):
- PDF: \(f_X(x)=\frac{1}{b-a}\) for \(a\leq x\leq b\), and 0 otherwise
- CDF: \(F_X(x)=\frac{x-a}{b-a}\) for \(a\leq x\leq b\) (0 below a, 1 above b)
What is intermittency?
A signal is said to be intermittent if rare events of large magnitude are separated by long periods with events of low magnitude. Spatial intermittency implies that the signal displays localized regions with events of large magnitude, and wide areas with events of low magnitude.
- PDFs of intermittent flows are not Gaussian.
- Kurtosis is often used as a measure of intermittency (high intermittency means high kurtosis)
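A toy illustration of kurtosis as an intermittency measure: a Gaussian signal versus a signal in which rare large bursts are separated by quiet periods (the construction is purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 100_000

gaussian = rng.normal(size=n)

# Intermittent toy signal: mostly weak noise, rare large events
bursts = rng.random(n) < 0.01                 # ~1% of samples are events
intermittent = 0.1 * rng.normal(size=n) + bursts * rng.normal(scale=5.0, size=n)

print(stats.kurtosis(gaussian, fisher=False))      # ~3
print(stats.kurtosis(intermittent, fisher=False))  # much larger than 3
```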
Can the variance be zero?
Yes, then:
- the random variable takes only a single constant value
- mean, median and mode are the same
Tell me a distribution where no moments exist:
The Cauchy distribution
- expected value, variance and standard deviation do not exist, since the defining integrals do not converge.
Law of large numbers
Given a sequence of independent, identically distributed random variables \(X_1,X_2,...\) with mean \(\mu\). Then it holds:
\(\frac{1}{n}\sum^{n}_{i=1}X_i\rightarrow \mu\;\;\text{for}\;\;n\rightarrow\infty\)
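A simulation of the law of large numbers, also showing the Cauchy distribution from the previous card as a counterexample (no finite mean, so the running average never settles); a sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

# Running mean of N(2, 1) samples converges to the true mean 2
x = rng.normal(loc=2.0, size=n)
print(np.cumsum(x)[[999, 99_999, n - 1]] / np.array([1_000, 100_000, n]))

# Cauchy samples: the running mean does not converge (no finite mean)
c = rng.standard_cauchy(n)
print(np.cumsum(c)[[999, 99_999, n - 1]] / np.array([1_000, 100_000, n]))
```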
Central limit theorem
Given a sequence of independent and identically distributed random variables \(X_1,...,X_n\) with expected value \(\mu\) and variance \(\sigma^2\). Then the distribution of \(S_n=\frac{1}{n}(X_1+...+X_n)\) is approximately normal with mean \(\mu\) and variance \(\frac{1}{n}\sigma^2\), or
\(\sqrt{n}\left(\frac{1}{n}\sum\limits^n_{i=1}X_i-\mu \right)\xrightarrow{d}\mathcal{N}(0,\sigma)\)
where \(\xrightarrow{d}\) reads "converges in distribution to".
How large does n have to be chosen? That depends on the underlying distribution of the sample sequence.
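A simulation sketch: means of samples drawn from a (clearly non-Gaussian) exponential distribution are approximately normal with mean \(\mu\) and variance \(\sigma^2/n\):

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_rep = 50, 20_000

# Exponential(1): mu = 1, sigma^2 = 1
samples = rng.exponential(scale=1.0, size=(n_rep, n))
means = samples.mean(axis=1)

print(means.mean())   # ~1 (= mu)
print(means.var())    # ~1/50 (= sigma^2 / n)
# A histogram of `means` looks approximately Gaussian.
```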
Chebyshev's inequality
For any random variable X and any c>0:
\(P(|X-E(X)|\geq c)\leq\frac{Var(X)}{c^2}\)
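An empirical check of the inequality (a sketch with an exponential sample; the bound is usually far from tight):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=1.0, size=1_000_000)   # E(X) = 1, Var(X) = 1

for c in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(x - x.mean()) >= c)
    bound = x.var() / c**2
    print(c, empirical, "<=", bound)
```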
Empirical vs. theoretical quantities
Quantities estimated from a given sample are often referred to as empirical or sample quantities, e.g., \(\hat{\mu}\).
The corresponding true or model quantities are often referred to as theoretical quantities, e.g., \(\mu\).
Given a sample \(x_1,...,x_N\) of a random variable X. Consider a parameter \(\Theta\) of X, e.g., the mean \(\mu\).
Then the estimator \(\widehat{\Theta}\) is a function of the sample (i.e., a statistic) of the random variable X, which assigns to the sample a value whose distribution depends on (and is close to) \(\Theta\).
Estimators: What are the formulas for the sample mean, sample variance (known mean and not known mean) and sample standard deviation?
sample mean: \(\bar{x}=\widehat{\mu}=\frac{1}{N}\sum\limits^{N}_{i=1}x_i\)
sample variance: \(\widehat{Var}(x)=\frac{1}{N-1}\sum\limits^{N}_{i=1}(x_i-\bar{x})^2 \)
sample variance with known \(\mu\): \(\widehat{Var}(x)=\frac{1}{N}\sum\limits^{N}_{i=1}(x_i-\mu)^2 \)
standard deviation: \(\widehat{s}=\sqrt{\widehat{Var}(x)}\)
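In numpy these estimators correspond to mean, var and std; ddof=1 selects the 1/(N-1) version used when the mean is estimated from the same sample (a minimal sketch with synthetic data):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(loc=10.0, scale=2.0, size=500)

mean = np.mean(x)                      # sample mean
var_unknown_mu = np.var(x, ddof=1)     # 1/(N-1): mu estimated from the sample
var_known_mu = np.mean((x - 10.0)**2)  # 1/N: using the known mu = 10
std = np.std(x, ddof=1)                # sample standard deviation
print(mean, var_unknown_mu, var_known_mu, std)
```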
Given independent random variables X and Y with expected values \(\mu_X\) and \(\mu_Y\) and variances \(\sigma^2_X\) and \(\sigma^2_Y\).
How to calculate expected value and variance if,
- \(Z=\alpha+X\)
- \(Z=\alpha X\)
- \(Z=X+Y\)
- \(Z=X*Y\)
- \(Z=\alpha+X\)
- \(\mu_Z=\alpha+\mu_X,\;\;\;\sigma_Z^2=\sigma_X^2\)
- \(Z=\alpha X\)
- \(\mu_Z=\alpha\mu_X,\;\;\;\sigma_Z^2=\alpha^2\sigma^2_X\)
- \(Z=X+Y\)
- \(\mu_Z=\mu_X+\mu_Y,\;\;\;\sigma_Z^2=\sigma^2_X+\sigma^2_Y\)
- \(Z=X*Y\)
- \(\mu_Z=\mu_X*\mu_Y,\;\;\;\sigma_Z^2=\sigma^2_X\sigma^2_Y+\sigma^2_X\mu^2_Y+\sigma^2_Y\mu^2_X\) (for independent X and Y)
Note that the density function and the cumulative distribution of composed random variables (such as X+Y or XY) are in general not easy to determine, although mean and variance can be computed easily.
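A quick Monte Carlo check of these rules for independent X and Y (a sketch; the normal distributions and their parameters are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1_000_000
x = rng.normal(loc=2.0, scale=1.0, size=n)    # mu_X = 2,  var_X = 1
y = rng.normal(loc=-1.0, scale=0.5, size=n)   # mu_Y = -1, var_Y = 0.25

z = x + y
print(z.mean(), z.var())   # ~1 and ~1.25

z = x * y
print(z.mean(), z.var())   # ~-2 and ~2.25 (= 1*0.25 + 1*1 + 0.25*4)
```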
What is the estimator of the probability density function?
The histogram, which contains the relative occurrence divided by the bin width.
Choice of the number of bins K for a histogram
Non-trivial; common rules of thumb:
- square-root choice: \(k=\sqrt{n}\)
- Sturges' formula (assumes Gaussian data): \(k=\log_2 n+1\)
- Rice rule: \(k=\lceil 2n^{1/3}\rceil\)
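numpy implements these rules via np.histogram_bin_edges (a sketch; the 'sqrt', 'sturges' and 'rice' options correspond to the rules above):

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.normal(size=1000)

for rule in ("sqrt", "sturges", "rice"):
    edges = np.histogram_bin_edges(x, bins=rule)
    print(rule, len(edges) - 1, "bins")

# Relative occurrence divided by bin width = density estimate of the PDF
hist, edges = np.histogram(x, bins="sturges", density=True)
```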
When is an estimator consistent?
The estimator \(\widehat{\Theta}\), as a function of the random variable X, is again a random variable. Therefore every estimator has an expected value and variance.
An estimator is called consistent if:
\(P(|\widehat{\Theta}-\Theta|>\epsilon)\rightarrow0\;\;for\;\;N\rightarrow\infty\)
for all \(\epsilon >0\)
Example: The estimator for the expected value \(\widehat{\Theta}=\widehat{\mu}\) (the sample mean) is consistent (law of large numbers).
\(\widehat{\mu}=\frac{1}{N}\sum\limits^N_{i=1}x_i\)
What is the Mean Squared Error (MSE) and Variance of an estimator?
\(MSE(\widehat{\Theta})=E[(\widehat{\Theta}-\Theta)^2]\)
The MSE is also called the risk.
\(Var(\widehat{\Theta})=E[(\widehat{\Theta}-E(\widehat{\Theta}))^2]\)
What is the bias of an estimator?
Bias:
\(B(\widehat{\Theta})=E(\widehat{\Theta})-\Theta\)
An estimator is called unbiased if and only if
\(B(\widehat{\Theta})=0\)
What is the relationship between Mean Square Error, Bias and Variance of an estimator?
\(MSE(\widehat{\Theta})=Var(\widehat{\Theta})+(B(\widehat{\Theta}))^2\)
so for unbiased estimator it holds:
\(MSE(\widehat{\Theta})=Var(\widehat{\Theta})\)
Is the sample mean and the sample variance unbiased?
- The sample mean is an unbiased estimator of the expected value
- The sample variance \(\widehat{Var}(x)=\frac{1}{N}\sum\limits^N_{i=1}(x_i-\bar{x})^2\) is not unbiased, but it is asymptotically unbiased
- However, the sample variance \(\widehat{Var}(x)=\frac{1}{N-1}\sum\limits^N_{i=1}(x_i-\bar{x})^2\) is an unbiased estimator.
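A simulation sketch illustrating the bias of the 1/N estimator, and the relation MSE = Var + Bias² from the previous cards (true distribution N(0,1), so \(\sigma^2=1\)):

```python
import numpy as np

rng = np.random.default_rng(11)
n, n_rep = 10, 200_000
samples = rng.normal(size=(n_rep, n))        # true sigma^2 = 1

var_biased = samples.var(axis=1, ddof=0)     # 1/N version
var_unbiased = samples.var(axis=1, ddof=1)   # 1/(N-1) version

for name, est in [("1/N", var_biased), ("1/(N-1)", var_unbiased)]:
    bias = est.mean() - 1.0                  # ~ -0.1 for 1/N, ~0 for 1/(N-1)
    mse = np.mean((est - 1.0)**2)
    print(name, "bias:", bias, "MSE:", mse, "Var + bias^2:", est.var() + bias**2)
```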
What is the main idea for a confidence interval?
Given an estimate \(\widehat{\Theta}\) of \(\Theta\). An interval \((\widehat{\Theta}_L,\widehat{\Theta}_U)\) around \(\widehat{\Theta}\) is named a \((1-\alpha)\) confidence interval if
\(P(\Theta\in (\widehat{\Theta}_L,\widehat{\Theta}_U))=1-\alpha\)
A 95% confidence interval covers the true value in 95% of the cases.
What is the t-distribution and name one application.
t-Distribution:
\(f_X(x;\nu)=c(\nu)\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}\)
with a constant \(c(\nu)\). \(\nu\in\mathbb{N}\) is called the degrees of freedom. For \(\nu\rightarrow \infty\) it converges to the \(\mathcal{N}(0,1)\) distribution.
The t-Distribution is used to estimate the mean of a normally distributed population when the sample size is small and population standard deviation is unknown.
Derive the confidence intervals for the mean of a normal distributed variable with known variance!
The empirical mean \(\bar{x}\) has the distribution \(\mathcal{N}(\mu, \frac{\sigma}{\sqrt{n}})\), and
\(Z=\sqrt{n}(\bar{x}-\mu)/\sigma\)
has distribution \(\mathcal{N}(0,1)\) .
The value z is such that \(P(-z\leq Z\leq z)=1-\alpha\)
This yields: \(P(\bar{x}-z\frac{\sigma}{\sqrt{n}}\leq\mu\leq\bar{x}+z\frac{\sigma}{\sqrt{n}})=1-\alpha\)
The confidence interval can be easily found now.
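A sketch of this confidence interval in Python; scipy's norm.ppf gives the value z with \(P(-z\leq Z\leq z)=1-\alpha\) (the sample and the known \(\sigma\) are synthetic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
sigma = 2.0                                  # known standard deviation
x = rng.normal(loc=5.0, scale=sigma, size=30)

alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)            # ~1.96 for a 95% interval
xbar = x.mean()
half_width = z * sigma / np.sqrt(len(x))
print(xbar - half_width, xbar + half_width)  # covers mu = 5 in ~95% of samples
```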
What kinds of significance testing are there?
Significance tests aim at verifying a hypothesis based on statistical data:
- Parametric tests consider hypotheses regarding parameters of the distribution
- Non-parametric tests consider hypotheses not involving parameters (e.g., distributions are the same or different)
Steps of a significance test
- Formulate the null hypothesis and an alternative hypothesis
- Choose the significance level \(\alpha\)
- Choose the significance test and test statistic; clarify the assumptions to be made
- Determine the null distribution and critical value
- Calculate the test statistic and/or p-value
- Decide whether the null hypothesis is rejected or not
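A minimal one-sample t-test following these steps (a sketch; scipy is an assumption, the data are synthetic, and the null hypothesis is \(\mu=0\)):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
x = rng.normal(loc=0.5, scale=1.0, size=20)          # sample; H0: mu = 0

alpha = 0.05                                         # chosen significance level
t_crit = stats.t.ppf(1 - alpha / 2, df=len(x) - 1)   # two-sided critical value

t_obs, p_obs = stats.ttest_1samp(x, popmean=0.0)     # test statistic and p-value
reject = (p_obs <= alpha)                            # equivalently: abs(t_obs) >= t_crit
print(t_obs, t_crit, p_obs, reject)
```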
How are significance level and critical value defined?
Significance level and critical value are defined such that:
\(P(|T|\geq t_{crit})\equiv\alpha\)
i.e., the probability that T falls in the rejection region although \(H_0\) (the null hypothesis) is true (a small probability).
Then the Null hypothesis is rejected in case \(p_{obs}\leq\alpha \) or \(|t_{obs}|\geq t_{crit}\)
Significance testing: what is the error of the first kind and what is the error of the second kind?
Error of the first kind or \(\alpha\)-error: rejection of the null hypothesis \(H_0\) although it is true.
Probability of this error: \(P(H_0\;\text{rejected} \mid H_0\;\text{true})=\alpha\)
Error of the second kind or \(\beta\)-error: no rejection of the null hypothesis although it is false.
Probability of this error: \(P(H_0\;\text{not rejected}\mid H_0\;\text{false})=\beta\)
Reducing one of the errors increases the other, unless the sample size can be increased!