Statistics for Atmospheric Science
Deck details
Cards | 88 |
---|---|
Language | English |
Category | Mathematics |
Level | University |
Created / Updated | 21.07.2018 / 27.08.2018 |
Weblink | https://card2brain.ch/box/20180721_statistics_for_atmospheric_science |
Periodogram: Definitions!
Given a discrete time series \(x_i\) with \(i=1,\dots,N\) and \(t_i=i\Delta t\).
A first estimator of the spectrum is the periodogram:
\(\widehat{S}(\omega_l)=P(\omega_l)=|X(\omega_l)|^2\)
- The highest resolved frequency (Nyquist frequency) depends on the sampling time/rate.
- The frequency resolution (and thus also the lowest resolved frequency) depends on the length of the time window (see the sketch below).
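A minimal numerical sketch of the periodogram (not from the script; the synthetic data, the variable names, and the normalization by N are my own choices):

```python
# Periodogram of a synthetic series: sinusoid + white noise (illustrative only).
import numpy as np

dt = 1.0                                     # sampling interval (assumed)
N = 512
t = np.arange(N) * dt
x = np.sin(2 * np.pi * 0.1 * t) + np.random.randn(N)

X = np.fft.rfft(x)                           # DFT for a real-valued input
freqs = np.fft.rfftfreq(N, d=dt)             # frequencies up to the Nyquist frequency 1/(2*dt)
periodogram = np.abs(X) ** 2 / N             # |X(w_l)|^2 with one common normalization

# Frequency resolution: 1/(N*dt); highest resolved frequency: 1/(2*dt).
```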
What are the problems of the Periodogram?
- The periodogram is not a consistent estimator. For large N, the variance of the periodogram does not decrease.
- Leakage. Because any observed time series is of finite length, spectral peaks get smeared out.
- Aliasing. High frequencies above the Nyquist frequency are not sampled, but appear as artefacts in lower frequencies.
Methods of improving the estimation of the power spectral density via periodogram!
- Repeat the experiment and average the individual spectra.
- Cut the time series into subseries and average the sub-periodograms. This reduces the frequency resolution! (See the sketch after this list.)
- Smooth the periodogram with a suitable smoothing kernel ("window, taper"). Reduction of variance but increase of bias!
- Fit AR(MA) models to time series and calculate their spectra.
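A sketch of the block-averaging idea using scipy's Welch estimator (hypothetical white-noise data; `nperseg` is the block length):

```python
# Averaging sub-periodograms ("Welch's method") reduces the variance of the estimate.
import numpy as np
from scipy.signal import periodogram, welch

fs = 1.0                                     # sampling frequency (assumed)
x = np.random.randn(4096)                    # white-noise example series

f_raw, P_raw = periodogram(x, fs=fs)         # raw periodogram: large random scatter
f_avg, P_avg = welch(x, fs=fs, nperseg=256)  # averaged over overlapping blocks: less variance,
                                             # but the frequency resolution drops to fs/nperseg
```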
Windowing and Tapering: What is improved? What are the drawbacks?
The power spectral estimator is \(\chi ^2_2\)-distributed, whose relative standard deviation is 1. This means that with increasing N the frequency resolution increases, but the variance of the estimator stays constant.
With windowing and tapering, however, the variance can be decreased (less random scatter).
For a stationary time series, the periodogram of each window of data gives an independent unbiased estimate of the power spectrum and thus can be averaged to smooth the spectrum.
Benefit and drawback:
The shorter the block length, the more blocks and the smoother the spectrum, but also the lower the frequency resolution!
The idea of windowing is best shown with a rectangular window, but in practice this is rarely used because of two problems:
- endpoint discontinuities of the window alter the low-frequency variability.
- every finite time series comes with leakage.
Both problems can be reduced with weight functions for the window that go smoothly to zero at the endpoints.
It makes sense to overlap these windows so that every data point is near the center of some window and is weighted equally. However, the power spectrum estimates are then no longer independent, and this must also be taken into account in the uncertainty estimates of the power spectrum!
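A small sketch of the effect of a taper, comparing a rectangular window with a Hann taper for a tone that does not fall exactly on an FFT bin (the data and the window choice are illustrative, not from the script):

```python
# Hann-tapered vs. rectangular-window periodogram: the taper reduces leakage (sidelobes)
# at the cost of a slightly broader main peak (bias).
import numpy as np
from scipy.signal import periodogram

fs = 1.0
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 0.123 * t)                        # frequency off the FFT grid -> leakage

f_rect, P_rect = periodogram(x, fs=fs, window="boxcar")  # rectangular window
f_hann, P_hann = periodogram(x, fs=fs, window="hann")    # smooth taper
```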
Explain Leakage!
See Karl's lecture notes!
How to cope with Aliasing?
If the original time series has some power at an alias frequency of a frequency f (i.e. the alias frequency is higher than the Nyquist frequency), this power will appear additionally at the frequency f in the spectrum of the sample.
The only possibility to avoid this is to filter out the high frequencies (i.e. to low-pass filter the signal) before the sample is taken.
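A sketch of such an anti-alias (low-pass) filter before subsampling, with an arbitrarily chosen Butterworth filter and cutoff:

```python
# Low-pass filter the signal before taking every q-th sample, so that frequencies above
# the new Nyquist frequency fs/(2*q) cannot alias into the subsampled spectrum.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 10.0                                    # original sampling frequency (assumed)
q = 5                                        # keep every 5th sample -> new Nyquist frequency = 1 Hz
t = np.arange(0, 100, 1 / fs)
x = np.sin(2 * np.pi * 0.3 * t) + 0.5 * np.sin(2 * np.pi * 1.7 * t)  # 1.7 Hz would alias

b, a = butter(4, (fs / (2 * q)) / (fs / 2))  # 4th-order Butterworth, cutoff at the new Nyquist frequency
x_filtered = filtfilt(b, a, x)               # zero-phase low-pass filtering
x_sub = x_filtered[::q]                      # now safe to subsample
```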
Sound and Color Spectra
- The names arise from the appearance of visible light with the corresponding spectral distribution
- Each "tone color" discussed here follows a power law of the form: \(S(\omega)\sim f^\alpha\) (a sketch of how to generate such noise follows after the list)
White noise:
- The spectrum of Gaussian white noise \(x_t\sim \mathcal{N}(0,\sigma^2)\) is given as \(S(\omega)=\frac{\sigma^2}{2\pi}\)
- Spectrum of white noise is constant so \(\alpha=0\)
pink noise:
- Pink noise is linear in the logarithmic scale
- \(S(\omega)\sim f^{-1}\)
red noise:
- largest variance at smallest frequencies
- \(S(\omega)\sim f^{-2}\)
blue noise:
- linear in logarithmic scale with more energy in higher frequencies:
- \(S(\omega)\sim f^1\)
violet noise:
- has more energy in higher frequencies and scales as
- \(S(\omega)\sim f^2\)
grey noise:
- grey noise contains all frequencies with equal loudness
- (white noise shows equal energy for all frequencies)
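A sketch of how power-law ("colored") noise can be generated by shaping the spectrum of white noise (my own helper function, not from the script):

```python
# Shape the Fourier amplitudes of white noise with f^(alpha/2) so that the power spectrum
# scales roughly as S(f) ~ f^alpha.
import numpy as np

def power_law_noise(n, alpha, seed=None):
    """Return a time series of length n whose power spectrum scales roughly as f**alpha."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n)
    X = np.fft.rfft(white)
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                        # avoid division by zero at f = 0 for negative alpha
    return np.fft.irfft(X * f ** (alpha / 2), n)

pink = power_law_noise(4096, -1.0)     # S(f) ~ f^-1
red = power_law_noise(4096, -2.0)      # S(f) ~ f^-2
blue = power_law_noise(4096, 1.0)      # S(f) ~ f^1
```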
What is a stochastic process?
Let \(T\) be a subset of \([0,\infty)\). A family of random variables \(\{X_t\}_{t\in T}\), indexed by \(T\), is called a stochastic process.
EOF/Principal Component Analysis.
Target: Reduce the dimensionality of the data without losing important information.
Main idea: Perform a linear transformation E on a data matrix X to obtain a new data matrix A such that a lot of the information about the variability in X is compressed into few dimensions of A. The number of dimensions stays the same, but a few dimensions now carry most of the variability, so the remaining dimensions can simply be skipped.
\(A_{N\times M}=X_{M\times N}^TE_{M\times M}\)
A contains the principal components (PCs); the number of PCs equals the number of stations (space)
X is the data matrix
E is the eigenvector matrix (it contains the empirical orthogonal functions, EOFs)
--------------
- The covariance between two PCs is always zero -> the EOFs are orthogonal to each other.
- Each EOF comes with an eigenvalue which is a measure for the explained variance.
- The explained variance of one EOF/PC is: \(\frac{\lambda_i}{\sum^M_{j=1}\lambda_j}\)
- The original data can be written as the sum of the product of the EOF's and the corresponding PCs:
\(X^T=\sum^M_{i=1}a_iE_i^T=AE^T\)
There are several techniques to find such a transformation matrix (eigenvector matrix); we discussed two (sketches of both follow below):
- Eigendecomposition of covariance matrix of the data
- is more intuitive
- Singular Value Decomposition (SVD)
- is computationally more efficient
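A minimal sketch of the SVD route (random placeholder data; the dimensions M and N are arbitrary):

```python
# EOF/PC analysis via SVD of the anomaly matrix X (M stations x N time steps).
import numpy as np

M, N = 10, 500
X = np.random.randn(M, N)                  # placeholder data matrix [space x time]
X = X - X.mean(axis=1, keepdims=True)      # anomalies: remove the time mean of each series

# X = U S V^T: the columns of U are the EOFs, the PCs follow by projection.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
eofs = U                                   # (M x M) spatial patterns
pcs = X.T @ U                              # (N x M) principal components, A = X^T E
explained_var = s**2 / np.sum(s**2)        # eigenvalues of X X^T are the squared singular values
```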
On what assumptions is the PCA/EOF analysis based?
PCA/EOF analysis makes the following assumptions about the data matrix X:
- it is multivariate normally distributed
- it is not auto-correlated (seasonality needs to be removed)
- the variability is linear (the variability in the data can be expressed as a sum of the single EOFs)
- there is no noise
Eigendecomposition of covariance matrix
Let \(X_{M\times N}\) be the [space x time] data matrix.
This method is based on the eigenvalue problem of the (MxM) covariance matrix:
\(\Sigma=\mathrm{Cov}(X,X)=XX^T\), where X is the anomaly matrix (i.e. the mean of each time series is 0). Furthermore, the data is detrended and deseasonalized.
The eigenvalue problem is given by \(\Sigma E=E\Lambda\)
- \(\Sigma\) ... covariance matrix
- \(E\) ... eigenvector matrix (each column represents an \(M\times1\) EOF)
- \(\Lambda\) ... the \(M\times M\) eigenvalue matrix
In general, the covariance matrix has no zero entries. The eigenvalue matrix, on the other hand, has non-zero entries only on its diagonal. This means that, with the help of the EOFs (the transformation matrix), the variability gets redistributed to fewer dimensions in such a way that the covariances between the PCs are zero -> the EOFs are orthogonal to each other!
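The same analysis via the eigendecomposition, as a sketch (the covariance is normalized by N-1 here, which rescales the eigenvalues but does not change the EOFs):

```python
# EOF/PC analysis via eigendecomposition of the (M x M) covariance matrix of the anomalies.
import numpy as np

M, N = 10, 500
X = np.random.randn(M, N)                  # placeholder data matrix [space x time]
X = X - X.mean(axis=1, keepdims=True)      # anomalies

C = X @ X.T / (N - 1)                      # covariance matrix Sigma
eigvals, E = np.linalg.eigh(C)             # eigh: symmetric matrix, real eigenvalues
order = np.argsort(eigvals)[::-1]          # sort by decreasing explained variance
eigvals, E = eigvals[order], E[:, order]

A = X.T @ E                                # PCs: A = X^T E
# The off-diagonal covariances between the PC columns are ~0: the EOFs are orthogonal.
```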
------
What truncation criteria for the relevant EOFs do you know?
- By explained variance
- Only take the first K EOFs that fulfill \(\sum^K_{i=1}\lambda^2_i\geq\lambda^2_{crit}\) with \(\lambda^2_{crit}\) between 70% and 90%
- By slope in eigenvalue plot
- Find the point that separates the steep from the shallow slope and take the EOFs up to this point
- By \(log(\lambda)\) plot
- Look for the point from which the eigenvalues decay exponentially, visible as a straight line in the log plot (this indicates uncorrelated noise)
- By Kaiser's rule
- Retain \(\lambda_m\) if \(\lambda _m>T\frac{1}{M}\sum^{M}_{i=1}\lambda_i\), suggested value for T is 0.7
- By North's rule of thumb
- If the distance between two eigenvalues is smaller than two estimated standard errors, i.e. \(\Delta \lambda<2\lambda\sqrt{\frac{2}{n}}\), the corresponding EOFs are considered not well separated from each other (the true eigenvector could be a mixture of both); a numerical sketch of two of these criteria follows after the list
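A numerical sketch of two of these criteria (the eigenvalues and the sample size are made up):

```python
# Explained-variance criterion and North's rule of thumb applied to a sorted eigenvalue spectrum.
import numpy as np

eigvals = np.array([5.0, 3.0, 1.2, 0.4, 0.2, 0.1])   # hypothetical, sorted descending
n = 500                                              # effective sample size (assumed)

# Keep the first K EOFs that together explain e.g. 80 % of the variance:
frac = np.cumsum(eigvals) / np.sum(eigvals)
K = np.argmax(frac >= 0.8) + 1                       # -> K = 2 for these numbers

# North's rule of thumb: eigenvalues closer than two standard errors are not well separated.
err = eigvals * np.sqrt(2.0 / n)                     # estimated standard error of each eigenvalue
delta = eigvals[:-1] - eigvals[1:]                   # distances between neighbouring eigenvalues
well_separated = delta > 2 * err[:-1]
```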
How could you test data for normality as a preparation for PCA?
- Kolmogorov-Smirnov test: Compare the empirical CDF to the CDF of a normal distribution.
- Lilliefors test: Compare the empirical CDF to the CDF of a normal distribution whose parameters are estimated from the sample; the critical values are obtained by Monte Carlo simulations.
- Jarque-Bera test: Check whether the sample data has a skewness and kurtosis (3rd and 4th central moments) comparable to those of a normal distribution. (A sketch of these tests follows after the list.)
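A sketch of how these tests can be run in Python (x stands for one station's time series; the scipy/statsmodels functions named here are assumptions on my part, not from the script):

```python
# Normality checks before a PCA.
import numpy as np
from scipy import stats

x = np.random.randn(300)                                   # placeholder sample

# Kolmogorov-Smirnov against a normal distribution with the sample mean/std plugged in:
ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

# Jarque-Bera: skewness and kurtosis compared to those of a normal distribution:
jb_stat, jb_p = stats.jarque_bera(x)

# Lilliefors (KS with estimated parameters) is available in statsmodels:
# from statsmodels.stats.diagnostic import lilliefors
# lf_stat, lf_p = lilliefors(x, dist="norm")
```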
Why do we have to think critically about PCA?
- EOF patterns do not necessarily correspond to physical modes. That is also the reason why we often do not discuss modes higher than the second.
- PCs and EOFs calculated from observed data are only estimates of the true PCs and EOFs associated with the true random vector X. That is why the orthogonal functions are called empirical -> EOF!
Bootstrapping
Main idea: Measures such as the standard error and confidence limits are often not available for small samples. Bootstrapping resamples the sample and computes the estimate for each resample. By taking many resamples we get a spread of the resampled estimate.
Generally: Drawing random samples (choosing the elements randomly) from a population can be done with replacement or without replacement. If we take a small sample from a large population, it does not matter whether the element is replaced or not.
Empirical distribution
The empirical distribution is the distribution of the data sample, which may or may not reflect the true distribution of the population.
Resampling
To resample is to take a sample from the empirical distribution with replacement.
Empirical bootstrap
For a sample \(x_1,...,x_n\) drawn from a distribution F of the population, the empirical bootstrap sample is a resampled data set of the same size, \(x_1^*,...,x_n^*\), drawn from the empirical distribution \(F^*\) of the sample.
Similarly, we can compute any statistic \(\Theta\) from the original sample as well as from the empirical bootstrap sample, and call the latter \(\Theta^*\).
The bootstrap principle states that \(F^*\simeq F\), thus the variation of \(\Theta\) is well approximated by the variation of \(\Theta^*\).
-> We can approximate the variation of \(\Theta\) by the variation of \(\Theta^*\), e.g. to estimate the confidence interval of \(\Theta\).
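A minimal empirical-bootstrap sketch for the mean of a small sample (the data and the number of resamples are made up):

```python
# Resample the sample with replacement many times and use the spread of the resampled
# statistic Theta* to approximate the variation of Theta (here: a percentile confidence interval).
import numpy as np

rng = np.random.default_rng(0)
x = rng.gamma(2.0, size=30)                 # a small, skewed sample (hypothetical)
n_boot = 10_000

theta = x.mean()                            # statistic of interest, Theta
theta_star = np.array([
    rng.choice(x, size=x.size, replace=True).mean()   # resample with replacement, same size
    for _ in range(n_boot)
])

ci = np.percentile(theta_star, [2.5, 97.5])  # 95 % percentile confidence interval for the mean
```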
Axioms of probability (Axioms of Kolmogorov)
Probability P: \(\Omega\rightarrow\mathbb{R}\) (the probability P is a mapping from the event space to the real numbers)
Given events A in an event space \(\Omega\), i.e., \(A\subset \Omega\) (A is a subset of Omega; Omega is a superset of A)
- \(0 \leq P(A) \leq 1\)
- \(P(\Omega)=1\)
- given \(A_i\cap A_j =\emptyset\) for \(i \neq j\), then \(P(\bigcup_iA_i)=\sum_i P(A_i)\) (if the events are pairwise disjoint, the probability of their union is the sum of their individual probabilities)
consequences of the Axioms of Kolmogorov
- \(P(\bar{A})=1-P(A)\)
- \(P(\emptyset)=0\)
- if A and B are exclusive, then \(P(A\cup B)=P(A)+P(B)\)
- in general \(P(A\cup B)=P(A)+P(B)-P(A\cap B)\) (additive law of probability)
Independent events
Two events are independent when the following is valid:
\(P(A\cap B)=P(A)*P(B)\)
Conditional probability of two events
The conditional probability of an event A, given an event B is:
\(P(A|B)=P(A\cap B)/P(B)\)
if A and B are independent, then:
\(P(A|B)=P(A)\)
Bayes' theorem
\(P(A_j|B)=\frac{P(B|A_j)P(A_j)}{P(B)}\)
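A small worked example with made-up numbers (rain probability given a positive forecast), just to show the arithmetic:

```python
# Bayes' theorem with invented numbers: P(rain | forecast says rain).
p_rain = 0.2                 # prior P(A): climatological rain probability
p_pos_given_rain = 0.9       # P(B|A): forecast says "rain" when it actually rains
p_pos_given_dry = 0.3        # P(B|not A): false-alarm rate

# Total probability P(B):
p_pos = p_pos_given_rain * p_rain + p_pos_given_dry * (1 - p_rain)   # = 0.42

# Bayes' theorem P(A|B):
p_rain_given_pos = p_pos_given_rain * p_rain / p_pos                 # = 0.18 / 0.42 ≈ 0.43
```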
What types of random variables exist?
- discrete: number of wet days
- continuous (not really!): temperature
- categorical: head or tail?
Cumulative distribution function (CDF)
\(F_X(x)=P(X\leq x)\) continuous random variables
\(F_X(x)=\sum_{x_i\leq x}P(X=x_i)\) discrete random variables
- \(F_X\) monotonically increasing (\(0\leq F_X(x)\leq 1\))
- \(\lim_{x\rightarrow -\infty}F_X(x)=0,\;\;\lim_{x\rightarrow \infty}F_X(x)=1\)
- \(P(X \in [a,b])=P(a\leq X\leq b)=F_X(b)-F_X(a)\)
Probability distribution function
Probability mass function (only for discrete variables!):
\(f_X(x)=P(X=x)\)
Probability density function (PDF, for continuous random variables!):
\(f_X(x)=\frac{dF_X(x)}{dx}\)
Properties:
- \(f_X(x)\geq 0\)
- \(\int f_X(x)dx=1\;(cont.)\;\;\sum_{x\in \Omega}f_X(x)=1\;(discrete)\)
- \(P(X\in [a,b])=P(a\leq X\leq b)=F_X(b)-F_X(a)\)
Independent random variables
continuous random variables:
Random variables X and Y are independent if for any x and y:
\(P(X\leq x, Y\leq y)=P(X\leq x)P(Y\leq y)=F(x)G(y)\)
where F(x) and G(y) are the corresponding CDFs.
discrete random variables:
Random variables X and Y are independent if for any \(x_i\) and \(y_j\):
\(P(X\leq x_i,Y\leq y_j)=P(X\leq x_i)P(Y\leq y_j)\)
Define the expressions Quantile, Percentile, Median and Quartile
Quantile: the \(p\)-quantile \(x_p\) is the value for which \(F_X(x_p)=p\)
Percentile: quantiles expressed in percentages: the 0.2-quantile is the 20th percentile
Quartiles: the 25th and 75th percentiles (lower and upper quartile)
Median: is the 0.5-quantile
What is a moment?
The nth moment \(\mu_n\) of a probability density \(f_X(x)\) is defined as:
- (cont.): \(\mu_n=E(X^n)=\int x^n*f_X(x)dx\)
- (discr.): \(\mu_n=E(X^n)=\sum x^n_k * f_X(x_k)\)
The nth central moment \(\mu'_n\) of a probability density \(f_X(x)\) is defined with respect to the first moment (\(\mu\)) as
\(\mu_n'=E((X-\mu)^n)=\int (x-\mu)^n * f_X(x)dx \)
How are the expected value and the variance defined?
The expected value, also called the mean, is defined as the first moment:
\(\mu=E(x)=\int x*f(x)dx \)
Physically, the expected value can be seen as the center of mass of the distribution.
The variance is defined as the second central moment:
\(\sigma^2=Var(x)=E((X-\mu)^2)=E(X^2)-\mu^2\)
The variance gives the spread around the expected value.
What is the fourth central moment?
Kurtosis (a measure of peakedness)
The kurtosis of any univariate normal distribution is 3. It is common to compare the kurtosis of a distribution to this value. Distributions with kurtosis less than 3 are said to be platykurtic, although this does not imply the distribution is "flat-topped" as sometimes reported. Rather, it means the distribution produces fewer and less extreme outliers than does the normal distribution. An example of a platykurtic distribution is the uniform distribution, which does not produce outliers.
The excess kurtosis gives the difference between the kurtosis of the distribution under consideration and the kurtosis of the density function of a normally distributed random variable.
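A quick numerical check of this (the sample sizes and the use of scipy are my choice; `fisher=False` returns the plain kurtosis rather than the excess):

```python
# Kurtosis of a normal vs. a uniform (platykurtic) sample.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
normal_sample = rng.standard_normal(100_000)
uniform_sample = rng.uniform(size=100_000)

kurtosis(normal_sample, fisher=False)    # ~3   (excess kurtosis ~ 0)
kurtosis(uniform_sample, fisher=False)   # ~1.8 (excess kurtosis ~ -1.2, platykurtic)
```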
What is the Mode?
The mode is the value that appears most often in a set of data. For a continuous probability distribution it is the peak.