Periodogram: Definitions!
Given a discrete time series \(x_i\) with \(i=1,\dots,N\) and \(t_i=i\Delta t\).
A first estimator of the spectrum is the periodogram:
\(\widehat{S}(\omega_l)=P(\omega_l)=|X(\omega_l)|^2\)
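A minimal sketch of this estimator in Python (the 1/N scaling used here is one common normalization convention; others divide by \(N\Delta t\) or apply one-sided scaling):

```python
import numpy as np

def periodogram(x, dt):
    """Periodogram S(omega_l) = |X(omega_l)|^2 (sketch; normalization is a convention)."""
    N = len(x)
    X = np.fft.rfft(x)                # DFT of the series at the Fourier frequencies
    S = np.abs(X) ** 2 / N            # |X|^2, here scaled by 1/N
    f = np.fft.rfftfreq(N, d=dt)      # frequencies f_l = l / (N * dt)
    return f, S
```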
What are the problems of the Periodogram?
Methods of improving the estimation of the power spectral density via periodogram!
Windowing and Tapering: What is improved? What are the drawbacks?
The power spectral estimator is \(\chi^2_2\)-distributed, with a relative standard deviation of 1. This means that with increasing N the frequency resolution increases, but the variance of the estimate stays constant.
The variance can, however, be decreased (less random scatter) by windowing and tapering.
For a stationary time series, the periodogram of each window of data gives an independent unbiased estimate of the power spectrum and thus can be averaged to smooth the spectrum.
Benefits and drawbacks:
The shorter the block length, the more blocks and the smoother the spectrum, but also the lower the frequency resolution!
The idea of windowing is best shown with a rectangular window, but in practice this is rarely used because of two problems:
Both problems can be reduced with weight functions for the window that go smoothly to zero at the endpoints.
It makes sense to overlap these windows so that all data points are near the center of some window at some point and are weighted equally. However, the power spectrum estimates are then no longer independent, and this must be accounted for in the uncertainty estimates of the power spectrum too!
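A sketch of this block-averaging approach using scipy's Welch implementation (block length, window, and overlap are illustrative choices):

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)   # example series; in practice your data
fs = 1.0                        # sampling frequency (assumed here)

# Hann taper goes smoothly to zero at the block edges; 50% overlap
# puts every data point near a block center at some point.
# Shorter blocks (nperseg) -> more averages -> less variance,
# but coarser frequency resolution.
f, S = welch(x, fs=fs, window="hann", nperseg=256, noverlap=128)
```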
Explain Leakage!
Because the time series has finite length (equivalently, it is multiplied by a window), the true spectrum is convolved with the window's spectral response: power from one frequency "leaks" into neighboring frequencies through the sidelobes of the window. See Karl's script!
How to cope with Aliasing?
If the original time series has power at an alias frequency of a frequency f (i.e., at a frequency above the Nyquist frequency), this power will additionally appear at the frequency f in the spectrum of the sample.
The only way to avoid this is to filter out the high frequencies (i.e., to low-pass filter the signal) before the sample is taken.
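A sketch of this low-pass-before-subsampling step (scipy's decimate applies an anti-aliasing filter before taking every q-th sample):

```python
import numpy as np
from scipy.signal import decimate

rng = np.random.default_rng(1)
x = rng.standard_normal(4096)   # example series at the original sampling rate
q = 4                           # subsampling factor

x_alias = x[::q]                # naive subsampling: high-frequency power folds back
x_clean = decimate(x, q)        # low-pass (anti-alias) filter first, then subsample
```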
Sound and Color Spectra
White noise: flat power spectrum, \(S(f)=const\).
Pink noise: power falls off as \(S(f)\propto 1/f\).
Red noise: power falls off as \(S(f)\propto 1/f^2\) (e.g., Brownian motion).
Blue noise: power increases as \(S(f)\propto f\).
Violet noise: power increases as \(S(f)\propto f^2\).
Grey noise: white noise weighted by a psychoacoustic equal-loudness curve, so that it sounds equally loud at all frequencies.
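A minimal sketch generating the power-law colored noises above by shaping the spectrum of white noise (the exponent beta sets the spectral slope \(S(f)\propto f^\beta\)):

```python
import numpy as np

def colored_noise(beta, N, rng=np.random.default_rng()):
    """Noise with power spectrum S(f) ~ f**beta (sketch).

    beta = 0: white, -1: pink, -2: red, +1: blue, +2: violet.
    """
    X = np.fft.rfft(rng.standard_normal(N))  # white noise: flat spectrum
    f = np.fft.rfftfreq(N)
    f[0] = f[1]                              # avoid 0**negative at f = 0
    X *= f ** (beta / 2.0)                   # amplitude ~ f^(beta/2), power ~ f^beta
    return np.fft.irfft(X, n=N)
```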
What is a stochastic process?
Let \(T\) be a subset of \([0,\infty)\). A family of random variables \(\{X_t\}_{t\in T}\), indexed by \(T\), is called a stochastic process.
EOF/Principal Component Analysis.
Target: decreasing the dimensionality of the data without losing important information.
Main idea: perform a linear transformation E on a data matrix X to obtain a new data matrix A in such a way that much of the information about the variability in X is compressed into fewer dimensions of A. The number of dimensions stays the same, but now a few dimensions carry most of the variability, so the remaining dimensions can simply be skipped.
\(A_{N\times M}=X_{M\times N}^TE_{M\times M}\)
A are the principal components (PCs); the number of PCs equals the number of stations (space)
X is the data matrix
E is the eigenvector matrix (contains the empirical orthogonal functions, EOFs)
--------------
\(X^T=\sum^M_{i=1}a_iE_i^T=AE^T\)
There are several techniques to find such a transformation matrix (eigenvector matrix); we discussed two:
On what assumptions is the PCA/EOF analysis based?
The analysis is based on the following assumptions about the data matrix X:
Eigendecomposition of covariance matrix
Let \(X_{M\times N}\) be the [space x time] data matrix.
This method is based on the eigenvalue problem of the (MxM) covariance matrix:
\(\Sigma=Cov(X,X)=XX^T\), where X is the anomaly matrix (i.e., the mean of each time series is 0). Furthermore, the data are detrended and deseasonalized.
The eigenvalue problem is given by \(\Sigma E=E\Lambda\), with
\(\Sigma\) ... covariance matrix
\(E\) ... eigenvector matrix (each column represents an \(M\times 1\) EOF)
\(\Lambda\) ... the \(M\times M\) (diagonal) eigenvalue matrix
In general, the covariance matrix has no zero entries. The eigenvalue matrix, on the other hand, has non-zero entries only on its diagonal. This means that with the help of the EOFs (the transformation matrix), variability is redistributed to fewer dimensions in such a way that the covariances of the PCs are zero -> the EOFs are orthogonal to each other!
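A sketch of the whole procedure with numpy (eigh exploits that the covariance matrix is symmetric; the 1/(N-1) normalization is omitted, as in the notes, since it does not change the EOFs):

```python
import numpy as np

M, N = 10, 500                            # stations (space) x time steps
rng = np.random.default_rng(2)
X = rng.standard_normal((M, N))           # example [space x time] data matrix
X -= X.mean(axis=1, keepdims=True)        # anomaly matrix: each time series has mean 0

C = X @ X.T                               # (M x M) covariance matrix
lam, E = np.linalg.eigh(C)                # eigenvalues and eigenvectors (EOFs)
order = np.argsort(lam)[::-1]             # sort by explained variance, descending
lam, E = lam[order], E[:, order]

A = X.T @ E                               # principal components: A = X^T E
explained = lam / lam.sum()               # fraction of variance per EOF
```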
------
What truncation criteria for the relevant EOFs do you know?
How could you test data for normality as a preparation for PCA?
Why do we have to think critically about PCA?
Bootstrapping
Main idea: often measures such as the standard error and confidence limits are not available for small samples. Bootstrapping resamples the sample and computes the estimate for each resample. By taking many resamples we get a spread of the resampled estimate.
Generally: drawing random samples (choosing the elements randomly) from a population can be done with replacement or without replacement. If we take a small sample from a large population, it makes practically no difference whether the elements are replaced or not.
Empirical distribution
The empirical distribution is the distribution of the data sample, which may or may not reflect the true distribution of the population.
Resampling
To resample is to take a sample from the empirical distribution with replacement.
Empirical bootstrap
For a sample \(x_1,...,x_n\) drawn from a distribution F of the population, the empirical bootstrap sample is the resampled data set of the same size, \(x_1^*,...,x_n^*\), drawn from the empirical distribution \(F^*\) of the sample.
Similarly, any statistic \(\Theta\) computed from the original sample can also be computed from the empirical bootstrap sample; we call it \(\Theta^*\).
The bootstrap principle states that \(F^*\simeq F\), thus the variation of \(\Theta\) is well approximated by the variation of \(\Theta^*\).
-> We can approximate the variation of \(\Theta\) by the variation of \(\Theta^*\), e.g. to estimate the confidence interval of \(\Theta\).
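A minimal sketch of the percentile bootstrap for a confidence interval (the statistic, number of resamples, and confidence level are illustrative choices):

```python
import numpy as np

def bootstrap_ci(x, stat=np.mean, n_boot=10_000, alpha=0.05,
                 rng=np.random.default_rng(3)):
    """Percentile bootstrap confidence interval for a statistic (sketch)."""
    n = len(x)
    # Resample with replacement from the empirical distribution F*
    theta_star = np.array([stat(rng.choice(x, size=n, replace=True))
                           for _ in range(n_boot)])
    return np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])
```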
Axioms of probability (Axioms of Kolmogorov)
Probability \(P:\Omega\rightarrow\mathbb{R}\) (the probability P is a mapping from the event space to the real numbers).
Given events A in an event space \(\Omega\), i.e., \(A\subseteq\Omega\) (A is a subset of \(\Omega\)), the axioms are:
1. \(P(A)\geq 0\) for every event A (non-negativity)
2. \(P(\Omega)=1\) (normalization)
3. \(P(\bigcup_i A_i)=\sum_i P(A_i)\) for pairwise disjoint events \(A_1,A_2,\dots\) (countable additivity)
Consequences of the axioms of Kolmogorov: \(P(\emptyset)=0\); \(P(A^c)=1-P(A)\); \(A\subseteq B\Rightarrow P(A)\leq P(B)\); \(P(A\cup B)=P(A)+P(B)-P(A\cap B)\).
Independent events
Two events are independent when the following is valid:
\(P(A\cap B)=P(A)*P(B)\)
Conditional probability of two events
The conditional probability of an event A, given an event B is:
\(P(A|B)=P(A\cap B)/P(B)\)
If A and B are independent, then:
\(P(A|B)=P(A)\)
Bayes' theorem
\(P(A_j|B)=\frac{P(B|A_j)P(A_j)}{P(B)}\)
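A worked numeric example (hypothetical numbers) showing how the theorem combines a likelihood with a prior:

```python
# Hypothetical medical test: A = "has disease", B = "test positive".
p_B_given_A = 0.99                  # P(B|A): sensitivity
p_B_given_notA = 0.05               # false-positive rate
p_A = 0.01                          # P(A): prevalence (prior)

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B   # ~0.167 despite the accurate test
```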
What types of random variables exist?
Discrete and continuous random variables.
Cumulative distribution function (CDF)
\(F_X(x)=P(X\leq x)\) continuous random variables
\(F_X(x)=\sum_{x_i\leq x}P(X=x_i)\) discrete random variables
Probability distribution function
Probability mass function (only for discrete variables!):
\(f_X(x)=P(X=x)\)
Probability density function (PDF, for continuous random variables!):
\(f_X(x)=\frac{dF_X(x)}{dx}\)
Properties: \(f_X(x)\geq 0\) and \(\int_{-\infty}^{\infty}f_X(x)\,dx=1\).
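A short sketch illustrating the relation \(f_X=dF_X/dx\) numerically with scipy's standard normal:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-4, 4, 1001)
F = norm.cdf(x)                 # CDF of the standard normal
f_num = np.gradient(F, x)       # numerical derivative dF/dx
f_exact = norm.pdf(x)           # agrees with the PDF (up to discretization error)
```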
Independent random variables
continuous random variables:
Random variables X and Y are independent if for any x and y:
\(P(X\leq x, Y\leq y)=P(X\leq x)P(Y\leq y)=F(x)G(y)\)
where F(x) and G(y) are the corresponding CDFs.
discrete random variables:
Random variables X and Y are independent if for any \(x_i\) and \(y_j\):
\(P(X\leq x_i,Y\leq y_j)=P(X\leq x_i)P(Y\leq y_j)\)
Define the expressions Quantile, Percentile, Median and Quartile
Quantile: the p-quantile \(x_p\) is the value below which a fraction p of the probability mass falls, i.e., \(F_X(x_p)=p\).
Percentile: quantiles expressed in percentages; the 0.2-quantile is the 20th percentile.
Quartiles: the 25th and 75th percentiles (lower and upper quartile).
Median: the 0.5-quantile (the 50th percentile).
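A small sketch with numpy (the sample values are arbitrary):

```python
import numpy as np

x = np.array([2, 4, 4, 5, 7, 9, 12, 15])
median = np.quantile(x, 0.5)              # 0.5-quantile = 50th percentile
q25, q75 = np.quantile(x, [0.25, 0.75])   # lower and upper quartile
p20 = np.percentile(x, 20)                # 20th percentile = 0.2-quantile
```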
What is a moment?
The nth moment \(\mu_n\) of a probability density \(f_X(x)\) is defined as:
\(\mu_n=E(X^n)=\int x^n f_X(x)\,dx\)
The nth central moment \(\mu'_n\) of a probability density \(f_X(x)\) is defined with respect to the first moment (\(\mu\)) as:
\(\mu_n'=E((X-\mu)^n)=\int (x-\mu)^n f_X(x)\,dx\)
How are the expected value and the variance defined?
The expected value, also called the mean, is defined as the first moment:
\(\mu=E(X)=\int x\,f_X(x)\,dx\)
The expected value can be interpreted physically as the center of mass.
The variance is defined as the second central moment:
\(\sigma^2=Var(X)=E((X-\mu)^2)=E(X^2)-\mu^2\)
The variance gives the spread around the expected value.
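A sketch estimating these moments from a sample (sample means approximate the expectations):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)

mu = x.mean()                        # first moment: expected value
var = ((x - mu) ** 2).mean()         # second central moment: variance
var_alt = (x ** 2).mean() - mu ** 2  # same via E(X^2) - mu^2
```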
What is the fourth central moment?
Kurtosis (a measure of peakedness, or more accurately of the tailedness of a distribution)
The kurtosis of any univariate normal distribution is 3. It is common to compare the kurtosis of a distribution to this value. Distributions with kurtosis less than 3 are said to be platykurtic, although this does not imply the distribution is "flat-topped" as sometimes reported. Rather, it means the distribution produces fewer and less extreme outliers than does the normal distribution. An example of a platykurtic distribution is the uniform distribution, which does not produce outliers.
The excess (excess kurtosis) gives the difference between the kurtosis of the distribution under consideration and the kurtosis of the density function of a normally distributed random variable (i.e., kurtosis minus 3).
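A sketch comparing the kurtosis of a normal and a uniform sample with scipy (fisher=False returns the raw kurtosis, so the normal reference value is 3):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(5)
g = rng.standard_normal(100_000)
u = rng.uniform(-1, 1, 100_000)

kurtosis(g, fisher=False)   # ~3.0: the normal reference value
kurtosis(u, fisher=False)   # ~1.8: platykurtic (uniform distribution)
```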
What is the Mode?
The mode is the value that appears most often in a set of data. For a continuous probability distribution it is the peak.