Cards: 88
Learners: 1
Language: English
Level: University
Created / Updated: 21.07.2018 / 27.08.2018
Licensing: Not specified
0 exact answers, 88 text answers, 0 multiple-choice answers

What is a stochastic process? 

Let \(T\) be a subset of \([0,\infty)\). A family of random variables \(\{X_t\}_{t\in T}\), indexed by \(T\), is called a stochastic process. 
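A standard illustrative example (not part of the card) is the simple symmetric random walk, a stochastic process indexed by \(T=\{0,1,2,\dots\}\):

```latex
% Simple symmetric random walk: X_t is the position after t fair coin-flip steps
X_0 = 0, \qquad
X_t = \sum_{s=1}^{t} \xi_s, \qquad
\xi_s \overset{\mathrm{iid}}{\sim}
\begin{cases}
  +1 & \text{with probability } 1/2,\\
  -1 & \text{with probability } 1/2.
\end{cases}
```

Each fixed \(t\) gives a random variable \(X_t\); the whole family \(\{X_t\}_{t\in T}\) is the process.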


EOF/Principal Component Analysis.


Target: Reducing the dimensionality of the data without losing important information. 

Main Idea: Perform a linear transformation E on a data matrix X to obtain a new data matrix A such that much of the information about the variability in X is compressed into a few dimensions of A. The number of dimensions stays the same, but now a few dimensions carry most of the variability, so the remaining dimensions can simply be skipped. 

\(A_{N\times M}=X_{M\times N}^TE_{M\times M}\)

A contains the principal components (PCs); the number of PCs equals the number of stations (space) 

X is the data matrix 

E is the eigenvector matrix (its columns contain the empirical orthogonal functions, EOFs) 


  • The covariance between any two PCs is always zero -> the EOFs are orthogonal to each other. 
  • Each EOF comes with an eigenvalue, which is a measure of the explained variance.
  • The explained variance of one EOF/PC is: \(\frac{\lambda_i}{\sum^M_{j=1}\lambda_j}\)
  • The original data can be written as the sum of the products of the EOFs and the corresponding PCs: \(X=EA^T=\sum^M_{m=1}\vec{e}_m\vec{a}_m^T\)



There are several techniques to find such a transformation matrix (the eigenvector matrix); we discussed two:

  1. Eigendecomposition of the covariance matrix of the data
    • more intuitive
  2. Singular Value Decomposition (SVD)
    • computationally more efficient
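A minimal numpy sketch with toy data (matrix names X, E, A follow the card; everything else is my own choice), checking that both techniques give the same eigenvalue spectrum:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 200                       # M stations (space), N time steps
X = rng.standard_normal((M, N))
X -= X.mean(axis=1, keepdims=True)  # work with anomalies (zero-mean rows)

# Technique 1: eigendecomposition of the MxM covariance matrix
C = X @ X.T / (N - 1)
eigvals, E = np.linalg.eigh(C)          # eigh: ascending order for symmetric C
eigvals, E = eigvals[::-1], E[:, ::-1]  # sort descending

# Technique 2: SVD of the anomaly matrix itself (no covariance matrix needed)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
eigvals_svd = s**2 / (N - 1)            # squared singular values ~ eigenvalues

assert np.allclose(eigvals, eigvals_svd)  # both techniques give the same spectrum
```

The SVD route skips forming the covariance matrix, which is one reason it is numerically preferable.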

On what assumptions is the PCA/EOF analysis based?

The analysis makes the following assumptions about the data matrix X:

  1. X is multivariate normally distributed
  2. X is not auto-correlated (seasonality needs to be removed)
  3. the variability is linear (the variability in the data can be expressed as a sum of the single EOFs)
  4. there is no noise

Eigendecomposition of covariance matrix

Let \(X_{M\times N}\) be the [space x time] data matrix.

This method is based on the eigenvalue problem of the \((M\times M)\) covariance matrix:

\(\Sigma=Cov(X,X)=XX^T\), where X is the anomaly matrix (i.e. the mean of each time series is 0). Furthermore, the data is detrended and deseasonalized. 

The eigenvalue problem is given by \(\Sigma E=E\Lambda\)

\(\Sigma\) ... covariance matrix
\(E\) ... eigenvector matrix (each column is an \(M\times1\) EOF)
\(\Lambda\) ... the \(M\times M\) diagonal eigenvalue matrix

In general, the covariance matrix has no zero entries. The eigenvalue matrix, on the other hand, has non-zero entries only on its diagonal. This means that, with the help of the EOFs (the transformation matrix), the variability has been redistributed to fewer dimensions in such a way that the covariances between the PCs are zero -> the EOFs are orthogonal to each other! 
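This claim can be checked numerically; a toy sketch with made-up data, not code from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 300
X = rng.standard_normal((M, N))
X -= X.mean(axis=1, keepdims=True)    # anomaly matrix: each time series has mean 0

C = X @ X.T / (N - 1)                 # MxM covariance matrix, generally no zeros
lam, E = np.linalg.eigh(C)            # eigenvalues (ascending) and EOFs (columns)
A = X.T @ E                           # principal components, one per column (NxM)

cov_pcs = A.T @ A / (N - 1)           # covariance matrix of the PCs
assert np.allclose(cov_pcs, np.diag(lam))   # diagonal: PCs are uncorrelated,
                                            # each PC's variance is its eigenvalue
```

The assertion holds because \(A^TA/(N-1)=E^T\Sigma E=\Lambda\), which is exactly the diagonalization stated above.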






Which truncation criteria for the relevant EOFs do you know?

  1. By explained variance
    • Only take the first K EOFs that fulfill \(\sum^K_{i=1}\lambda^2_i\geq\lambda^2_{crit}\), with \(\lambda^2_{crit}\) between 70% and 90%
  2. By the slope in the eigenvalue plot
    • Find the point that separates the steep and the shallow slope and keep the EOFs up to this point
  3. By the \(log(\lambda)\) plot
    • Look for the point from which the eigenvalues decay exponentially, visible as a straight line in the log plot (this indicates uncorrelated noise)
  4. By Kaiser's rule
    • Retain \(\lambda_m\) if \(\lambda_m>T\frac{1}{M}\sum^{M}_{i=1}\lambda_i\); the suggested value for T is 0.7
  5. By North's rule of thumb
    • If the distance between two eigenvalues is smaller than two estimated standard errors, i.e. \(\Delta \lambda<2\lambda\sqrt{\frac{2}{n}}\), the corresponding EOFs are considered not well separated from each other (the true eigenvector could be a mixture of both)
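A small sketch of criteria 1 and 4, assuming \(\lambda_i\) here denotes each mode's share of the total variance; `truncate_eofs` and the example spectrum are hypothetical names and numbers of my own:

```python
import numpy as np

def truncate_eofs(lam, var_crit=0.9, kaiser_T=0.7):
    """Return how many EOFs to retain under two criteria."""
    lam = np.sort(np.asarray(lam))[::-1]          # eigenvalues, descending
    # Criterion 1: smallest K whose cumulative explained variance reaches var_crit
    frac = np.cumsum(lam) / lam.sum()
    k_var = int(np.searchsorted(frac, var_crit)) + 1
    # Criterion 4 (Kaiser's rule): retain lam_m if lam_m > T * mean(lam)
    k_kaiser = int(np.sum(lam > kaiser_T * lam.mean()))
    return k_var, k_kaiser

print(truncate_eofs([5.0, 2.0, 1.0, 0.5, 0.3, 0.2]))  # -> (4, 2)
```

For this spectrum the first three modes explain only 8/9 of the variance, so the 90% criterion keeps four, while Kaiser's rule keeps only the two modes above 0.7 times the mean eigenvalue.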

How could you test data for normality as a preparation for PCA?

  1. Kolmogorov-Smirnov test: Compare the empirical CDF to the CDF of a normal distribution. 
  2. Lilliefors test: Like Kolmogorov-Smirnov, but with the normal distribution's parameters estimated from the data; the corrected critical values are obtained by Monte Carlo simulation. 
  3. Jarque-Bera test: Check whether the sample data has a skewness and kurtosis (3rd and 4th central moments) comparable to those of a normal distribution. 
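A quick sketch with scipy on synthetic data (in practice you would apply these tests to each station's time series):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=2.0, size=500)   # synthetic "station" time series

# Kolmogorov-Smirnov: empirical CDF vs. a fully specified normal CDF.
# Note: plugging in parameters estimated from x is exactly the situation
# the Lilliefors test corrects for.
ks = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

# Jarque-Bera: compares sample skewness and kurtosis to a normal's
jb = stats.jarque_bera(x)

print(f"KS p-value: {ks.pvalue:.3f}, JB p-value: {jb.pvalue:.3f}")
```

Large p-values mean no evidence against normality at the usual 5% level.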

Why do we have to think critically about PCA?

  1. EOF patterns do not necessarily correspond to physical modes. That is also the reason why we often do not discuss modes higher than the second. 
  2. PCs and EOFs calculated from observed data are only estimates of the true PCs and EOFs associated with the true random vector X. That is why the orthogonal functions are called empirical -> EOF!

Bootstrapping

Main Idea: Often, measures such as the standard error and confidence limits are not available for small samples. Bootstrapping resamples the sample and computes the estimate for each resample. By taking many resamples we get a spread of the resampled estimate. 

Generally: Drawing random samples (choosing the elements randomly) from a population can be done with replacement or without replacement. If we take a small sample from a large population, it barely matters whether the elements are replaced or not. 

Empirical distribution 

The empirical distribution is the distribution of the data sample, which may or may not reflect the true distribution of the population. 


To resample is to take a sample from the empirical distribution with replacement. 

Empirical bootstrap

For a sample \(x_1,...,x_n\) drawn from a distribution F of the population, the empirical bootstrap sample is a resampled data set of the same size, \(x_1^*,...,x_n^*\), drawn from the empirical distribution \(F^*\) of the sample. 


Similarly, any statistic \(\Theta\) that we can compute from the original sample can also be computed from the empirical bootstrap sample; we call it \(\Theta^*\).

The bootstrap principle states that \(F^*\simeq F\), thus the variation of \(\Theta\) is well approximated by the variation of \(\Theta^*\).


-> We can approximate the variation of \(\Theta\) by the variation of \(\Theta^*\), e.g. to estimate the confidence interval of \(\Theta\).
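A minimal sketch of the empirical bootstrap for the mean of a small sample (made-up data; the 95% percentile interval is one of several bootstrap CI variants):

```python
import numpy as np

rng = np.random.default_rng(3)
sample = rng.exponential(scale=2.0, size=50)   # small sample, true mean = 2

B = 10_000
# Resample with replacement from the empirical distribution F* of the sample
boot = rng.choice(sample, size=(B, sample.size), replace=True)
theta_star = boot.mean(axis=1)                 # Theta* for each bootstrap sample

se = theta_star.std(ddof=1)                    # bootstrap estimate of the SE
lo, hi = np.percentile(theta_star, [2.5, 97.5])  # 95% percentile interval
print(f"mean={sample.mean():.2f}  SE={se:.2f}  95% CI=({lo:.2f}, {hi:.2f})")
```

The spread of `theta_star` stands in for the unavailable sampling distribution of the mean, exactly as the bootstrap principle states.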