What is a stochastic process?
Let \(T\) be a subset of \([0,\infty)\). A family of random variables \(\{X_t\}_{t\in T}\), indexed by \(T\), is called a stochastic process.
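As a concrete illustration (not part of the definition above), a simple random walk is a stochastic process with index set \(T=\{0,1,...,n\}\); a minimal numpy sketch:

```python
import numpy as np

# Illustrative sketch: a simple random walk as a stochastic process
# indexed by T = {0, 1, ..., n}.
rng = np.random.default_rng(0)
n = 100
steps = rng.choice([-1, 1], size=n)           # i.i.d. +/-1 increments
X = np.concatenate(([0], np.cumsum(steps)))   # X_t = sum of the first t steps
# X[t] is one realization of the random variable X_t at index t.
```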
EOF/Principal Component Analysis.
Target: Reducing the dimensionality of the data without losing important information.
Main Idea: Apply a linear transformation \(E\) to a data matrix \(X\) to obtain a new data matrix \(A\), such that much of the information about the variability in \(X\) is compressed into a few dimensions of \(A\). The number of dimensions stays the same; what changes is that a few dimensions now carry most of the variability, so the remaining dimensions can simply be skipped.
\(A_{N\times M}=X_{M\times N}^TE_{M\times M}\)
A contains the principal components (PCs); the number of PCs equals the number of stations (space)
X is the data matrix
E is the eigenvector matrix (its columns are the empirical orthogonal functions, EOFs)
--------------
\(X^T=\sum^M_{i=1}a_iE_i^T=AE^T\), where \(a_i\) is the \(i\)-th PC (the \(i\)-th column of \(A\)) and \(E_i\) the \(i\)-th EOF (the \(i\)-th column of \(E\)).
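A minimal numpy sketch of this transformation and its inverse, assuming a small synthetic [space x time] matrix X and some orthogonal E (in EOF analysis, E comes from the eigendecomposition described below; here any orthogonal matrix demonstrates the identity):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 100                      # M stations (space), N time steps
X = rng.standard_normal((M, N))    # synthetic [space x time] data matrix

# Any orthogonal E satisfies the reconstruction identity below.
E, _ = np.linalg.qr(rng.standard_normal((M, M)))

A = X.T @ E                        # principal components, (N x M)
X_rec = (A @ E.T).T                # reconstruction: X^T = A E^T

assert np.allclose(X, X_rec)       # the transformation is lossless
```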
There are several techniques to find such a transformation matrix (eigenvector matrix); we discussed two:
On what assumptions is the PCA/EOF analysis based?
It is based on the following assumptions about the data matrix X (in particular the preprocessing described below: zero-mean anomalies, detrended and deseasonalized data):
Eigendecomposition of covariance matrix
Let \(X_{M\times N}\) be the [space x time] data matrix.
This method is based on the eigenvalue problem of the \(M\times M\) covariance matrix:
\(\Sigma=\mathrm{Cov}(X,X)=XX^T\), where X is the anomaly matrix (i.e. the mean of each time series is 0). Furthermore, the data is detrended and deseasonalized.
The eigenvalue problem is given by \(\Sigma E=E\Lambda\)
\(\Sigma\) ... covariance matrix
\(E\) ... eigenvector matrix (each column represents an \(M\times1\) EOF)
\(\Lambda\) ... the \(M\times M\) diagonal eigenvalue matrix
In general, the covariance matrix has no zero entries. The eigenvalue matrix, on the other hand, has non-zero entries only on its diagonal. This means that, with the help of the EOFs (the transformation matrix), variability is redistributed onto fewer dimensions in such a way that the covariances between the PCs are zero -> the EOFs are orthogonal to each other!
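A minimal numpy sketch of this method on a synthetic anomaly matrix; the names `Sigma`, `lam`, `E`, `A` mirror \(\Sigma\), \(\Lambda\), \(E\), \(A\) from the notes, and the final check confirms that the covariance of the PCs is the diagonal eigenvalue matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 500
X = rng.standard_normal((M, N))
X -= X.mean(axis=1, keepdims=True)   # anomaly matrix: each time series has mean 0

Sigma = X @ X.T                      # (M x M) covariance matrix
lam, E = np.linalg.eigh(Sigma)       # eigenvalues and EOFs (columns of E)
lam, E = lam[::-1], E[:, ::-1]       # sort by descending explained variance

A = X.T @ E                          # principal components, (N x M)
pc_cov = A.T @ A                     # covariance of the PCs

# Off-diagonal entries vanish and the PC variances equal the eigenvalues:
# variability was redistributed onto uncorrelated (orthogonal) directions.
assert np.allclose(pc_cov, np.diag(lam), atol=1e-8)
```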
------
What truncation criteria for the relevant EOFs do you know?
How could you test data for normality in preparation for PCA?
Why do we have to think critically about PCA?
Bootstrapping
Main Idea: Measures such as the standard error and confidence limits are often not available for small samples. Bootstrapping resamples the sample and computes the estimate for each resample. By taking many resamples we get a spread of the resampled estimates.
Generally: Drawing random samples (choosing the elements randomly) from a population can be done with replacement or without replacement. If we take a small sample from a large population, it does not matter much whether each element is replaced or not.
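A quick numpy illustration of the two sampling modes (the population values are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
population = np.arange(10)

with_repl = rng.choice(population, size=5, replace=True)      # values may repeat
without_repl = rng.choice(population, size=5, replace=False)  # all values distinct
```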
Empirical distribution
The empirical distribution is the distribution of the data sample, which may or may not reflect the true distribution of the population.
Resampling
To resample is to take a sample from the empirical distribution with replacement.
Empirical bootstrap
For a sample \(x_1,...,x_n\) drawn from a distribution \(F\) of the population, the empirical bootstrap sample is the resampled data set of the same size, \(x_1^*,...,x_n^*\), drawn from the empirical distribution \(F^*\) of the sample.
Similarly, we can compute any statistic \(\Theta\) from the empirical bootstrap sample, just as from the original sample, and call it \(\Theta^*\).
The bootstrap principle states that \(F^*\simeq F\), thus the variation of \(\Theta\) is well approximated by the variation of \(\Theta^*\).
-> We can approximate the variation of \(\Theta\) by the variation of \(\Theta^*\), e.g. to estimate the confidence interval of \(\Theta\).
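A minimal numpy sketch of this procedure, assuming a made-up sample of size 30 and the sample mean as the statistic \(\Theta\):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=30)   # original sample x_1, ..., x_n
theta = x.mean()                              # statistic of interest

B = 10_000
# Resample from the empirical distribution F* (with replacement) and
# compute the statistic theta* for every bootstrap sample.
boot = rng.choice(x, size=(B, x.size), replace=True)
theta_star = boot.mean(axis=1)

# Bootstrap principle: the variation of theta* approximates the variation
# of theta -> percentile-based 95% confidence interval for the mean.
delta = np.percentile(theta_star - theta, [2.5, 97.5])
ci = (theta - delta[1], theta - delta[0])
print(f"mean = {theta:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```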