Statistics for Atmospheric Science
Deck details
Cards | 88 |
---|---|
Language | English |
Category | Mathematics |
Level | University |
Created / Updated | 21.07.2018 / 27.08.2018 |
Weblink | https://card2brain.ch/box/20180721_statistics_for_atmospheric_science |
Significance test for the mean: the one-sample t-test when the variance is known
X is \(\mathcal{N}(m,\sigma)\), \(\sigma\) is known, m is unknown.
The test statistic is then:
\(T=\frac{\bar{x}-m_0}{\sigma/\sqrt{n}}\)
Under the null hypothesis, T is \(\mathcal{N}(0,1)\)-distributed.
Hint: due to the central limit theorem, the one-sample t-test is applicable for any distribution of X if the sample size is larger than n = 30!
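A minimal sketch of this test in Python (the sample and the values of \(m_0\) and \(\sigma\) are illustrative assumptions, not from the card):

```python
# Minimal sketch of the known-variance test for the mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma = 2.0                       # known standard deviation (assumed)
m0 = 5.0                          # hypothesized mean under H0 (assumed)
x = rng.normal(5.5, sigma, 40)    # hypothetical sample, n = 40

T = (x.mean() - m0) / (sigma / np.sqrt(len(x)))
p = 2 * stats.norm.sf(abs(T))     # two-sided p-value from N(0, 1)
print(f"T = {T:.2f}, p = {p:.3f}")
```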
Significance test for the mean: the one-sample t-test when the variance is not known
X is \(\mathcal{N}(m,\sigma)\) , \(\sigma \) is unknown
Then the test statistic is:
\(T=\sqrt{n}\,\frac{\bar{x}-m_0}{\widehat{s}}\)
Under the Null hypothesis, T is t-distributed with (n-1) degrees of freedom.
Hint: due to the central limit theorem, the one-sample t-test is applicable for any distribution of X if the sample size is larger than n = 30!
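A minimal sketch with an illustrative sample; scipy.stats.ttest_1samp computes the same statistic:

```python
# Minimal sketch of the one-sample t-test with unknown variance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(5.5, 2.0, 40)      # hypothetical sample
m0 = 5.0                          # hypothesized mean under H0 (assumed)

# Manual statistic: T = sqrt(n) * (xbar - m0) / s_hat
T = np.sqrt(len(x)) * (x.mean() - m0) / x.std(ddof=1)
t_scipy, p = stats.ttest_1samp(x, m0)   # t-distributed with n-1 d.o.f.
print(f"T = {T:.3f}, scipy: t = {t_scipy:.3f}, p = {p:.3f}")
```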
Explain the two-sample t-test in general.
With the two-sample t-test one can test, using the estimated means of two independent samples, how the theoretical means are related to each other.
Several variants are possible:
\(\sigma_1^2=\sigma_2^2\) and \(n_1=n_2\), with the null hypothesis \(H_0:\mu_1=\mu_2\).
The test statistic is then:
\(t=\frac{\overline{x_1}-\overline{x_2}}{\sqrt{\frac{s_1^2+s_2^2}{n}}}\)
Under the null hypothesis, t is t-distributed with (2n-2) d.o.f.
- Other variants exist for unequal variances and/or unequal sample sizes (e.g. the Welch test); see the sketch below.
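A minimal sketch for the equal-variance, equal-n case with illustrative data; scipy.stats.ttest_ind also covers the Welch variant:

```python
# Minimal sketch of the two-sample t-test (equal variances, equal n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x1 = rng.normal(0.0, 1.0, 30)     # two hypothetical samples
x2 = rng.normal(0.5, 1.0, 30)

# t = (x1bar - x2bar) / sqrt((s1^2 + s2^2) / n)
n = len(x1)
t = (x1.mean() - x2.mean()) / np.sqrt((x1.var(ddof=1) + x2.var(ddof=1)) / n)
t_scipy, p = stats.ttest_ind(x1, x2)                    # pooled, 2n-2 d.o.f.
t_w, p_w = stats.ttest_ind(x1, x2, equal_var=False)     # Welch variant
print(f"t = {t:.3f}, scipy: {t_scipy:.3f} (p = {p:.3f}), Welch: {t_w:.3f}")
```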
Explain the Kolmogorov-Smirnov test!
\(H_0\): The empirical distribution is equal to \(F_{ref}(x)\) .
The test statistic is:
\(D_n=\sup_x|F_n(x)-F_{ref}(x)|\)
Under the null hypothesis, \(\sqrt{n}D_n\) is Kolmogorov-distributed, independent of \(F_{ref}(x)\).
The critical values are tabulated: \(H_0\) is rejected when \(D_{obs}>D_{crit}\).
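A minimal sketch with scipy.stats.kstest; the standard normal reference distribution and the sample are illustrative assumptions:

```python
# Minimal sketch of the Kolmogorov-Smirnov test against F_ref.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 200)            # hypothetical sample

D, p = stats.kstest(x, "norm")           # D_n = sup_x |F_n(x) - F_ref(x)|
print(f"D_n = {D:.3f}, p = {p:.3f}")     # reject H0 for small p (D_obs > D_crit)
```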
What is a multivariate random variable and the joint probability density function?
A vector of scalar random variables \(X=(X_1,...,X_n)^T\) is called a multivariate random variable.
The joint occurrence of two continuous events x and y in a two-dimensional subset D of the event space is given by the joint probability density function \(f_{X,Y}\):
\(P((X,Y)\in D)=\int\limits_D f_{X,Y}(x,y)\,dx\,dy\)
What is the joint probability distribution and how is it linked to the joint probability density?
\(F_{X,Y}(x,y)=P(X\leq x, Y\leq y)\) describes the probability that \(X\leq x\) and \(Y\leq y\) occur simultaneously.
Joint CDF and PDF are linked by:
\(f_{X,Y}(x,y)=\frac{\partial^2 F_{X,Y}}{\partial x \partial y}\)
How are the marginal distributions of a joint distribution defined?
When integrating the joint density function over all other variables, one obtains the marginal density function associated with a specific variable:
\(f_X(x)=\int f_{X,Y}(x,y)dy\\f_Y(y)=\int f_{X,Y}(x,y)dx\)
This describes the occurrence of a variable regardless of which values the other variables assume.
Analogously to conditional probabilities, the conditional density functions are defined as:
\(f_{Y|X}(y|x)=f_{X,Y}(x,y)/f_X(x)\\f_{X|Y}(x|y)=f_{X,Y}(x,y)/f_Y(y)\)
and describe the occurrence of y given x and vice versa.
When are components of a multivariate random variable independent?
\(F_{X,Y}(x,y)=F_X(x)F_Y(y)\)
or
\(f_{X,Y}(x,y)=f_X(x)f_Y(y)\)
What is covariance and what is correlation?
Covariance:
\(Cov(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]=E(XY)-E(X)E(Y)\)
\(E(XY)=\int xy\;f_{X,Y}(x,y)\,dx\,dy\)
In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values (i.e. the variables tend to show similar behavior), the covariance is positive.
Correlation:
\(Corr(X,Y)=\frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}\)
The correlation describes the strength of the linear relationship between X and Y. -1 for perfect anti-correlation and 1 for perfect correlation.
Figure (omitted): a non-linear relation between two variables whose correlation is zero; see the sketch below.
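A minimal numerical sketch with illustrative data, including a zero-correlation non-linear case (y = x² for symmetric x):

```python
# Minimal sketch of covariance and correlation.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 100_000)
y_lin = 2.0 * x + rng.normal(0.0, 0.5, x.size)  # linear relation
y_sq = x**2                                     # non-linear relation

print(np.cov(x, y_lin)[0, 1])       # clearly positive covariance
print(np.corrcoef(x, y_lin)[0, 1])  # correlation close to +1
print(np.corrcoef(x, y_sq)[0, 1])   # near zero despite full dependence
```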
Name widely used correlation products in climate physics.
- point to point correlation map
- represents the correlation between two fields at identical spatial locations
- box correlation
- shows the correlation of a defined box with the rest of the field
- the global correlation field is often called the teleconnections
Give the formulas of the conditional expected value for a two dimensional joint probability density function!
\(E(X|Y=y)=\int\limits_{-\infty}^{\infty}x\;f_{X|Y}(x|y)dx\\E(Y|X=x)=\int\limits^{\infty}_{-\infty}y\;f_{Y|X}(y|x)dy\)
Explain the relation of covariance and independence!
If \(X_1,...,X_N\) are independent, then \(Cov(X_i,X_j)=0\) for \(i\neq j\).
But a covariance of zero does not necessarily mean that the variables are also independent!
Give another expression for \(Var(X+Y)\) for the dependent and independent case!
dependent case:
\(Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)\)
for the independent case:
\(Var(X+Y)=Var(X)+Var(Y)\)
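A quick numerical check of the dependent-case identity, with illustrative dependent variables:

```python
# Minimal check of Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y).
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 100_000)
y = 0.5 * x + rng.normal(0.0, 1.0, 100_000)    # y depends on x

lhs = np.var(x + y)                            # sample variance, ddof=0
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1]
print(f"{lhs:.4f} vs {rhs:.4f}")               # identical up to rounding
```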
Regression models: introduce the linear model!
- Given the pairs of data \((x_1,y_1),...,(x_n,y_n)\) , a linear model is defined as \(y_i=\beta_0+\beta_1x_i+\eta_i\).
- x is called the independent variable or predictor, while y is the dependent variable, response, or predictand.
- The model parameters are chosen such that the sum of squared errors (SSE) is minimized:
- \(SSE=\sum_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)^2=\sum^n_{i=1}(y_i-\widehat{y}_i)^2\rightarrow\min\)
- The parameters \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\) can be estimated from:
- \(\widehat{\beta}_1=\frac{\sum^n_{i=1}(x_i-\bar{x})(y_i-\bar{y})}{\sum^n_{i=1}(x_i-\bar{x})^2},\qquad\widehat{\beta}_0=\bar{y}-\widehat{\beta}_1\bar{x}\)
- Any pattern in the residuals indicates that the regression model used is too simple.
- If the error increases with increasing x, a weighted regression can be used!
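A minimal sketch of these estimators on synthetic data (the true values \(\beta_0=1\), \(\beta_1=2\) are assumptions of the example); np.polyfit gives the same result:

```python
# Minimal sketch of least-squares estimation for the linear model.
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, x.size)   # beta0=1, beta1=2 plus noise

beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
beta0 = y.mean() - beta1 * x.mean()
resid = y - (beta0 + beta1 * x)            # inspect residuals for patterns
print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}, SSE = {np.sum(resid**2):.1f}")
print(np.polyfit(x, y, 1))                 # [beta1, beta0] for comparison
```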
What is a stationary process?
A stochastic process \(\{X_t : t\in\mathbb{Z}\}\) is said to be stationary if all stochastic properties (mean, variance, correlation, ...) are independent of the index t, which can be an index of time or of a spatial dimension.
It follows then:
- \(X_t\) has the same distribution function F for all t
- for all t and s, the parameters of the joint distribution function of \(X_t\) and \(X_s\) depend only on \(|t-s|\)
The process is called stationary up to order m when the same considerations apply to the m-th joint moments, i.e. the m-th joint moment of \(\{X(t_1),...,X(t_n)\}\) is equal to the m-th joint moment of \(\{X(t_1+k),...,X(t_n+k)\}\) for any k and any set \((t_1,...,t_n)\).
Of what types of processes can you think?
- Gaussian (normal) process
- Markov process
- Purely random process
Processes: What is a Realization? What is an Ensemble?
Realization: One observed record of a random process.
Ensemble: The collection of all possible realizations.
What is an Auto-regressive process?
An auto-regressive process of order p, or an AR(p) process, is generally defined as follows: \(\{X_t : t\in\mathbb{Z}\}\) is an auto-regressive process of order p if there exist real constants \(\alpha_k,\,k=0,1,...,p\), with \(\alpha_p\neq 0\), and a white noise process \(\{Z_t : t\in\mathbb{Z}\}\) such that:
\(X_t=\alpha_0+\sum\limits^p_{k=1}\alpha_kX_{t-k}+Z_t\)
Give the formula for the AR(1) process!
The AR(1) process is a (linear, first order) Markov process, i.e. the current state \(X_t\) depends only on the last state \(X_{t-1}\) . This process can be written as:
\(X_t=aX_{t-1}+\epsilon_t\)
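A minimal sketch simulating an AR(1) process, assuming an illustrative coefficient a = 0.8:

```python
# Minimal sketch of an AR(1) process X_t = a * X_{t-1} + eps_t.
import numpy as np

rng = np.random.default_rng(7)
a, n = 0.8, 1000
eps = rng.normal(0.0, 1.0, n)

x = np.zeros(n)
for t in range(1, n):
    x[t] = a * x[t - 1] + eps[t]   # current state depends only on the last one

# The lag-1 autocorrelation should be close to a for a stationary AR(1).
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(f"lag-1 autocorrelation = {r1:.2f} (theory: {a})")
```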
What is the AR(2) process?
This kind of process is described by:
\(X_t=a_2X_{t-2}+a_1X_{t-1}+\epsilon_t\)
What is the spectrum? (in words)
The spectrum of a time series is obtained from the Fourier analysis of the time series; it is the Fourier transform of the auto-covariance function of the time series (Wiener-Khinchin theorem)!
It presents the variance per frequency as a function of frequency and therefore distributes the variance of the time series onto the different frequencies.
Explanation: Energy spectral density:
\(E=\int\limits_{-\infty}^{\infty}|x(t)|^2 dt\)
in the case of a pulse-like signal with finite total energy we find:
\(E=\int\limits_{-\infty}^{\infty}|\widehat{x}(f)|^2 df\)
where
\(|\widehat{x}(f)|^2\) is the energy spectral density of a signal x(t).
------------------
For continuous signals over all time, such as stationary processes, one must define the power spectral density.
The average power P of a signal x(t) over all time is given as:
\(P=\lim_{T\rightarrow \infty}\frac{1}{T}\int\limits^T_{0}|x(t)|^2dt\)
Stationary processes may have a finite power but an infinite energy. After all, energy is the integral of power, and a stationary signal continues over an infinite time.
For a signal with infinite duration the ordinary Fourier Transform does not necessarily exist. For that we use the truncated Fourier Transform.
\(\widehat{x}(\omega)=\frac{1}{\sqrt{T}}\int\limits^{T}_0 x(t)e^{-i\omega t}dt\)
Then the power spectral density is:
\(S_{xx}(\omega)=\lim_{T\rightarrow \infty}E[|\widehat{x}(\omega)|^2]\)
Show the transition from a Fourier Series to a Fourier Transform!
The complex notation of the Fourier series of a function periodic in the interval \([-L/2,L/2]\) is:
\(x(t)=\sum\limits_{n=-\infty }^{\infty}A_ne^{i(2\pi nt/L)}\\ A_n=\frac{1}{L}\int\limits^{L/2}_{-L/2}x(t)e^{-i(2\pi nt/L)}dt\)
If \(L\rightarrow \infty\), then \(n/L\rightarrow \vartheta\) becomes a continuous frequency (with \(\omega=2\pi\vartheta\)):
\(X(\vartheta)=F(x(t))=\int^\infty_{-\infty}x(t)e^{-i(2\pi\vartheta)t}dt\)
Wiener-Khinchin theorem: What is it?
The power spectral density is the Fourier transform of the auto-covariance function of the time series:
\(S(\omega)=E[X^*(\omega)X(\omega)]=\int\limits^{\infty}_{-\infty}R(\tau)e^{-i\omega \tau}d\tau\)
Parseval's theorem: what does it state?
\(\int\limits^\infty_{-\infty}|x(t)|^2dt=\int\limits^{\infty}_{-\infty}|X(\omega)|^2d\omega\)
- the Fourier transform preserves the energy of the original quantity.
- For a pulse-like signal the integration of power in the time space is equal to the integration of power in the frequency space!
Discrete Fourier transform: Give definitions!
Given a discrete time series \(x_i\), \(i=1...N\), sampled with sampling time \(\Delta t\).
Discrete Fourier transform:
\(X(\omega_l)=\frac{1}{\sqrt{N}}\sum^N_{k=1}x_ke^{-i\omega_lk\Delta t},\;\;\;\omega_l=\frac{2\pi l}{\Delta tN},\;\;\;l=1...N/2\)
We assume that the parameters characterizing the spectral components (amplitudes, frequencies, phases) do not change with time (the signal is stationary).
Fourier spectral analysis is particularly useful for stationary random processes because they do not have systematic trends that violate the periodicity assumptions inherent in a DFT.
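A minimal sketch with numpy's FFT on an illustrative sampled sine; note that numpy's normalization differs from the \(1/\sqrt{N}\) convention above only by a constant factor:

```python
# Minimal sketch of a DFT of a sampled noisy sine.
import numpy as np

dt, N = 0.1, 512                        # sampling time and series length
t = np.arange(N) * dt
x = np.sin(2 * np.pi * 1.0 * t) + 0.3 * np.random.default_rng(8).normal(size=N)

X = np.fft.rfft(x) / np.sqrt(N)         # one-sided spectrum
f = np.fft.rfftfreq(N, d=dt)            # frequencies up to 1/(2*dt) (Nyquist)
print(f"frequency resolution = {f[1]:.4f} Hz, Nyquist = {f[-1]:.1f} Hz")
```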
Nyquist frequency: What is it?
The highest frequency resolved by a discrete Fourier transform. It depends on the sampling time \(\Delta t\).
\(\omega _{max}=\omega_{N/2}=\frac{\pi}{\Delta t},\;\;\;f_{max}=\frac{1}{2\Delta t}\)
Frequency resolution of the discrete Fourier transform: Give the expression!
\(\Delta f=\frac{1}{\Delta tN}\)
The frequency resolution (and thus also the lowest resolved frequency) depends on the time series length.
Periodogram: Definitions!
Given a discrete time series \(x_i\) with \(i=1...N\) and \(t_i=i\Delta t\).
A first estimator of the spectrum is the periodogram:
\(\widehat{S}(\omega_l)=P(\omega_l)=|X(\omega_l)|^2\)
- The highest resolved frequency (Nyquist frequency) depends on the sampling time/rate.
- The frequency resolution (and thus also the lowest resolved frequency) depends on the length of the time window.
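A minimal sketch of the periodogram with scipy.signal on an illustrative noisy sine:

```python
# Minimal sketch of a raw periodogram; the peak sits at the sine frequency.
import numpy as np
from scipy import signal

rng = np.random.default_rng(9)
dt, N = 0.1, 1024
t = np.arange(N) * dt
x = np.sin(2 * np.pi * 1.0 * t) + rng.normal(0.0, 1.0, N)

f, P = signal.periodogram(x, fs=1.0 / dt)
print(f"peak at {f[np.argmax(P)]:.2f} Hz")   # expect ~1.00 Hz
```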
What are the problems of the Periodogram?
- The periodogram is not a consistent estimator. For large N, the variance of the periodogram does not decrease.
- Leakage. Because any observed time series is of finite length, spectral peaks get smeared out.
- Aliasing. Frequencies above the Nyquist frequency are not resolved, but their power appears as artefacts at lower frequencies.
Methods of improving the estimation of the power spectral density via periodogram!
- Repeat the experiment and average the individual spectra.
- Cut the time series into subseries and average the sub-periodograms (Welch's method; see the sketch after this list). This reduces the frequency resolution!
- Smooth the periodogram with a suitable smoothing kernel ("window, taper"). Reduction of variance but increase of bias!
- Fit AR(MA) models to time series and calculate their spectra.
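A minimal sketch of the averaging approach with scipy.signal.welch on illustrative data; the block length nperseg trades variance against frequency resolution:

```python
# Minimal sketch: raw periodogram vs averaged sub-periodograms (Welch).
import numpy as np
from scipy import signal

rng = np.random.default_rng(10)
dt, N = 0.1, 8192
t = np.arange(N) * dt
x = np.sin(2 * np.pi * 1.0 * t) + rng.normal(0.0, 1.0, N)

f_raw, P_raw = signal.periodogram(x, fs=1.0 / dt)
f_w, P_w = signal.welch(x, fs=1.0 / dt, nperseg=512)  # smoother, coarser grid
print(f"raw df = {f_raw[1]:.4f} Hz, Welch df = {f_w[1]:.4f} Hz")
```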
Windowing and Tapering: What is improved? What are the drawbacks?
The power spectral estimator is \(\chi ^2_2\)-distributed, with a relative standard deviation of 1. This means that with increasing N the frequency resolution increases, but the variance stays constant.
The variance can, however, be decreased (less random scatter) with windowing and tapering.
For a stationary time series, the periodogram of each window of data gives an independent unbiased estimate of the power spectrum and thus can be averaged to smooth the spectrum.
benefit and drawbacks:
The shorter the block length, the more blocks: the smoother the spectrum, but also the lower the frequency resolution!
The idea of windowing is best shown with a rectangular window, but in practice it is rarely used because of two problems:
- endpoint discontinuities of the window alter the low-frequency variability.
- every finite time series comes with leakage.
Both problems can be reduced with weight functions for the window which go smoothly to zero at the endpoints.
It makes sense to overlap these windows so that every data point is near a window center at some point and is weighted equally. However, the power spectrum estimates are then no longer independent, and this must be taken into account in the uncertainty estimates of the power spectrum too!
Explain Leakage!
Because any observed time series has finite length, spectral peaks are smeared out into neighbouring frequencies (see Karl's script!).
How to cope with Aliasing?
If the original time series has some power at an alias frequency of a frequency f (i.e. the alias frequency is higher than the Nyquist frequency), this power will additionally appear at the frequency f in the spectrum of the sample.
The only way to avoid this is to filter out the high frequencies (i.e. to low-pass filter the signal) before the sample is taken.
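A minimal sketch with scipy.signal.decimate, which applies an anti-aliasing low-pass filter internally, compared with naive subsampling; the signal and factors are illustrative:

```python
# Minimal sketch: low-pass filtering before subsampling avoids aliasing.
import numpy as np
from scipy import signal

dt, N, q = 0.01, 4096, 10               # subsample by a factor of 10
t = np.arange(N) * dt
x = np.sin(2 * np.pi * 12.0 * t)        # 12 Hz; above the new 5 Hz Nyquist

x_naive = x[::q]                        # aliased: power shows up at 2 Hz
x_clean = signal.decimate(x, q)         # high frequencies filtered out first

f, P = signal.periodogram(x_naive, fs=1.0 / (q * dt))
f2, P2 = signal.periodogram(x_clean, fs=1.0 / (q * dt))
print(f"naive: spurious peak at {f[np.argmax(P)]:.1f} Hz (true signal: 12 Hz)")
print(f"max power naive: {P.max():.2e}, after anti-aliasing: {P2.max():.2e}")
```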
Sound and Color Spectra
- The names arise from the appearance of visible light with the corresponding spectral distribution.
- Each "tone color" discussed here follows a power law of the form \(S(\omega)\sim f^\alpha\).
White noise:
- The spectrum of Gaussian white noise \(x_t\sim \mathcal{N}(0,\sigma^2)\) is given as \(S(\omega)=\frac{\sigma^2}{2\pi}\)
- Spectrum of white noise is constant so \(\alpha=0\)
pink noise:
- Pink noise is linear in the logarithmic scale
- \(S(\omega)\sim f^{-1}\)
red noise:
- largest variance at smallest frequencies
- \(S(\omega)\sim f^{-2}\)
blue noise:
- linear in logarithmic scale with more energy in higher frequencies:
- \(S(\omega)\sim f^1\)
violet noise:
- has more energy in higher frequencies and scales as
- \(S(\omega)\sim f^2\)
grey noise:
- grey noise contains all frequencies with equal loudness
- (white noise shows equal energy for all frequencies)
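A minimal sketch estimating the slope \(\alpha\) for white and red noise; red noise is approximated here by a random walk (cumulated white noise), which is an assumption of the example:

```python
# Minimal sketch: estimate the spectral slope alpha of colored noise.
import numpy as np
from scipy import signal

rng = np.random.default_rng(12)
white = rng.normal(0.0, 1.0, 2**14)
red = np.cumsum(white)                  # integrated white noise, S(f) ~ f^-2

for name, x in [("white", white), ("red", red)]:
    f, P = signal.welch(x, nperseg=1024)
    mask = (f > 0) & (f < 0.1)          # fit the low-frequency range
    alpha = np.polyfit(np.log(f[mask]), np.log(P[mask]), 1)[0]
    print(f"{name}: alpha ~ {alpha:.1f}")   # expect ~0 and ~-2
```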