Axioms of probability (Axioms of Kolmogorov)
Probability \(P:\Omega\rightarrow\mathbb{R}\) (the probability P is a map from the event space to the real numbers)
Given an event A in an event space \(\Omega\), i.e., \(A\subset \Omega\) (A is a subset of Omega; Omega is a superset of A)
Consequences of the Axioms of Kolmogorov
Independent events
Two events A and B are independent if the following holds:
\(P(A\cap B)=P(A)*P(B)\)
Conditional probability of two events
The conditional probability of an event A, given an event B with \(P(B)>0\), is:
\(P(A|B)=P(A\cap B)/P(B)\)
If A and B are independent, then:
\(P(A|B)=P(A)\)
Bayes' theorem
\(P(A_j|B)=\frac{P(B|A_j)P(A_j)}{P(B)}\)
where, for a partition \(\{A_i\}\) of \(\Omega\), the denominator can be expanded via the law of total probability: \(P(B)=\sum_i P(B|A_i)P(A_i)\).
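A hedged numeric illustration (all numbers invented for the example): suppose a test detects a condition A with \(P(B|A)=0.99\), has false-positive rate \(P(B|A^c)=0.05\), and the prevalence is \(P(A)=0.01\). Then
\(P(A|B)=\frac{0.99\cdot 0.01}{0.99\cdot 0.01+0.05\cdot 0.99}=\frac{0.0099}{0.0594}\approx 0.17\)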
What types of random variables exist?
Discrete and continuous random variables.
Cumulative distribution function (CDF)
\(F_X(x)=P(X\leq x)\) (continuous random variables)
\(F_X(x)=\sum_{x_i\leq x}P(X=x_i)\) (discrete random variables)
Probability distribution function
Probability mass function (only for discrete variables!):
\(f_X(x)=P(X=x)\)
Probability density function (PDF, for continuous random variables!):
\(f_X(x)=\frac{dF_X(x)}{dx}\)
Properties: \(f_X(x)\geq 0\), and \(\int f_X(x)dx=1\) for a PDF (\(\sum_i f_X(x_i)=1\) for a PMF).
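As an illustration of the PDF/CDF relation (the exponential distribution is just an example choice): for \(F_X(x)=1-e^{-\lambda x}\), \(x\geq 0\), differentiating gives \(f_X(x)=\frac{dF_X(x)}{dx}=\lambda e^{-\lambda x}\).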
Independent random variables
continuous random variables:
Random variables X and Y are independent if for any x and y:
\(P(X\leq x, Y\leq y)=P(X\leq x)P(Y\leq y)=F(x)G(y)\)
where F(x) and G(y) are the corresponding CDFs.
discrete random variables:
Random variables X and Y are independent if for any \(x_i\) and \(y_j\):
\(P(X\leq x_i,Y\leq y_j)=P(X\leq x_i)P(Y\leq y_j)\)
Define the expressions Quantile, Percentile, Median and Quartile
Quantile: the q-quantile is the value \(x_q\) for which \(F_X(x_q)=P(X\leq x_q)=q\).
Percentile: quantiles expressed in percentages; the 0.2-quantile is the 20th percentile.
Quartiles: the 25th, 50th, and 75th percentiles (lower quartile, median, upper quartile).
Median: the 0.5-quantile (50th percentile).
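A small worked example (the exponential distribution is just an illustrative choice): the median \(m\) of an exponential distribution with rate \(\lambda\) satisfies \(F(m)=1-e^{-\lambda m}=0.5\), hence \(m=\ln 2/\lambda\).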
What is a moment?
The nth moment \(\mu_n\) of a probability density \(f_X(x)\) is defined as:
\(\mu_n=E(X^n)=\int x^n * f_X(x)dx \)
The nth central moment \(\mu_n'\) of a probability density \(f_X(x)\) is defined with respect to the first moment \(\mu\) as:
\(\mu_n'=E((X-\mu)^n)=\int (x-\mu)^n * f_X(x)dx \)
How are the expected value and the variance defined?
The expected value, also called the mean, is defined as the first moment:
\(\mu=E(X)=\int x*f_X(x)dx \)
Physically, the expected value can be interpreted as the center of mass of the density.
The variance is defined as the second central moment:
\(\sigma^2=Var(X)=E((X-\mu)^2)=E(X^2)-\mu^2\)
The variance gives the spread around the expected value.
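A quick worked check of \(\sigma^2=E(X^2)-\mu^2\) for a fair six-sided die: \(\mu=\frac{1+2+\dots+6}{6}=3.5\), \(E(X^2)=\frac{1+4+9+16+25+36}{6}=\frac{91}{6}\), so \(\sigma^2=\frac{91}{6}-3.5^2=\frac{35}{12}\approx 2.92\).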
Explain Skewness!
Skewness is the third standardized moment, \(E((X-\mu)^3)/\sigma^3\); it measures the asymmetry of the distribution (positive skewness: longer right tail).
What is the fourth central moment?
Kurtosis (a measure of peakedness): the fourth central moment normalized by \(\sigma^4\), i.e., \(Kurt(X)=E((X-\mu)^4)/\sigma^4\).
The kurtosis of any univariate normal distribution is 3. It is common to compare the kurtosis of a distribution to this value. Distributions with kurtosis less than 3 are said to be platykurtic, although this does not imply the distribution is "flat-topped" as sometimes reported. Rather, it means the distribution produces fewer and less extreme outliers than does the normal distribution. An example of a platykurtic distribution is the uniform distribution, which does not produce outliers.
The excess kurtosis gives the difference between the kurtosis of the distribution under consideration and the kurtosis of the density of a normally distributed random variable (i.e., excess kurtosis = kurtosis - 3).
What is the Mode?
The mode is the value that appears most often in a set of data. For a continuous probability distribution, it is the value at which the density attains its maximum (the peak).
What are the probability density and the cumulative distribution function of the uniform distribution?
For a uniform distribution on \([a,b]\): \(f_X(x)=\frac{1}{b-a}\) for \(a\leq x\leq b\) (0 otherwise), and \(F_X(x)=\frac{x-a}{b-a}\) for \(a\leq x\leq b\) (0 below a, 1 above b).
Normal (Gaussian) distribution: \(f_X(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\) with mean \(\mu\) and variance \(\sigma^2\).
What is intermittency?
A signal is said to be intermittent if rare events of large magnitude are separated by long periods with events of low magnitude. Spatial intermittency implies that the signal displays localized regions with events of large magnitude, and wide areas with events of low magnitude.
Can the variance be zero?
Yes; then the random variable is (almost surely) constant and equal to its expected value: \(P(X=\mu)=1\).
Tell me a distribution where no moments exist:
The Cauchy distribution
Law of large numbers
Given a sequence of independent, identically distributed random variables \(X_1,X_2,...\) with mean \(\mu\), it holds that:
\(\frac{1}{n}\sum^{n}_{i=1}X_i\rightarrow \mu\;\;for\;\;n\rightarrow\infty\)
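A minimal simulation sketch of the law of large numbers (using numpy; the exponential distribution, its mean of 2.0, and the sample sizes are arbitrary illustration choices):

```python
import numpy as np

# Law of large numbers: the running sample mean of i.i.d. draws
# approaches the theoretical mean as n grows.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)            # true mean = 2.0
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: sample mean = {running_mean[n - 1]:.4f} (true mean = 2.0)")
```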
Central limit theorem
Given a sequence of independent and identically distributed random variables \(X_1,...,X_n\) with expected value \(\mu\) and variance \(\sigma^2\), the distribution of \(S_n=\frac{1}{n}(X_1+...+X_n)\) is approximately normal with mean \(\mu\) and variance \(\frac{1}{n}\sigma^2\), or,
\(\sqrt{n}\left(\frac{1}{n}\sum\limits^n_{i=1}X_i-\mu \right)\xrightarrow{d}\mathcal{N}(0,\sigma^2)\)
where \(\xrightarrow{d}\) reads "converges in distribution to".
How large does n have to be? That depends on the underlying distribution of the sample sequence.
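A minimal simulation sketch of the central limit theorem (using numpy; the uniform distribution, n = 50, and the number of repetitions are arbitrary illustration choices):

```python
import numpy as np

# Central limit theorem: standardized sample means of i.i.d. uniform draws
# should be approximately standard normal.
rng = np.random.default_rng(1)
n, reps = 50, 100_000
samples = rng.uniform(0.0, 1.0, size=(reps, n))          # mu = 0.5, sigma^2 = 1/12
z = np.sqrt(n) * (samples.mean(axis=1) - 0.5) / np.sqrt(1.0 / 12.0)

print("mean of z    :", round(z.mean(), 3))              # close to 0
print("variance of z:", round(z.var(), 3))               # close to 1
```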
Chebyshev's inequality
For any random variable X with finite variance and any c>0 it holds that:
\(P(|X-E(X)|\geq c)\leq\frac{Var(X)}{c^2}\)
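For example (a direct consequence, choosing \(c=k\sigma\) with \(\sigma^2=Var(X)\)): \(P(|X-E(X)|\geq k\sigma)\leq\frac{1}{k^2}\), so at most 25% of the probability mass lies two or more standard deviations away from the mean.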
Empirical vs. theoretical quantities
Quantities estimated from a given sample are often referred to as empirical or sample quantities, e.g., \(\hat{\mu}\).
The corresponding true or model quantities are often referred to as theoretical quantities, e.g., \(\mu\).
Given a sample \(x_1,...,x_N\) of a random variable X. Consider a parameter \(\Theta\) of X, e.g., the mean \(\mu\).
Then the estimator \(\hat{\Theta}\) is a function of the sample (i.e., a statistic) of the random variable X that assigns to the sample a value whose distribution depends on (and should be close to) \(\Theta\).
Estimators: What are the formulas for the sample mean, the sample variance (with known and with unknown mean), and the sample standard deviation?
sample mean: \(\bar{x}=\widehat{\mu}=\frac{1}{N}\sum\limits^{N}_{i=1}x_i\)
sample variance: \(\widehat{Var}(x)=\frac{1}{N-1}\sum\limits^{N}_{i=1}(x_i-\bar{x})^2 \)
sample variance with known \(\mu\): \(\widehat{Var}(x)=\frac{1}{N}\sum\limits^{N}_{i=1}(x_i-\mu)^2 \)
standard deviation: \(\widehat{s}=\sqrt{\widehat{Var}(x)}\)
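A minimal sketch of these estimators in numpy (the data values are an arbitrary example; ddof=1 selects the 1/(N-1) normalization):

```python
import numpy as np

# Sample estimators written out from the formulas above and checked
# against numpy's built-ins.
x = np.array([2.1, 3.4, 1.8, 2.9, 3.6, 2.2])     # arbitrary example sample
N = x.size

mean_hat = x.sum() / N                            # sample mean
var_hat = ((x - mean_hat) ** 2).sum() / (N - 1)   # sample variance (unknown mean)
std_hat = np.sqrt(var_hat)                        # sample standard deviation

print(mean_hat, x.mean())                         # identical
print(var_hat, x.var(ddof=1))                     # identical
print(std_hat, x.std(ddof=1))                     # identical
```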
Given independent random variables X and Y with expected values \(\mu_X\) and \(\mu_Y\) and variances \(\sigma^2_X\) and \(\sigma^2_Y\).
How do you calculate the expected value and the variance of composed variables such as \(Z=X+Y\) or \(Z=X*Y\)?
For the sum: \(\mu_{X+Y}=\mu_X+\mu_Y\) and \(\sigma^2_{X+Y}=\sigma^2_X+\sigma^2_Y\).
For the product: \(\mu_{XY}=\mu_X*\mu_Y\) and \(\sigma^2_{XY}=\sigma^2_X\sigma^2_Y+\sigma^2_X\mu^2_Y+\sigma^2_Y\mu^2_X\).
Note: the density function and the cumulative distribution of composed random variables (such as X+Y or XY) are in general not easy to determine, although mean and variance can be computed easily.
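A minimal Monte Carlo sketch checking these composition rules (using numpy; the normal distributions and their parameters are arbitrary illustration choices):

```python
import numpy as np

# Check E(X+Y) = mu_X + mu_Y, Var(X+Y) = var_X + var_Y, and E(XY) = mu_X * mu_Y
# for independent X and Y.
rng = np.random.default_rng(2)
X = rng.normal(loc=1.0, scale=2.0, size=1_000_000)   # mu_X = 1, var_X = 4
Y = rng.normal(loc=3.0, scale=1.0, size=1_000_000)   # mu_Y = 3, var_Y = 1

print("E(X+Y)  :", (X + Y).mean(), "expected 4")
print("Var(X+Y):", (X + Y).var(), "expected 5")
print("E(XY)   :", (X * Y).mean(), "expected 3")
```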
What is the estimator of the probability density function?
The histogram: the relative frequency of samples in each bin divided by the bin width.
Choice of the number of bins K for a histogram
Non-trivial; common rules of thumb are \(K\approx\sqrt{N}\) or Sturges' rule \(K=1+\log_2 N\).
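A minimal sketch of a histogram-based density estimate (using numpy; the standard-normal data and the \(\sqrt{N}\) bin rule are arbitrary illustration choices):

```python
import numpy as np

# Histogram as a PDF estimate: relative frequency per bin divided by the bin width.
rng = np.random.default_rng(3)
x = rng.normal(size=1_000)
K = int(np.sqrt(x.size))                     # ~32 bins via the sqrt(N) rule

counts, edges = np.histogram(x, bins=K)
widths = np.diff(edges)
pdf_hat = counts / (counts.sum() * widths)   # density estimate per bin

print(np.sum(pdf_hat * widths))              # integrates to 1.0
```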
When is an estimator consistent?
The estimator \(\widehat{\Theta}\), as a function of the random variable X, is again a random variable. Therefore every estimator has an expected value and variance.
An estimator is called consistent if:
\(P(|\widehat{\Theta}-\Theta|>\epsilon)\rightarrow0\;\;for\;\;N\rightarrow\infty\)
for all \(\epsilon >0\)
Example: The estimator for the expected value \(\widehat{\Theta}=\widehat{\mu}\) (the sample mean) is consistent (law of large numbers).
\(\widehat{\mu}=\frac{1}{N}\sum\limits^N_{i=1}x_i\)
What is the Mean Squared Error (MSE) and Variance of an estimator?
\(MSE(\widehat{\Theta})=E[(\widehat{\Theta}-\Theta)^2]\)
The MSE is also called the risk.
\(Var(\widehat{\Theta})=E[(\widehat{\Theta}-E(\widehat{\Theta}))^2]\)
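A useful identity relating the two quantities (the bias-variance decomposition): \(MSE(\widehat{\Theta})=Var(\widehat{\Theta})+\bigl(E(\widehat{\Theta})-\Theta\bigr)^2\), i.e., MSE = variance + squared bias; for an unbiased estimator the MSE equals the variance.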