QM
HS 17/18
Flashcard Details
Cards | 82 |
---|---|
Language | English |
Category | Finance |
Level | University |
Created / Updated | 04.01.2018 / 04.01.2018 |
Weblink | https://card2brain.ch/box/20180104_qm |
Probability Function
The probability function for a discrete random variable X is a function p(x) that assigns a probability to each value of the random variable. Each probability must be greater than or equal to zero, and the sum of the probabilities over all individual outcomes must equal 1.
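As an illustration (not part of the original card), a minimal Python sketch of a probability function, using a fair six-sided die as an assumed example:

```python
# Probability function of a fair six-sided die (assumed example).
p = {x: 1/6 for x in range(1, 7)}

# The two defining properties: non-negative probabilities that sum to 1.
assert all(prob >= 0 for prob in p.values())
assert abs(sum(p.values()) - 1.0) < 1e-12

print(p[3])  # probability of rolling a 3 -> 0.1666...
```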
Probability Density Function
The probability density function corresponds to the probability function, but in the continuous case. It is defined as the derivative of the cumulative distribution function (CDF). In the continuous case the probability of a single realization is always zero, so we express the probability of an interval by taking the integral under the curve over that interval.
Cumulative Distribution Function
The cumulative distribution function (CDF) F(x) = P(X ≤ x) indicates the probability that X takes at most the value x. It is called cumulative since it is the accumulation of the probabilities. Thus, the y-value for the highest x-value must always be 1.
Joint Distribution Function
A joint distribution function F(xi, yk) = P(X ≤ xi, Y ≤ yk) indicates the probability that X takes at most a value of xi and Y at most a value of yk. Unlike the previous functions, which are based on one random variable, this function (and the following two) is used when two random variables exist.
Conditional Distribution Function
A conditional distribution function f(xi | Y = yk) describes the distribution of a variable X given the outcome of another variable Y. It is equal to the joint probability of the two variables divided by the marginal probability of the conditioning variable.
Marginal Distribution Function
The marginal distribution function fX(xi) = P(X = xi) indicates the probability of X = xi regardless of the value of Y.
Explain the Central Limit Theorem and its Importance in Inductive Statistics
If we take n independent random variables with mean µ and variance σ², then for large n the sum of these random variables is approximately normally distributed with mean nµ and variance nσ². Thus, even though we might not know the shape of the distribution of the entire population, the central limit theorem says that we can treat the sampling distribution as if it were normal. Of course, for the conclusions of the theorem to hold, the sample size has to be large enough; as a rule of thumb, the sampling distribution is treated as approximately normal for n ≥ 30. Many practices in statistics, such as hypothesis testing or confidence intervals, make assumptions about the population the data was obtained from; a standard assumption is that the populations we work with are normally distributed, and the central limit theorem justifies this approximation for large samples.
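A small simulation sketch (an assumed illustration using numpy, not from the original card) showing the CLT at work for sums of uniform random variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population that is clearly not normal: uniform on [0, 1]
# with mean mu = 0.5 and variance sigma^2 = 1/12.
n, reps = 30, 10_000
sums = rng.uniform(0, 1, size=(reps, n)).sum(axis=1)

# According to the CLT the sums should be approximately N(n*mu, n*sigma^2).
print(sums.mean(), n * 0.5)          # close to 15
print(sums.var(), n * (1 / 12))      # close to 2.5
```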
Explain the role of test statistics
The test statistic is used in hypothesis testing. In particular, it is calculated from the sample in order to decide whether the null hypothesis made on the population should be rejected.
Normal Distribution
Normal distribution: it plays a central role in econometrics and in statistics. The central limit theorem allows several other distributions to be approximated by the normal distribution. Especially the standard normal distribution, a special case of the normal distribution with mean zero and standard deviation one, is widely used in statistics for hypothesis testing etc. The normal distribution is symmetric.
Student Distribution
Student distribution: it is, like the normal distribution, symmetric, but it has fatter tails, which means that extreme outcomes are assigned a higher probability than under the normal distribution. It is used in statistics when the variance of the population is unknown. For n ≥ 30 it can be approximated by the normal distribution.
Chi-Square Distribution
Chi-Square distribution: it is the sum of squares of independently standard normally distributed random variables. The degrees of freedom of the chi-square distribution correspond to the number of standard normally distributed variables that are summed up.
F-Distribution
F distribution: it takes its name from Fisher. It is the ratio of two independent chi-square distributed random variables with v1 and v2 degrees of freedom, each divided by its degrees of freedom. The F-distribution is characterized by v1 and v2 degrees of freedom and is used in ANOVA (Analysis of Variance) to test whether the populations of two samples have the same variance.
Explain log-normal distribution
A random variable is said to be lognormally distributed if the natural logarithm of the variable is normally distributed, that is, if Y = ln(X) is normally distributed (Y ~ N(µy, σy)). The parameters of the lognormal distribution are determined by the parameters of the normally distributed variable. In contrast to a (symmetric) normal distribution, the graph is positively skewed (skewed to the right) and has only positive x-values. Taking ln(X) therefore turns the lognormal distribution back into a normal distribution. An example is given by stock prices, which are usually lognormally distributed. When we calculate the continuous returns by taking the logarithm of the price ratios, we end up with normally distributed returns.
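A minimal sketch (assumed parameters, using numpy) illustrating that the log of lognormally distributed "prices" is normally distributed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "stock prices": X is lognormal <=> ln(X) is normal.
mu_y, sigma_y = 0.05, 0.2            # assumed parameters of ln(X)
prices = rng.lognormal(mean=mu_y, sigma=sigma_y, size=100_000)

log_values = np.log(prices)          # continuous (log) returns are normal
print(log_values.mean(), log_values.std())   # approx. 0.05 and 0.2
print(prices.min() > 0)              # lognormal values are always positive
```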
What information does the mean contain?
The mean as a measure of central tendency is the expected value and thus the average value of all outcomes of the random variable. The expected value of a random variable is the value one would expect to find if one could repeat the process an infinite number of times and take the average of the values obtained. The expected value is a linear operator, and in the discrete case it is obtained by summing up the products of each probability with the corresponding value of the random variable.
What information does the variance contain?
The variance measures the dispersion of the distribution. It equals the expected quadratic deviation of the random variable from the mean and is called the second central moment. It is measured in the squared unit of the random variable, which makes it hard to interpret directly. It can be calculated as the expectation of the square minus the square of the expectation. The variance is a nonlinear operator.
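A short numerical check (assumed normal sample, using numpy) of the identity Var(X) = E[X²] − (E[X])²:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10, 3, size=1_000_000)

# Variance as "expectation of the square minus the square of the expectation".
var_identity = np.mean(x**2) - np.mean(x)**2
print(var_identity, np.var(x))       # both approx. 9 (= 3^2)
```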
What information does the standard deviation contain?
The standard deviation is the square root of the variance and therefore makes it easier to interpret the deviation of the random variables from the mean. It’s also a measure of dispersion
and it is commonly used in finance to express risk
What information does skewness contain?
Skewness is the third central moment and is a measure of the asymmetry of the distribution around its mean. It is defined as S = E[(Y − μ)³]/σ³. In the case of:
- Positive skewness: the distribution has a longer tail to the right, and the mean is greater than the median, which is in turn greater than the mode. The mean is pulled up by the few very high observations.
- Negative skewness: the distribution has a longer tail to the left, with the mean being below the median and the mode. The mean is pulled down by the few very low observations.
What information does kurtosis contain?
Kurtosis is the fourth central moment and measures the peakedness of the distribution, with K = E[(Y − μ)⁴]/σ⁴. The kurtosis of a normal distribution is 3. We distinguish between:
- Platykurtic: quite flat
- Leptokurtic: high and narrow
- Mesokurtic: resembles a normal distribution
For example: leptokurtic distributions are found in asset returns where trading is not continuous. Such security markets close overnight or at weekends, so information which has an influence on asset prices but is published when the market is closed will have an impact on prices when the market reopens. This causes a jump between the previous closing price and the opening price. As a result there will be higher frequencies of large negative or positive returns than would be expected if the market were to trade continuously.
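A minimal sketch (using numpy and scipy, with an assumed t-distributed sample as the fat-tailed example) computing skewness and kurtosis; note that scipy reports excess kurtosis by default, so fisher=False is used to obtain the value 3 for a normal distribution:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(3)
normal = rng.normal(size=100_000)
fat_tailed = rng.standard_t(df=4, size=100_000)   # leptokurtic example

print(skew(normal), kurtosis(normal, fisher=False))          # approx. 0 and 3
print(skew(fat_tailed), kurtosis(fat_tailed, fisher=False))  # kurtosis > 3
```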
Covariances and variances are also expectations.
The variance is the expectation of the square minus the square of the expectation. The covariance is the expectation of the product minus the product of the expectations of the two random
variables
What information does the variance-covariance resp. the correlation contain?
In a variance-covariance matrix the covariances between several pairs of variables are displayed. It contains both the variances (as the covariances of a variable with itself) on the diagonal and the covariances of each variable with another variable everywhere else. Therefore the covariances occur twice in the matrix as cov (x,y) = cov (y,x), meaning the matrix is symmetric.
The correlation coefficient is a unit-free measure of the strength and the direction of a linear relationship between two variables. Contrary to the covariance, the size of the correlation coefficient is not influenced by the scale of the observations. For example, a larger covariance may have more to do with large values of the observations than with a closer association between the two variables. It is calculated by dividing the covariance between X and Y by the product of the standard deviation of X and the standard deviation of Y. Correlation is simply a measure of statistical association, as there is no inference of causality in the statistic. It indicates how strong the linear association between two variables is; it cannot explain the changes in them. The larger the absolute value, the stronger the linear relationship. The coefficient is always between -1 and 1. The correlation matrix is again a symmetric matrix which contains 1 on the diagonal (because the correlation of a variable with itself is always 1), while the correlations are located in the rest of the matrix.
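A small numpy sketch (assumed simulated data) showing a variance-covariance matrix, the corresponding correlation matrix, and the link between them:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1_000)
y = 0.5 * x + rng.normal(scale=0.5, size=1_000)   # linearly related to x

data = np.vstack([x, y])
cov = np.cov(data)        # symmetric, variances on the diagonal
corr = np.corrcoef(data)  # symmetric, ones on the diagonal, values in [-1, 1]

print(cov)
print(corr)
# The correlation equals cov(x, y) / (std(x) * std(y)):
print(cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1]), corr[0, 1])
```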
Explain the statement "Pearson's correlation is a linear dependency measure"
Pearson’s correlation coefficient is calculated by dividing the covariance between X and Y by the product of the standard deviation of X and the standard deviation of Y. It indicates how strong the linear association between two variables is; it cannot explain the changes in them. The larger the absolute value, the stronger the linear relationship. The coefficient is always between -1 and 1. When the correlation coefficient is -1, we can write Y = aX + b (meaning that Y is a linear function of X) with a < 0 (negative slope). In the case of a correlation coefficient equal to 1, the slope will be positive and thus a > 0.
Explain the range of correlation measures
The possible values of the correlation coefficient ρ lie between -1 for a perfectly negative relationship, through zero where there is no linear relationship between the two variables (independent variables have zero correlation, but zero correlation does not by itself imply independence), to +1 for a perfectly positive relationship between the variables.
Explain the information covered by marginal distributions and conditional distributions
- The conditional distribution indicates the probability of an event X given the event Y. There is a precondition and under this assumption we calculate the probability of a desired outcome.
- The marginal distribution indicates the probability of X=xi regardless of the value of Y.
How do you compute the variance of a portfolio? What are the advantages and disadvantages of using variance as a risk measure for the portfolio?
We compute the variance of a portfolio as:
σP² = Σ (wi²·σi²) + 2·ΣΣ (wi·wj·covij), with the second sum taken over all pairs i < j (--> see picture)
The covariance takes into account the joint fluctuations of the assets. Only if the two assets have a correlation of 1 is there no diversification effect; in that case the portfolio risk is simply the weighted sum of the individual risks. The benefits of diversification are derived from adding assets to the portfolio that have low or even negative covariances with other assets in the portfolio, thus reducing the sum of the covariances and therefore the total risk of the portfolio.
Disadvantages:
- The variance of a portfolio is based on historical data and therefore it's difficult to select data
- Deviations upwards are also evaluated as a risk
- Cannot serve as a risk measure when dependencies matter: diversification is related to the correlation coefficient rather than the variance, and market risk is related to beta
Advantages:
- Good measure for comparing absolute value fluctuations of different assets/portfolios
- Standard deviation is a risk measure which is easy to interpret
How does the variance of a portfolio evolve with increasing number of assets?
The portfolio variance is determined through: σP² = Σ (wi²·σi²) + 2·ΣΣ (wi·wj·covij)
It depends on the covariances and the correlation coefficients. If all assets behave 100% the same (correlation coefficient of 1), there is no diversification effect and the risks simply add up. If the assets do not behave exactly the same, there is a diversification effect through the covariance, i.e. the correlation.
For an equally weighted portfolio the first term is the average variance divided by N, whereas the second term approaches the average covariance, since there are N(N-1) covariance terms. If N goes to infinity, the first term goes to zero and the second term goes to the average covariance. Therefore the portfolio variance becomes smaller than the sum of the single variances. The lower (or more negative) the correlation, the smaller the variance gets compared to the sum of the single variances. Even with low but positive correlation, however, the variance cannot be diversified away completely, since the covariance terms remain.
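A minimal numerical sketch (assumed identical variances and covariances for all assets) of how the equally weighted portfolio variance converges towards the average covariance as N grows:

```python
var_i, avg_cov = 0.04, 0.01     # assumed identical variances and covariances

for n in (2, 10, 100, 1000):
    w = 1 / n
    # sigma_P^2 = sum(w_i^2 * var_i) + sum over i != j of w_i * w_j * cov_ij
    port_var = n * w**2 * var_i + n * (n - 1) * w**2 * avg_cov
    print(n, port_var)          # converges towards avg_cov = 0.01
```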
How does the diversification potential of two assets behave with respect to their correlation?
The benefits of diversification are reached by adding an asset with low or negative covariance to the other asset. This reduces the total covariance and therefore reduces total risk. With a correlation coefficient of 1 there is no diversification effect and the portfolio risk is simply the weighted average risk of the individual assets. When asset returns are not perfectly correlated, portfolio risk is calculated from the squared-weighted variances plus the covariance terms, and the risk (standard deviation) of such a portfolio will be less than the weighted average risk of the individual securities obtained with a correlation of 1. If the correlation coefficient takes a value of 0 and the assets are independent, the portfolio risk is determined by σP² = Σ (wi²·σi²)
With a perfect negative correlation (coefficient of -1) we can create a portfolio with a risk very close to
zero.
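A short sketch with assumed weights and volatilities of two assets, showing how the portfolio risk falls as the correlation decreases:

```python
import numpy as np

w1, w2 = 0.5, 0.5
s1, s2 = 0.20, 0.30          # assumed standard deviations of the two assets

for rho in (1.0, 0.5, 0.0, -1.0):
    var_p = w1**2 * s1**2 + w2**2 * s2**2 + 2 * w1 * w2 * rho * s1 * s2
    print(rho, np.sqrt(var_p))
# rho = 1  -> 0.25  (weighted average of the risks, no diversification)
# rho = 0  -> ~0.18 (only the variance terms remain)
# rho = -1 -> 0.05  (risk close to or at zero, depending on the weights)
```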
What information does the p-value deliver?
The P-Value of a hypothesis test is the probability of getting sample data at least as inconsistent with the null hypothesis as the sample data actually obtained. The smaller (closer to zero) the P-Value, the stronger is the evidence against the null hypothesis. An outcome that would rarely occur if the null hypothesis were true provides evidence against the null hypothesis and, hence, in favor of the alternative hypothesis.
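A minimal sketch (assumed simulated sample, using scipy's one-sample t-test) of how a p-value is obtained and interpreted:

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(5)
sample = rng.normal(loc=0.3, scale=1.0, size=50)   # true mean is 0.3

# H0: the population mean is 0. A small p-value is evidence against H0.
t_stat, p_value = ttest_1samp(sample, popmean=0)
print(t_stat, p_value)
if p_value < 0.05:
    print("Reject H0 at the 5% significance level")
```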
Difference between Inner Product and Outer Product?
a’b is called the inner product (or scalar product) of two vectors, where a’ is a row vector and b is a column vector, with a’b = b’a. The result is a single number (scalar): (1 x n)*(n x 1) = (1 x 1). The outer product ab’ of two vectors is a matrix of dimension n x n: (n x 1)*(1 x n) = (n x n).
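A small numpy illustration (assumed example vectors) of the inner and outer product:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

inner = a @ b            # a'b = b'a, a single number: 1*4 + 2*5 + 3*6 = 32
outer = np.outer(a, b)   # ab', a 3 x 3 matrix

print(inner)
print(outer.shape)       # (3, 3)
```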
Explain "orthogonal".
Orthogonal: this term is linked to the inner product of two vectors. Two vectors are orthogonal if a'b = 0. Geometrically speaking, these two vectors form an angle of 90°.
Explain "idempotent".
Idempotent: a matrix is called idempotent if the product of the matrix with itself leads again to the same matrix: PP = P. The identity matrix I is an example of an idempotent matrix.
Explain "singular matrix".
Singular matrix: a matrix whose column (row) vectors are linearly dependent. The matrix is not of full rank and it is not invertible.
Explain "determinant".
Determinant: it determines whether a matrix is invertible or not. If the value of the determinant is zero, the vectors are linearly dependent and the matrix is not invertible; otherwise it is invertible. The determinant of a symmetric matrix is equal to the product of its eigenvalues.
Explain "Eigenvector/Eigenvalue"
Eigenvector/Eigenvalue: we consider a combination of a vector c and a scalar λ such that Ac = λc. The lambdas are called eigenvalues and the associated vectors are called the eigenvectors. If at least one eigenvalue is zero, then the matrix is singular.
Explain "positive semidefinite".
Positive semi-definite: a symmetric matrix A is called positive semi-definite if x'Ax ≥ 0 for every vector x. This is the case when all its eigenvalues are non-negative.
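A small numpy sketch (assumed example matrix) tying together the last few cards: eigenvalues, the determinant as their product, and the check x'Ax ≥ 0:

```python
import numpy as np

# A symmetric matrix (e.g. a covariance matrix is always symmetric).
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                              # all non-negative -> positive semi-definite
print(np.prod(eigvals), np.linalg.det(A))   # product of eigenvalues = determinant

x = np.array([1.0, -2.0])
print(x @ A @ x >= 0)                       # x'Ax >= 0 for this x
```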
Explain the Tower Rule.
The expectation of the conditional expectation of X given Y is equal to the expectation of X itself: E[E[X|Y]] = E[X].
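A minimal simulation sketch (assumed two-state setup, using numpy) illustrating the tower rule E[E[X|Y]] = E[X]:

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed setup: Y is a coin flip, and X | Y=0 ~ N(1, 1), X | Y=1 ~ N(3, 1).
y = rng.integers(0, 2, size=1_000_000)
x = rng.normal(loc=np.where(y == 0, 1.0, 3.0), scale=1.0)

# E[X | Y] is 1 or 3 depending on Y; its expectation equals E[X] = 2.
cond_exp = np.where(y == 0, 1.0, 3.0)
print(cond_exp.mean(), x.mean())       # both approx. 2
```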
What are random number generators needed for?
A random number generator is needed to generate random variables according to a specific probability distribution (e.g. a normal distribution with a certain mean μ and standard deviation σ). These random variables are then used to run simulations such as Monte Carlo simulation. This allows us to reproduce the market dynamics. Even though simulations are easy to apply, they cannot be used for optimization but rather for scenario analysis.
There are two very important requirements to be met:
- Law of large numbers: the larger the sample size, the more closely the frequency distribution will match the probability distribution.
- Independence: the random variables generated should be perfectly independent of each other.
Advantages of such simulations are large flexibility and a more realistic modelling process. However, it is rather a what-if analysis of scenarios than a mathematical optimization.
How do we test whether a random generator is good or not?
Create several buckets along the x-axis based on a large set of generated numbers.
Afterwards, check for each bucket whether the number of realizations corresponds to the theoretical distribution that is being imitated.
If yes, then the random generator is good.
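A minimal sketch of this bucket test (assumed U(0,1) generator and 10 buckets, using numpy and a chi-square goodness-of-fit test from scipy):

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(7)
samples = rng.uniform(0, 1, size=100_000)

# 10 equally wide buckets on [0, 1]; a good U(0,1) generator should put
# roughly 10% of the realizations into each bucket.
observed, _ = np.histogram(samples, bins=10, range=(0, 1))
expected = np.full(10, len(samples) / 10)

stat, p_value = chisquare(observed, expected)
print(observed)
print(p_value)    # a very small p-value would indicate a poor generator
```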
Application of the linear congruential method for the generation of random numbers.
The linear congruential method allows us to generate independent, [0,1]-uniformly distributed random variables. Based on:
- An initial value x0 (the seed)
- A constant multiplier a
- An increment c
- A modulus m
we get random variables according to the following formula:
x_{i+1} = (a·x_i + c) mod m, with the uniform random numbers given by r_{i+1} = x_{i+1}/m (--> see picture)
The random numbers are between zero and one because a residual divided by the divisor is always smaller than 1. It is important to note that the determination of the initial value x0, the so-called "seed", is crucial, as a given seed always leads to the same sequence of random variables. A random number generator is only good if the sequence of numbers before the random numbers start to repeat themselves is as long as possible. For example, a sequence of random numbers that repeats over and over again after 20 numbers is surely not the result of a good generator, since it cannot be described as really random or stochastic. The modulus m also plays a role when determining the formula for generating random numbers, since it determines the residual.
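A minimal Python implementation sketch of the linear congruential method (the parameters are assumed, deliberately small and not suitable for real use):

```python
def lcg(seed, a, c, m, n):
    """Linear congruential generator: x_{i+1} = (a * x_i + c) mod m."""
    x = seed
    numbers = []
    for _ in range(n):
        x = (a * x + c) % m
        numbers.append(x / m)   # dividing the residual by m gives values in [0, 1)
    return numbers

# Small illustrative parameters (assumed, not recommended for real use):
print(lcg(seed=7, a=5, c=3, m=16, n=8))
# With m = 16 the sequence repeats after at most 16 numbers,
# which is why a large modulus (and a long period) matters in practice.
```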
Application of the Inverse Transformation Method
The inverse transformation method (ITM) is one of several transformation methods. It is needed to transform independent, [0,1]-uniformly distributed random variables into a desired distribution. The ITM is mainly used whenever the cumulative distribution function F(x) is given in closed form or when it can be calculated as the integral of the probability density function f(x). One has to generate independent, [0,1]-uniformly distributed random variables r and to solve F(x) = r for x. This gives random variables x with the probability density function f(x).
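A minimal sketch of the ITM (assumed example: the exponential distribution, whose CDF F(x) = 1 − exp(−λx) can be inverted in closed form):

```python
import numpy as np

rng = np.random.default_rng(8)
r = rng.uniform(0, 1, size=100_000)   # independent U(0,1) random variables

# Assumed example: exponential distribution with rate lam.
# CDF: F(x) = 1 - exp(-lam * x). Solving F(x) = r for x gives the inverse:
lam = 2.0
x = -np.log(1 - r) / lam

print(x.mean())   # approx. 1 / lam = 0.5
```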
Explain the Convolution Algorithm
It is based on the central limit theorem. By summing up independent, identically [0,1]-uniformly distributed random variables Ri and standardizing the sum, approximately standard normally distributed random variables Z are obtained. After that, these random variables can be transformed from N(0,1) into a normal distribution with mean μ and standard deviation σ by a z-transformation (X = μ + σZ). For n = 12 the standard normally distributed random variables are easily computed with the formula Z = Σ(Ri) − 6, because the sum of 12 uniform variables has mean 6 and variance 12·(1/12) = 1.
The summation works because, by the central limit theorem, the sum of the Ri is approximately normally distributed.
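A minimal numpy sketch of the convolution algorithm with n = 12 uniform random variables and assumed target parameters μ and σ:

```python
import numpy as np

rng = np.random.default_rng(9)

# Sum of n = 12 independent U(0,1) variables, shifted by 6:
# the sum has mean 12 * 0.5 = 6 and variance 12 * 1/12 = 1.
reps = 100_000
z = rng.uniform(0, 1, size=(reps, 12)).sum(axis=1) - 6
print(z.mean(), z.std())        # approx. 0 and 1

# z-transformation into N(mu, sigma^2):
mu, sigma = 0.05, 0.2           # assumed parameters
x = mu + sigma * z
print(x.mean(), x.std())        # approx. 0.05 and 0.2
```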