QM

HS 17/18



Set of flashcards Details

Flashcards 82
Language English
Category Finance
Level University
Created / Updated 04.01.2018 / 04.01.2018
Weblink
https://card2brain.ch/box/20180104_qm

Explain the parameters of a multivariate normal distribution

An N-dimensional random variable X is normally distributed with:

  • The mean vector ξ (pronounced "xi"), which contains the means of all variables, and
  • The variance-covariance matrix Σ, which consists of all pairwise covariances of the variables, including each variable's covariance with itself (i.e. its variance).

Interpret the density function of a multivariate normal distribution

In the multivariate case, I have a vector of expectations and the variance-covariance matrix. The N-dimensional random variable X then has a density function only if the variance-covariance matrix is non-singular, because the density formula takes its inverse. A matrix needs to be regular (non-singular) in order to be invertible.
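For reference, the multivariate normal density with mean vector ξ and non-singular covariance matrix Σ is

f(x) = (2π)^(−N/2) · det(Σ)^(−1/2) · exp( −½ (x − ξ)' Σ⁻¹ (x − ξ) )

The term Σ⁻¹ is exactly why Σ must be invertible (regular).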

What property does the variance-covariance matrix have? How is this property proven?

The matrix Σ is certainly positive semi-definite. Among other things, this implies that there are no negative elements on the diagonal. Intuitively this follows from the fact that variances are never negative. Formally, it is proven by showing that there is no vector γ for which the following expression becomes false:
γ'Σγ ≥ 0

In the script, this proof is made with the definition Var[X] = E[X²] – (E[X])². The bottom line is that the result contains a squared term; therefore the quadratic form γ'Σγ cannot become negative in any case. The positive semi-definiteness of the variance-covariance matrix allows for a Cholesky decomposition.
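A compact version of this argument (a sketch of the step the card alludes to): for any vector γ, the scalar random variable γ'X has variance Var[γ'X] = γ' Var[X] γ = γ'Σγ. Since a variance can never be negative, γ'Σγ ≥ 0 holds for every γ, which is exactly the definition of positive semi-definiteness.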

What is the Cholesky Decomposition good for?

The Cholesky decomposition is an algorithm that allows us to split the variance-covariance matrix Σ into a lower-triangular matrix L that, multiplied with its transpose L', gives Σ. In this sense the Cholesky decomposition amounts to taking the square root of a matrix.

This is a crucial step in transforming a multivariate standard normal distribution into any desired normal distribution.

It is only possible if the matrix is positive semi-definite.

Determine the variance-covariance matrix for a given Cholesky Decomposition

The matrix L, which is the result of the Cholesky decomposition, has to be multiplied with its transpose L' in order to get the variance-covariance matrix: LL' = Σ. This is logical, as the Cholesky decomposition amounts to taking the square root of a matrix, in this case the variance-covariance matrix.
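A minimal numerical sketch with NumPy (the matrix values are invented for illustration; note that np.linalg.cholesky requires a positive definite matrix):

import numpy as np

# Illustrative variance-covariance matrix
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])

# Cholesky decomposition: L is lower triangular with L @ L.T == Sigma
L = np.linalg.cholesky(Sigma)

print(L)
print(np.allclose(L @ L.T, Sigma))  # True: multiplying L by its transpose recovers Sigma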

Computation of normally distributed realizations using a given Cholesky decomposition

To get a vector X of random variables Xi underlying a certain normal distribution we have to multiply the Cholesky-decomposition matrix L with the vector Z of the standard normally distributed random variables Zi (with mean zero and standard deviation of 1) and add the vector ξ: X = ξ + LZ.


The resulting random variables are normally distributed with mean ξ and variance-covariance matrix Σ.
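Continuing the sketch above with an assumed mean vector ξ (values invented), the transformation X = ξ + LZ looks like this:

import numpy as np

rng = np.random.default_rng(0)

xi = np.array([0.05, 0.02])              # assumed mean vector
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
L = np.linalg.cholesky(Sigma)

Z = rng.standard_normal((2, 100_000))    # independent standard normal draws
X = xi[:, None] + L @ Z                  # X = xi + L Z

print(X.mean(axis=1))                    # approximately xi
print(np.cov(X))                         # approximately Sigma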

Explain how to verify the convexity of a nonlinear optimization problem

The convexity of an optimization problem ensures that a local extremum is also a global one. To determine whether an optimization problem is convex, we need to look at the objective function on the one hand and at the restrictions on the other hand. The optimization problem is convex if the set of feasible decisions is convex and the objective function is concave in a maximization problem or convex in a minimization problem, respectively. If either condition is violated, we are dealing with a non-convex problem.

  • The objective function is convex (concave) when the Hessian matrix is positive (negative) semi-definite. This corresponds to a positive (negative) curvature in any direction.
  • The feasible set is convex if the function g fulfils certain requirements.

Graphically spoken: a set is convex when for any two points of the set any convex combination of these two points is in the set too.

Two-dimensional case: positive (negative) semi-definite means the matrix has non-negative (non-positive) diagonal elements and its determinant is also non-negative.

Explain how to verify convexity / concavity of multidimensional functions?

When the Hessian matrix is positive (negative) semi-definite. This corresponds to a positive (negative) curvature in any direction.
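A quick numerical check of this condition (a sketch; the Hessian below is an arbitrary example):

import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # example Hessian of some twice-differentiable function

eigvals = np.linalg.eigvalsh(H)   # eigenvalues of the symmetric matrix
print(eigvals)
print("convex (positive semi-definite):", bool(np.all(eigvals >= 0)))
print("concave (negative semi-definite):", bool(np.all(eigvals <= 0)))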

Explain the concept of the Lagrange approach.

The Lagrange approach helps us to solve an optimization problem with constraints as if it had no constraints (constrained optimization becomes unconstrained optimization). This is achieved by combining the objective function and the restrictions into a single function, the Lagrange function. As a consequence, we only need to set all partial derivatives of the Lagrange function equal to zero. The price is complexity, since the Lagrange function has more dimensions than the original optimization problem. The reason for this lies in the introduction of the dual variables, the so-called Lagrange multipliers. With the Lagrange approach the constraints are stated as equalities (as opposed to the Kuhn-Tucker approach, which allows for inequalities).
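One common way to write this (a generic form, not taken from the card) for a problem with objective f(x) and equality constraints g_i(x) = b_i is

L(x, λ) = f(x) + Σ_i λ_i · (b_i − g_i(x))

Setting all partial derivatives ∂L/∂x_j = 0 and ∂L/∂λ_i = 0 reproduces both the first-order conditions and the constraints in one system of equations.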

Interpret the Lagrange function of an optimization problem and the “Lagrange multipliers”.

The Lagrange function of a convex optimization problem is a saddle function. In the case of a maximization problem the function is concave in x and convex in λ. If the minimization problem is convex, then the Lagrange function L(x, λ) is convex in x and concave (affine) in λ. The optimal solution lies in the saddle point. We can see the graph as either a collection of concave curves (when we fix the function for a certain λ) or as a collection of convex curves (when we fix the function for a certain x). If we determine the maximum / minimum of all these curves we get a maximum value function and a minimum value function. They intersect at the optimal solution.

The Lagrange function consists of the objective function plus the sum of all constraints, each multiplied by a corresponding Lagrange multiplier. Since the constraints which are summed up in the Lagrange approach have different units, the λ multipliers transfer these to a single “currency”, which is the unit of the objective function.

The Lagrange multipliers serve for sensitivity analysis and can also be interpreted as shadow prices. They measure the sensitivity of the optimal value to changes in the restrictions. Therefore the saddle point can be seen as an equilibrium between the market and the decision maker. In this context we refer to the strong duality concept: the minimum of the maximum value function and the maximum of the minimum value function intersect at the saddle point.

Explain the Lagrange- / the Kuhn-Tucker optimality conditions of order 1 / 2? Are these conditions necessary and/or sufficient for the determination of an optimal solution?

For this question, it is very important to determine the convexity of a function.

  • If the objective function is convex (i.e. its Hessian matrix is positive semi-definite), we know that there can only be one extremum. Therefore it is sufficient to just set all the derivatives of the Lagrange function to zero, which is the first-order condition. This condition is necessary and also sufficient for convex functions.
  • However, if the functions are non-convex, we also need the second-order conditions to check whether a local extremum is also a global one: look at the second derivative!

Explain the optimization problem in the mean-Variance approach and its structural property.

In a first step, the mean-variance approach minimizes the portfolio variance for any given portfolio return.

There are two restrictions made:

  1. All weights need to sum up to one
  2. The return is defined as the weighted return of all assets

This gives us the efficient frontier, which is the upper part of the function σ(μ). However, μ is not yet fixed to a particular number. This step shows how each asset has to be weighted in order to get the minimum variance for any targeted μ.

In a second step, the variance function σ(μ) itself is minimized. The result is the minimum-variance portfolio.
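A numerical sketch of the first step (minimizing portfolio variance for a given target return); the return vector and covariance matrix are invented, and the equality-constrained problem is solved via its Lagrange first-order conditions as one linear system:

import numpy as np

mu = np.array([0.06, 0.10, 0.04])             # illustrative expected returns
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.03]])         # illustrative covariance matrix
ones = np.ones(3)
target = 0.07                                  # targeted portfolio return

# First-order conditions of L(w, l1, l2) = 0.5*w'Sigma*w + l1*(1 - 1'w) + l2*(target - mu'w),
# stacked into one linear system in (w, l1, l2):
A = np.block([[Sigma,          -ones[:, None],    -mu[:, None]],
              [ones[None, :],  np.zeros((1, 1)),  np.zeros((1, 1))],
              [mu[None, :],    np.zeros((1, 1)),  np.zeros((1, 1))]])
b = np.concatenate([np.zeros(3), [1.0, target]])

w = np.linalg.solve(A, b)[:3]                  # optimal weights
print(w, w @ mu, np.sqrt(w @ Sigma @ w))       # weights, return, volatility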

Explain the structural properties of the efficient frontier / efficient portfolios in the mean-variance approach.

The efficient portfolios lie on a straight line (in weight space), because their weights depend linearly on the targeted μ. This means that if you know two optimal portfolios, you can derive any other optimal portfolio from them.

The efficient frontier itself is curved, because the variance does not depend linearly on μ but is a quadratic function of μ.

Explain how to verify the optimality of a given solution of an optimization problem.

We have to check whether the objective function and the restrictions fulfill the requirements of a convex optimization problem. If they do, every local minimum/maximum is a global one as well, which makes the solution optimal.

Explain the dynamic version of the mean-variance approach and its consistency with utility theory. Why is consistency an issue?

The whole planning horizon is divided into several periods. The idea of the dynamic version is that you can rebalance your portfolio after a predetermined time horizon. You rebalance your portfolio to reach your predefined target return with a minimum of risk. This is consistent with utility theory, since we are in general risk averse, prefer more to less wealth, and model risk through variance.

It can be shown that minimizing variance for a predefined target return is equal to maximizing utility, which means that the DEVA (Dynamic Expectation Variance Analysis) is consistent with utility theory. The utility function U(W) is defined as wealth minus the variance of the wealth times a risk aversion factor α.
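Written out (implied by the definition above): U(W) = E[W] − α · Var[W]. Maximizing this utility for a given α leads to the same first-order conditions as minimizing Var[W] subject to a return target.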

If we compare the Lagrangian of the mean-variance approach to the Lagrangian of the utility approach, we see that the functions look exactly the same, with the exception of the risk-aversion factor (α and λ, respectively) and the constant terms of the mean-variance Lagrangian, which have no influence on the saddle point. Given that the risk aversion factors represent the same risk aversion, the saddle point is at exactly the same location, independent of which approach is used.

Explain the update of the variance-covariance matrix in the dynamic setting

In the dynamic setting, it is assumed that there is a regime of low and a regime of high volatility.

They do not occur at the same time, and the frame of observation covers different time periods. The idea behind this is that the volatility of markets changes over time. With different variance-covariance matrices, one can better adapt the portfolio to a given point in time.

Explain how to read P&L-distributions of asset allocations? Explain “stochastic dominance”.

The P&L distribution shows us, for different allocations, the resulting total return and shortfall probability. First-order stochastic dominance is the state where portfolio A has both a higher mean and lower volatility than portfolio B. Second-order stochastic dominance is the state where portfolio A has at least as high a mean as portfolio B but less volatility. Based on the risk aversion of the investor or on the restrictions (risk ability) imposed on the asset manager (e.g. less than 5% shortfall probability), we then prefer one allocation over another.

Given a filtration, what properties should a dynamic decision process fulfill?

The shortfall probability of a given filtration should always be lower than a predefined level; otherwise we cannot implement the strategy and have to take out (reduce) volatility.

Explain "dependent variable".

Dependent variable: endogenous variable which is determined by some other, independent variables and is called y in the regression model

Explain "independent variable".

Independent variable: exogenous variable defined as x in the simple linear regression. (Example: relation between sales (y) and sales area (x))

Explain "error term".

Error term: the residual, which describes the deviation between the actual data point and the calculated regression line (yᵢ − ŷᵢ). It covers influences which aren't explained by the regression model and therefore represents the random term. In contrast to deterministic models, where there is no error variable, probabilistic models add an error term to measure the error of the deterministic component.

Explain "parameter of population".

Parameter of population: β0 and β1 are parameters of the population. They correspond to the true parameters and are normally unknown.

Explain "parameter of the sample".

Parameter of the sample: b0 and b1 are parameters of the sample. In the regression model they are estimators based on the calculated straight line through our sample data. b0 and b1 are unbiased estimators of β0 and β1.

How are the coefficients of a linear regression model determined? What can you say about the sum of the squared errors?

The coefficients of a linear regression model are b0, b1, …, bn, whereby b0 is the y-intercept and b1, …, bn are slope parameters. Every bi (for i ≥ 1) relates to a certain independent variable xi. The coefficients are determined by minimizing the sum of squared deviations between the actual data points and the regression line, (yᵢ − ŷᵢ)². The sum of these squared deviations is the so-called Sum of Squared Errors (SSE).

SSE is an important statistic because it is the basis for other statistics that assess how well the linear model fits the data. It is possible to compute the standard error of estimate (sε) from the SSE. sε should be as small as possible and helps to evaluate different models. Nevertheless, it cannot be used as an absolute measure since it has no upper limit. Furthermore, it can be used to calculate the coefficient of determination (R²).

 

What are the "Short-Cut" formulas good for?

Short-Cut formulas make the calculation of those parameters a lot easier and they save computer power. Moreover, they can prevent computational errors due to minor inaccuracy by reducing the necessary steps to generate an answer.
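A small sketch of the usual short-cut computation for the simple regression coefficients (data invented for illustration):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])     # independent variable (e.g. sales area)
y = np.array([5.0, 9.0, 11.0, 16.0, 19.0])   # dependent variable (e.g. sales)
n = len(x)

# Short-cut formulas: b1 = S_xy / S_xx, b0 = y_bar - b1 * x_bar
S_xy = np.sum(x * y) - n * x.mean() * y.mean()
S_xx = np.sum(x ** 2) - n * x.mean() ** 2
b1 = S_xy / S_xx
b0 = y.mean() - b1 * x.mean()

print(b0, b1)
print(np.polyfit(x, y, 1))  # cross-check: returns [slope, intercept]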

What preconditions do the error terms need to fulfill, in order to have a sufficiently good regression model? How can you verify them?

In order to provide a good estimation for the regression model, the distribution of the error terms has to follow the GAUSS-MARKOV CONDITIONS:

  1. Expected Value is zero
  2. The error terms are independent of the independent (explanatory) variables
  3. Homoscedasticity: The variance remains constant for all error terms
  4. There is no autocorrelation between the error terms

The best way to determine whether those conditions are met is by plotting the residuals against the predicted values of the dependent variable. This way, any discrepancies can be spotted more easily. If one of those conditions is violated, you can often fix it through a monotonic transformation of the data: log transformation, quadratic transformation, square-root transformation, or reciprocal transformation.

How do you test the coefficients for significance? What is the starting null hypothesis? Justify the choice of the null hypothesis.

The null hypothesis is always H0: βi = 0. This is because the rejection of a hypothesis is more powerful than the confirmation of the opposite hypothesis. Thereby, a coefficient is considered significant if it differs markedly from 0. The test itself consists of a simple t-test, with the test statistic:
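Presumably this is the usual coefficient t-ratio,

t = bᵢ / s(bᵢ),

which under H0 follows a t-distribution with n − k − 1 degrees of freedom.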

 

Which parameter is used to quantify the explanatory power? What interpretation does this parameter allow?

The parameter in use is R-squared, which is referred to as the coefficient of determination. It is defined as follows (SST = Total Variation / SSR = Explained Variation / SSE = Unexplained Variation):

R² = SSR / SST = 1 − SSE / SST


The R-squared should, however, not be taken too seriously in absolute terms, because it is often overrated. A value of 0.2 does not have to be useless, while a value of 0.95 can be quite low in certain situations. Moreover, it is not possible for the R-squared to be lowered by an additional variable. If this variable turns out to be absolutely useless, it will simply receive (close to) no weight in the model, but the existing factors will not have less explanatory power because of that. A measure which can actually decrease when useless independent variables are added is the ADJUSTED R², which is therefore a more accurate measure.

What is the link between the standard deviation of the error terms and the explanatory power?

The error terms are directly linked to the SSE, since the latter is the sum of all squared errors. The larger the error terms, the larger the SSE becomes and the smaller the R-squared. Error terms are basically variation that is not explained by the regression model.

Description of a return model using linear regression. Explain the difference between the systematic risk and unsystematic risk. Which of them can be diversified? Which of them is compensated on the market?

Regression model of returns: μ = β0 + β1 · μm + ε.

The market model implies that a stock return μ is linearly dependent on the equity market (represented by the return on the market portfolio μm). Systematic risk is based on the market and measures the volatility of the asset price which is affected but also compensated by the market. Therefore the coefficient β1 is called the stock's beta coefficient, which measures how sensitive the stock's rate of return is to changes in the level of the overall market.

Unsystematic risks are the so-called firm-specific risks; they are only a result of activities and events of one corporation and therefore they can be diversified. Thus, they can be regarded as non-compensated risk. If an investor expects the markets to rise, it makes sense to hold a portfolio with a β > 1. If he expects the markets to fall, it makes sense to hold a portfolio with a β < 1. The systematic risk is measured by the coefficient of determination R² and the unsystematic risk, as a consequence, by 1 − R².

What information do the values of the regression output contain?

The regression table is usually divided into three parts.

The first delivers absolute values of the regression model, containing R, R-squared, adjusted R-squared, the SEE (standard error of estimate, which equals the square root of the MSE) and the number of observations.

The second part contains the ANOVA, an F-test assessing the validity of the entire model, with information about the SSE, SST and SSR.

The last part considers the individual coefficients and their significance with a t-test.

Explain the meaning as well as the difference between prediction interval and confidence interval? What is the link between both intervals? Explain their non-linear characteristics.

The PREDICTION INTERVAL gives an interval in which the next single observation will be located. This is usually the larger interval, because it also has to cover the randomness of an individual realization, including extreme events. For a 95% prediction interval, only one in 20 observations should fall outside the borders.


The CONFIDENCE INTERVAL is usually smaller and it targets the expected value. The CI gives the borders of where the actual regression line of the population will probably be located. Due to aggregation (law of large numbers), better predictions can be made about the mean than about a single realization. Both intervals reflect the uncertainty of the sample.

Both of those intervals show an outward curvature. This is due to the fact that the predictions are most precise around the mean of x; the farther out you go, the more imprecise the estimate becomes. Moreover, both intervals become narrower with larger sample size n, because of the law of large numbers: the more observations have already been realized, the better future predictions become.
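A sketch with statsmodels (data invented; the summary-frame column names assume a recent statsmodels version):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=50)   # synthetic linear data

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

x_new = sm.add_constant(np.array([2.0, 5.0, 9.5]))
pred = fit.get_prediction(x_new).summary_frame(alpha=0.05)

# mean_ci_*: 95% confidence interval for the expected value (regression line)
# obs_ci_*:  95% prediction interval for a single new observation (wider)
print(pred[["mean", "mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])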

Explain the link between correlation and regression coefficient. How do you test correlation coefficients for significance? What are the preconditions for this test?

In the case of a simple regression, the correlation between the dependent and the independent variable is equal to the square root of R-squared. However, the correlation alone does not make a valid statement about the regression model; R-squared is the more important measure for this.


The correlation test checks whether there is a correlation between the two variables. However, the test usually gives very similar results to the t-test of a regular regression coefficient (beta), because it also measures the linear relationship between the variables. The hypothesis is denoted as H0: ρ = 0. If there is no linear relationship between the variables, the (linear) correlation coefficient ρ is also zero.

Explain heteroscedasticity / homoscedasticity?

Homoscedasticity is one of the Gauss-Markov conditions for the residuals, which must hold. It states that the variance of the error terms is constant across all values of x. If the variance changes, the residuals are heteroscedastic. The easiest way to check for heteroscedasticity is looking at a plot of the error terms. A homoscedastic model will display a uniform cloud of dots, whereas heteroscedasticity will result in patterns such as a funnel shape, indicating greater error as the dependent variable increases.
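A sketch of such a residual plot check (synthetic data; matplotlib assumed to be available):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 200)
y = 3.0 + 1.5 * x + rng.normal(scale=0.5 * x)   # error variance grows with x -> heteroscedastic

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

plt.scatter(b0 + b1 * x, residuals, s=10)        # residuals vs. fitted values
plt.axhline(0, color="black")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()                                       # a funnel shape indicates heteroscedasticity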

Explain the role of "Sum of squares" (SST, SSTR, SSE) and their averages (MSTR, MSE).

SST, SSTR and SSE are sums of squares, whereas MSTR and MSE are mean squares and hence the averages of the sums of squares. Basically, they are the building blocks for calculating the F-value and therefore for testing whether the model has explanatory power. In detail these are:

  • SST: Sum of Squares Total, the total variation of y
  • SSTR (= SSR): Sum of Squares for Regression, the variation which is explained by the regression model.
  • SSE: Sum of Squares for Error, the unexplained variation
  • MSTR (= MSR): Mean Square for Regression, MSTR = SSTR / k
  • MSE: Mean Square for Error, MSE = SSE / (n − k − 1)

This implies: SST = SSTR + SSE. The basis for the mean squares are the degrees of freedom. For the regression, the degrees of freedom correspond to k, the number of independent variables; for the residuals, the degrees of freedom equal (n − k − 1). Finally, the F-value in the ANOVA is calculated by dividing MSTR by MSE.
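A compact sketch of these quantities for a simple regression (data invented; k = 1 independent variable):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
n, k = len(y), 1

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)       # total variation of y
SSE = np.sum((y - y_hat) ** 2)          # unexplained variation
SSTR = SST - SSE                        # explained variation (= SSR)

MSTR = SSTR / k
MSE = SSE / (n - k - 1)
F = MSTR / MSE
R2 = SSTR / SST

print(SST, SSTR, SSE, F, R2)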

Explain the test statistic as well as the test distribution for the Fisher test within the linear regression analysis.

The F-test does not just look at a single coefficient, but at the explanatory power of the entire model. Its test statistic does not follow a t-distribution but an F-distribution (Fisher distribution) with k and n − k − 1 degrees of freedom.

The F-test is also more «robust» than a t-test. In contrast to the t-test, it has a lower alpha error (the risk of declaring an invalid model valid is lower) and it is unaffected by multicollinearity, because it looks at the total SSE and not at the contribution of each coefficient.

--> While the t-test only looks at one coefficient, the F-test looks at all of them simultaneously.

 

Explain the CAPM from the regression viewpoint

The Capital Asset Pricing Model states that the (excess) return on a stock is related to the (excess) return on the market through the following linear relationship: rt − rf = α + β · (rm,t − rf) + et

Here, et is the residual error. If the CAPM does explain the relation between the returns of the stock and those of the market, the intercept α should be zero. The coefficient β is a measure of the systematic or non-diversifiable risk. The higher it is, the more sensitive the stock is to the market. High-beta stocks have more systematic risk than low-beta stocks. Economically, the CAPM implies that since high-beta stocks have more risk, they should also earn higher returns.
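A sketch of estimating α and β by OLS on simulated excess returns (all numbers invented):

import numpy as np

rng = np.random.default_rng(3)
rm = rng.normal(0.005, 0.04, size=250)               # market excess returns
stock = 1.2 * rm + rng.normal(0.0, 0.02, size=250)   # true beta = 1.2, true alpha = 0

beta, alpha = np.polyfit(rm, stock, 1)
residuals = stock - (alpha + beta * rm)
r2 = 1 - residuals.var() / stock.var()

print(alpha, beta)   # alpha should be close to zero if the CAPM holds
print(r2)            # share of systematic (market-related) variance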

What is the link between the sum of squared errors, the standard deviation of the error terms, the coefficient of determination and the test statistic of the Fisher test?

The Sum of Squared Errors (SSE) corresponds directly to the standard deviation of the error terms and affects the coefficient of determination R² as well as the F-value. When the SSE is large, this also implies a large sε and therefore a small R² and a small F-value. In such an instance the explanatory power of the regression model is poor. The smaller the SSE, the better the explanatory power of the model. With an SSE and sε of 0, the regression model describes the relation between the dependent and the independent variables perfectly.

Explain the significance tests for the coefficients?

We test the regression coefficients to check whether the intercept coefficient and the slope coefficients are significantly different from zero. A regression coefficient is estimated and is therefore affected by some uncertainty. This uncertainty is measured by the standard error of each coefficient. With this information, three different approaches exist to test whether a coefficient is significantly different from zero:

  • By dividing the estimated coefficient by its standard error we get the t-ratio, also known as the t-statistic. It tells us how many standard error units the coefficient is away from zero. As a rule of thumb, a t-statistic with an absolute value larger than 1.96 means that the corresponding coefficient is statistically different from zero: it is more than roughly two standard deviations away from its mean.
  • We can look at the p-value. If it is small enough, we accept the idea that the corresponding coefficient is statistically different from zero.
  • We can build a 95% confidence interval by adding plus/minus 1.96 times the standard error to the estimated coefficient. If zero is not included in this interval, we can exclude the possibility that the corresponding coefficient may be equal to zero.

ALL these approaches are founded on the hypothesis of normally distributed errors with constant variance.

Explain “multicollinearity” and its assessment. How do we deal with “multicollinearity”?

Multicollinearity occurs if independent variables x are correlated among each other. Due to the ceteris paribus condition, the t-test only measures the impact of changing one variable while holding the others constant. However, because isolated changes are unlikely under high correlation, the t-test of correlated coefficients gives a very low result and might even reject all correlated factors as insignificant. In contrast to the t-test, the F-test cannot be tricked by multicollinearity.

--> If the model has a very high F-value, but the factors are mostly insignificant, this is a good indicator of multicollinearity.


To compensate for multicollinearity, you have to try dropping individual independent variables or start with one and continue adding. It is an art form to find a model which has a large R-squared but no multicollinearity.
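A quick sketch for spotting collinear regressors by inspecting their correlation matrix (illustrative data):

import numpy as np

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=200)   # almost a copy of x1 -> multicollinear
x3 = rng.normal(size=200)                          # unrelated regressor

X = np.column_stack([x1, x2, x3])
print(np.corrcoef(X, rowvar=False))                # off-diagonal values near 1 signal trouble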