QM
HS 17/18
Set of flashcards Details
| Flashcards | 82 |
|---|---|
| Language | English |
| Category | Finance |
| Level | University |
| Created / Updated | 04.01.2018 / 04.01.2018 |
| Weblink | https://card2brain.ch/box/20180104_qm |
Which parameter is used to quantify the explanatory power? What interpretation does this parameter allow?
The parameter in use is R-squared, which is referred to as the coefficient of determination. With SST = total variation, SSR = explained variation and SSE = unexplained variation, it is defined as R² = SSR / SST = 1 − SSE / SST.
The R-squared should, however, not be taken too seriously in absolute terms, because it is often overrated. A value of 0.2 is not necessarily useless, and a value of 0.95 can be quite low in certain situations. Moreover, the R-squared can never decrease because of an additional variable: if the variable turns out to be absolutely useless, its coefficient will simply be close to zero, but the existing factors will not lose explanatory power because of it. A measure that can actually decrease with additional independent variables is the ADJUSTED R2, which is therefore the more accurate measure.
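A minimal sketch of this behaviour (using simulated data and Python's statsmodels; the library choice and the numbers are assumptions, not part of the course material) shows that R-squared never falls when a useless regressor is added, while adjusted R-squared can:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(size=n)   # true model uses only x1
x_useless = rng.normal(size=n)            # unrelated extra variable

# Model 1: only the relevant regressor
res1 = sm.OLS(y, sm.add_constant(x1)).fit()
# Model 2: add the useless regressor
X2 = sm.add_constant(np.column_stack([x1, x_useless]))
res2 = sm.OLS(y, X2).fit()

print(res1.rsquared, res1.rsquared_adj)
print(res2.rsquared, res2.rsquared_adj)   # R^2 never lower, adjusted R^2 may drop
```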
What is the link between the standard deviation of the error terms and the explanatory power?
The error terms are directly linked to the SSE, since the latter is the sum of all squared errors. The larger the error terms, the larger the SSE becomes and the smaller the R-squared. Error terms are essentially the variation that is not explained by the regression model.
Describe a return model using linear regression. Explain the difference between systematic and unsystematic risk. Which of them can be diversified? Which of them is compensated on the market?
Regression model of returns (market model): μ = β0 + β1 · μm + ε.
The market model implies that a stock return μ is linearly dependent on the equity market (represented by the return on the market portfolio μm). Systematic risk originates from the market: it is the part of the asset's price volatility that is driven by, and compensated by, the market. The coefficient β1 is therefore called the stock's beta coefficient; it measures how sensitive the stock's rate of return is to changes in the level of the overall market.
Unsystematic risks are the so-called firm-specific risks; they result only from the activities and events of one corporation and can therefore be diversified away. Thus, they can be regarded as non-compensated risk. If an investor expects the markets to rise, it makes sense to hold a portfolio with a β > 1; if he expects the markets to fall, it makes sense to hold a portfolio with a β < 1. The systematic risk is measured by the coefficient of determination R2 and the unsystematic risk, as a consequence, by 1 − R2.
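A small sketch of this market model (simulated returns and Python's statsmodels; both are assumptions for illustration only):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 250
r_market = rng.normal(0.0003, 0.01, size=n)                        # simulated market returns
r_stock = 0.0001 + 1.3 * r_market + rng.normal(0, 0.015, size=n)   # stock driven by the market plus noise

res = sm.OLS(r_stock, sm.add_constant(r_market)).fit()
beta = res.params[1]          # sensitivity to the market (systematic exposure)
r2 = res.rsquared             # share of variance explained by the market
print(beta, r2, 1 - r2)       # 1 - R^2 = unsystematic (diversifiable) share
```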
What information do the values of the regression output contain?
The regression table is usually divided into three parts.
The first delivers absolute values of the regression model: R, R-squared, adjusted R-squared, the SEE (standard error of estimate, i.e. the square root of the MSE) and the number of observations.
The second part contains the ANOVA, an F-test assessing the validity of the entire model, with information about the SSE, SST and SSR.
The last part considers the single coefficients and tests their significance with a t-test.
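As an illustration, the three parts map roughly onto the attributes of a fitted-model object in Python's statsmodels (the attribute names are specific to that library, which is an assumption here, not part of the course):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(50, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=50)
res = sm.OLS(y, X).fit()

# Part 1: absolute values of the model
print(res.rsquared, res.rsquared_adj, np.sqrt(res.mse_resid), res.nobs)
# Part 2: ANOVA / F-test on the whole model (note: statsmodels calls SSE "ssr")
print(res.ess, res.ssr, res.centered_tss, res.fvalue, res.f_pvalue)
# Part 3: single coefficients and their t-tests
print(res.params, res.bse, res.tvalues, res.pvalues)
```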
Explain the meaning of, as well as the difference between, the prediction interval and the confidence interval. What is the link between the two intervals? Explain their non-linear characteristics.
The PREDICTION INTERVAL gives a range in which the next single observation is expected to lie. It is usually the wider interval, because it covers not only the uncertainty about the regression line but also the random scatter of an individual observation. For a 95% prediction interval, only about one in 20 observations should fall outside its borders.
The CONFIDENCE INTERVAL is usually narrower and aims at the expected value: it gives the borders within which the true regression line of the population is likely to be located. Due to aggregation (law of large numbers), better predictions can be made about the mean than about a single realization. Both intervals are a consequence of the uncertainty of the sample.
Both intervals curve outwards. This is because predictions are most precise around the mean of x; the farther out you go, the more imprecise the estimate becomes. Moreover, both intervals narrow with larger sample sizes n, because of the law of large numbers: the more observations have already been realized, the better future predictions become.
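A sketch of both intervals (simulated data and statsmodels' `get_prediction`; these choices are assumptions for illustration). The prediction interval (`obs_ci_*`) is visibly wider than the confidence interval (`mean_ci_*`), and both widen away from the mean of x:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=60)
y = 1.0 + 0.8 * x + rng.normal(0, 1.5, size=60)
res = sm.OLS(y, sm.add_constant(x)).fit()

# Evaluate both intervals near the mean of x and far away from it
x_new = np.column_stack([np.ones(2), [x.mean(), x.max() + 5]])
frame = res.get_prediction(x_new).summary_frame(alpha=0.05)
# mean_ci_*  -> 95% confidence interval for the expected value
# obs_ci_*   -> 95% prediction interval for a single new observation (wider)
print(frame[['mean', 'mean_ci_lower', 'mean_ci_upper',
             'obs_ci_lower', 'obs_ci_upper']])
```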
Explain the link between the correlation and the regression coefficient. How can the correlation coefficient be tested for significance? What are the preconditions for this test?
In the case of a simple regression, the correlation between the dependent and the independent variable is equal to the square root of R-squared. However, the correlation alone does not say much about the validity of the regression model; R-squared is the more informative measure for this.
The correlation test checks whether there is a correlation between the two variables. It usually gives results very similar to the t-test of a regression coefficient (beta), because it also measures the linear relationship between the variables. The hypothesis is denoted as H0: ρ = 0. If there is no linear relationship between the variables, the (linear) correlation coefficient ρ is zero.
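A quick numerical check of this link (scipy on simulated data, an assumption for illustration): in a simple regression the correlation coefficient, its p-value and R-squared line up with the slope's t-test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=80)
y = 0.6 * x + rng.normal(size=80)

# Correlation test: H0: rho = 0
r, p_corr = stats.pearsonr(x, y)

# t-test of the slope in a simple linear regression of y on x
slope, intercept, r_value, p_slope, stderr = stats.linregress(x, y)

print(r, r_value)          # identical in the single-regressor case
print(p_corr, p_slope)     # the two p-values coincide as well
print(r ** 2)              # equals the R-squared of the simple regression
```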
Explain heteroscedasticity / homoscedasticity?
Homoscedasticity is one of the Gauss-Markov conditions on the residuals and must be satisfied. It states that the variance of the error terms is constant across all values of x. If the variance changes, the residuals are heteroscedastic. The easiest way to check for heteroscedasticity is to look at a plot of the error terms: a homoscedastic model displays a uniform cloud of dots, whereas heteroscedasticity results in patterns such as a funnel shape, indicating greater error as the dependent variable increases.
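A sketch of such a residual check (Python with matplotlib and statsmodels on simulated data; all of this is an assumed setup, not prescribed by the course):

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = np.linspace(1, 10, 200)
# Error variance grows with x -> heteroscedastic ("funnel" shape)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x)

res = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(res.fittedvalues, res.resid, s=10)
plt.axhline(0, color='black')
plt.xlabel('fitted values')
plt.ylabel('residuals')
plt.show()   # the widening spread of the dots indicates heteroscedasticity
```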
Explain the role of "Sum of squares" (SST, SSTR, SSE) and their averages (MSTR, MSE).
SST, SSTR and SSE are sums of squares, whereas MSTR and MSE are mean squares, i.e. the sums of squares divided by their degrees of freedom. They are the building blocks for calculating the F-value and therefore for testing whether the model as a whole has explanatory power. In detail:
- SST: Sum of Squares Total, the total variation of y
- SSTR (= SSR): Sum of Squares for Regression, the variation which is explained by the regression model
- SSE: Sum of Squares for Error, the unexplained variation
- MSTR (= MSR): Mean Square for Regression, MSTR = SSTR / k
- MSE: Mean Square for error, MSE = SSE / (n-k-1)
This implies SST = SSTR + SSE. The basis for the mean squares are the degrees of freedom: for the regression the degrees of freedom equal k, the number of independent variables; for the residuals they equal (n-k-1). Finally, the F-value in the ANOVA is calculated by dividing MSTR by MSE.
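A worked sketch of these quantities by hand (simulated data, plain numpy/scipy; an assumed setup for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, k = 60, 2                              # n observations, k independent variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.8, -0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

sst = np.sum((y - y.mean()) ** 2)         # total variation
sstr = np.sum((y_hat - y.mean()) ** 2)    # explained variation (SSR)
sse = np.sum((y - y_hat) ** 2)            # unexplained variation
mstr, mse = sstr / k, sse / (n - k - 1)   # mean squares
f_value = mstr / mse
p_value = stats.f.sf(f_value, k, n - k - 1)
print(np.isclose(sst, sstr + sse), f_value, p_value)
```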
Explain the test statistic as well as the test distribution of the F-test (Fisher test) within linear regression analysis.
The F-test does not just look at a single coefficient, but at the explanatory power of the entire model. Its test statistic, F = MSTR / MSE, does not follow a t-distribution but an F-distribution (Fisher distribution) with k and (n-k-1) degrees of freedom.
The F-test is also more «robust» than a t-test: the risk of declaring an invalid model valid (the alpha error) is lower, and it is unaffected by multicollinearity, because it looks at the total explained and unexplained variation and not at the contribution of each individual coefficient.
--> While the t-test only looks at one coefficient, the F-test looks at all of them simultaneously.
Explain the CAPM from the regression viewpoint
The Capital Asset Pricing Model states that the return on a stock is related to the return on the market through the following linear relationship (with returns measured in excess of the risk-free rate rf): Rt - rf = α + β · (Rm,t - rf) + et
Here, et is the residual error. If the CAPM does explain the relation between the return of the stock and that of the market, the intercept α should be zero. The coefficient β is a measure of the systematic or non-diversifiable risk: the higher it is, the more sensitive the stock is to the market. High-beta stocks have more systematic risk than low-beta stocks. What the CAPM implies economically is that, since high-beta stocks carry more risk, they should also earn higher returns.
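A sketch of the corresponding regression and the check that α = 0 (simulated excess returns, an assumed risk-free rate and statsmodels; none of these numbers come from the course):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 250
rf = 0.0001                                          # assumed daily risk-free rate
r_market = rng.normal(0.0004, 0.01, size=n)
r_stock = rf + 1.2 * (r_market - rf) + rng.normal(0, 0.012, size=n)

X = sm.add_constant(r_market - rf)                   # excess market return
res = sm.OLS(r_stock - rf, X).fit()                  # excess stock return

alpha, beta = res.params
print(alpha, beta)
print(res.pvalues[0])   # t-test of H0: alpha = 0 -- should not be rejected here
```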
What is the link between the sum of squared errors, the standard deviation of the error terms, the coefficient of determination and the test statistic of the F-test (Fisher test)?
The Sum of Squared Errors (SSE) is directly linked to the standard deviation of the error terms, sε = √(SSE / (n-k-1)), and affects the coefficient of determination R2 as well as the F-value. A large SSE implies a large sε, but a small R2 and a small F-value; in such a case the explanatory power of the regression model is poor. The smaller the SSE, the better the explanatory power of the model. With an SSE and sε of 0, the regression model describes the relation between the dependent and the independent variables perfectly.
Explain the significance tests for the coefficients?
We test the regression coefficients to check whether the intercept is significantly different from zero and whether the slope coefficients are significantly different from zero. Each regression coefficient is estimated and is therefore affected by some uncertainty, which is measured by the standard error of the coefficient. With this information, three different approaches exist to test whether a coefficient is significantly different from zero:
- By dividing the estimated coefficient by its standard error we get the t-ratio, also known as the t-statistic. It tells us how many standard-error units the coefficient is away from zero. As a rule of thumb, a t-statistic with an absolute value larger than 1.96 means that the corresponding coefficient is statistically different from zero: it is more than two standard errors away from its hypothesized mean of zero.
- We can look at the p-value. If it is small enough, we conclude that the corresponding coefficient is statistically different from zero.
- We can build a 95% confidence interval by adding plus/minus 1.96 times the standard error to the estimated coefficient. If zero is not included in this interval, we can rule out that the corresponding coefficient is equal to zero.
ALL these approaches are founded on the hypothesis of normally distributed errors with constant variance.
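A sketch of the three approaches applied to one coefficient (the coefficient estimate, its standard error and the sample size below are hypothetical numbers, not taken from the course):

```python
import numpy as np
from scipy import stats

# Hypothetical regression output: estimated slope and its standard error
b_hat, se_b = 0.42, 0.15
n, k = 100, 1                      # hypothetical sample size and number of regressors
df = n - k - 1

t_stat = b_hat / se_b                              # approach 1: t-ratio
p_value = 2 * stats.t.sf(abs(t_stat), df)          # approach 2: p-value
ci = (b_hat - 1.96 * se_b, b_hat + 1.96 * se_b)    # approach 3: 95% CI (rule of thumb)

print(t_stat, p_value, ci)   # |t| > 1.96 and 0 outside the CI -> significant at 5%
```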
Explain “multicollinearity” and its assessment. How do we deal with “multicollinearity”?
Multicollinearity occurs if the independent variables x are correlated among each other. Due to the ceteris-paribus condition, the t-test only measures the impact of a change in one variable while the others are held constant. Because such an isolated change is unlikely when the regressors are highly correlated, the t-tests of correlated coefficients yield very low values and may even reject all correlated factors as insignificant. In contrast to the t-test, the F-test cannot be tricked by multicollinearity.
--> If the model has a very high F-value, but the factors are mostly insignificant, this is a good indicator of multicollinearity.
To deal with multicollinearity, you can drop individual independent variables or build the model stepwise, starting with one variable and adding more. It is an art to find a model that has a large R-squared but no multicollinearity.
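A small simulation of the symptom described above (two almost identical regressors, Python/statsmodels; an assumed setup for illustration): the F-test remains highly significant while the individual t-tests break down.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)      # almost identical to x1 -> multicollinearity
y = 1.0 + 0.5 * (x1 + x2) + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(res.fvalue, res.f_pvalue)   # whole model highly significant
print(res.pvalues[1:])            # individual slopes often insignificant
print(np.corrcoef(x1, x2)[0, 1])  # correlation between the regressors is close to 1
```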
Explain the conceptual background of the Durbin-Watson test
The main objective of the Durbin-Watson test is to detect first-order autocorrelation among the residuals, i.e. it compares each residual with the previous one and looks for patterns. The resulting statistic d can take values between 0 and 4.
A value of 2 implies no autocorrelation whatsoever. A value significantly larger than 2 indicates negative autocorrelation, and a value close to zero implies strong positive autocorrelation.
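A sketch using statsmodels' Durbin-Watson statistic on simulated residuals with built-in positive autocorrelation (the data-generating numbers are assumptions):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(9)
n = 200
x = rng.normal(size=n)
# Build noise with positive first-order autocorrelation (AR(1) process)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))   # well below 2 -> positive autocorrelation
```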
About the Kuhn-Tucker Approach
There are some major differences between the Lagrange and the Kuhn-Tucker approach, even though both represent a saddle function. The largest of them is the use of inequalities in the Kuhn-Tucker restrictions: it is important that all restrictions are written with a ≤ in order to define the set of feasible decisions.
Another difference is that the multipliers are non-negative. In the world of Lagrange they can also be negative, but for Kuhn-Tucker negative multipliers would not make sense.
This leads to the Complementary Condition: either the multiplier or the restriction's slack has to be equal to zero. If the multiplier is zero, the restriction is not binding and the optimal solution does not lie on its border; the derivative at the border of the restriction is then not zero. If the multiplier is positive, the restriction has actual value for the decision maker; hence, the optimum is located on the border of that restriction.
--> Kuhn-Tucker only works for maximization problems. For a minimization problem you need to switch signs, i.e. multiply the objective by (-1).
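As a compact reference, a sketch of the Kuhn-Tucker conditions for a maximization problem with ≤ restrictions, written in generic notation (the symbols f, g_i, λ_i are placeholders, not taken from the lecture material):

```latex
\begin{align*}
&\max_{x}\; f(x) \quad \text{s.t.}\quad g_i(x) \le b_i,\quad i = 1,\dots,m \\[4pt]
&\text{Kuhn-Tucker conditions at an optimum } x^*: \\
&\frac{\partial f(x^*)}{\partial x_j} - \sum_{i=1}^{m} \lambda_i \,\frac{\partial g_i(x^*)}{\partial x_j} = 0
   && \text{(stationarity)} \\
&g_i(x^*) \le b_i && \text{(feasible decision)} \\
&\lambda_i \ge 0 && \text{(non-negative multipliers)} \\
&\lambda_i \bigl(b_i - g_i(x^*)\bigr) = 0 && \text{(complementary condition)}
\end{align*}
```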
Probability Function
The probability function for a discrete random variable X is a function p(x) that assigns a probability to each value of the random variable. It must be greater than or equal to zero, and the sum of all individual outcome probabilities is equal to 1.
Probability Density Function
The probability density function corresponds to the probability function, but in the continuous case. It is defined as the derivative of the cumulative distribution function (CDF). In the continuous case, the probability of a single realization is always zero, so we need to express probabilities by taking the integral over an interval under the curve.
Cumulative Distribution Function
The cumulative distribution function (CDF) F(xi) = P(X ≤ xi) indicates the probability that X takes at most the value xi. It is called cumulative since it is the accumulation of the probabilities. Thus, the y-value for the highest x-value must always be 1.
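A short numerical illustration of the three preceding cards (probability function, density and CDF), using scipy as an assumed tool:

```python
import numpy as np
from scipy import stats

# Discrete case: probability function of a fair die
p = np.full(6, 1 / 6)
print((p >= 0).all(), np.isclose(p.sum(), 1.0))     # non-negative, sums to 1

# Continuous case: standard normal density and CDF
print(stats.norm.pdf(1.0))                          # a density value, not a probability
print(stats.norm.cdf(1.0) - stats.norm.cdf(-1.0))   # P(-1 <= X <= 1), integral under the curve
print(stats.norm.cdf(np.inf))                       # CDF reaches 1 at the upper end
```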
Joint Distribution Function
A joint distribution function F(xi, yk) = P(X ≤ xi, Y ≤ yk) indicates the probability that X takes at most a value of xi and Y at most a value of yk. Unlike the previous functions, which are based on one random variable, this function (and the following two) is used when two random variables exist.
Conditional Distribution Function
A conditional distribution function f(xi | Y = yk) describes the distribution of a variable X given the outcome of another variable Y. It is equal to the joint probability of the two variables divided by the marginal probability of the given variable.
Marginal Distribution Function
The marginal distribution function fx(xi) = P(X = xi) indicates the probability of X = xi regardless of the value of Y.
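A discrete example of joint, marginal and conditional distributions (the probability table is hypothetical, used only to illustrate the definitions above):

```python
import numpy as np

# Hypothetical joint probability table of two discrete random variables X and Y
# rows = values of X, columns = values of Y
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

marginal_x = joint.sum(axis=1)        # P(X = x_i), regardless of Y
marginal_y = joint.sum(axis=0)        # P(Y = y_k), regardless of X

# Conditional distribution of X given Y = y_2 (second column):
# joint probability divided by the marginal probability of the given variable
cond_x_given_y2 = joint[:, 1] / marginal_y[1]

print(marginal_x, marginal_y, cond_x_given_y2, cond_x_given_y2.sum())
```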
Explain the Central Limit Theorem and its Importance in Inductive Statistics
If we take n independent random variables with mean µ and variance σ2, then, if n is large, the sum of these random variables will be approximately normally distributed with mean nµ and variance nσ2. Thus, even though we might not know the shape of the distribution of the entire population, the central limit theorem says that we can treat the sampling distribution as if it were normal. Of course, for the conclusions of the theorem to hold, we need a sample size that is large enough; as a rule of thumb, the sampling distribution is approximately normal if n ≥ 30. This matters because many practices in statistics, such as hypothesis testing or confidence intervals, make assumptions about the population that the data was obtained from, and one assumption commonly made in a statistics course is that the populations we work with are normally distributed.
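A simulation sketch of the theorem (an exponential population is an assumed choice, picked only because it is clearly non-normal):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
# Strongly non-normal population: exponential with mean 1 and variance 1
samples = rng.exponential(scale=1.0, size=(10_000, 40))   # 10,000 samples of n = 40

sums = samples.sum(axis=1)            # sum of n i.i.d. variables
n, mu, sigma2 = 40, 1.0, 1.0
print(sums.mean(), n * mu)            # mean of the sums is close to n * mu
print(sums.var(), n * sigma2)         # variance is close to n * sigma^2
print(stats.skew(sums))               # close to 0: the sums look roughly normal
```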
Explain the role of test statistics
The test statistic is used in hypothesis testing. In particular, it is calculated from the sample in order to decide whether the null hypothesis made on the population should be rejected.
Normal Distribution
Normal distribution: it plays a central role in econometrics and in statistics. The central limit theorem allows several other distributions to be approximated by the normal distribution. Especially the standard normal distribution, the special case with mean zero and standard deviation one, is widely used in statistics for hypothesis testing etc. The normal distribution is symmetric.
Student Distribution
Student distribution: it is, like the normal distribution, symmetric, but it has fatter tails, which means that extreme outcomes are assigned higher probability than under the normal distribution. It is used in statistics when the variance of the population is unknown. For n ≥ 30 it can be approximated by the normal distribution.
Chi-Square Distribution
Chi-Square distribution: it is the sum of squares of independently standard normally distributed random variables. The degrees of freedom of the chi-square distribution correspond to the number of standard normally distributed variables that are summed up.
F-Distribution
F distribution: it takes its name from Fisher. It is the ratio of two independent random variables which follow chi-square distributions with v1 and v2 degrees of freedom, each divided by its degrees of freedom. The F-distribution is characterized by v1 and v2 degrees of freedom and is used in the ANOVA (analysis of variance) and to test whether the populations of two samples have the same variance.
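A simulation sketch of the two previous definitions (the degrees of freedom below are arbitrary assumptions): squared standard normals sum to a chi-square variable, and the ratio of two chi-squares (each divided by its df) behaves like an F variable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
v1, v2, n = 5, 10, 100_000

# Chi-square: sum of squared standard normal variables
chi2_v1 = (rng.standard_normal((n, v1)) ** 2).sum(axis=1)
chi2_v2 = (rng.standard_normal((n, v2)) ** 2).sum(axis=1)

# F: ratio of two independent chi-square variables, each divided by its df
f_sample = (chi2_v1 / v1) / (chi2_v2 / v2)

print(chi2_v1.mean(), v1)                      # mean of chi-square(v1) equals v1
print(f_sample.mean(), stats.f.mean(v1, v2))   # matches the theoretical F mean
```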
Explain log-normal distribution
A random variable is said to be lognormally distributed if the natural logarithm of the variable is normally distributed, that is, if Y = ln(X) is normally distributed (Y ~ N(µy, σy)). The parameters of the lognormal distribution are determined by the parameters of the normally distributed variable. In contrast to the (symmetric) normal distribution, the graph is positively skewed (skewed to the right) and has only positive x-values; taking the natural logarithm ln(X) leads back to the normal distribution. An example is given by stock prices, which are usually lognormally distributed: when we calculate the continuous returns by taking the logarithm of the price ratios, we end up with normally distributed returns.
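A short sketch of this relation (the parameter values are assumptions chosen only for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
mu_y, sigma_y = 0.05, 0.2                                   # parameters of the underlying normal
x = rng.lognormal(mean=mu_y, sigma=sigma_y, size=100_000)   # lognormal "prices"

y = np.log(x)                          # taking logs -> normally distributed
print(y.mean(), mu_y)                  # close to 0.05
print(y.std(), sigma_y)                # close to 0.2
print(x.min() > 0, stats.skew(x) > 0)  # only positive values, skewed to the right
```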
What information does the mean contain?
The mean, as a measure of central tendency, is the expected value and thus the average value of all outcomes of the random variable. The expected value of a random variable is the value one would expect to find if the process could be repeated an infinite number of times and the average of the obtained values were taken. The expected value is a linear operator, and in the discrete case it is obtained by summing up the products of each probability with the corresponding value of the random variable.
What information does the variance contain?
The variance measures the dispersion of the distribution. It equals the expected quadratic deviation of the random variable from the mean and is called the second central moment. Its unit of measurement is the square of the unit of the random variable. The variance can be calculated as the expectation of the square minus the square of the expectation. The variance is a nonlinear operator.
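A worked example of both cards (the values and probabilities of the discrete random variable are hypothetical):

```python
import numpy as np

# Hypothetical discrete random variable: values and their probabilities
values = np.array([-1.0, 0.0, 2.0, 5.0])
probs = np.array([0.2, 0.3, 0.4, 0.1])
assert np.isclose(probs.sum(), 1.0)

mean = np.sum(probs * values)                        # E[X]: sum of p(x) * x
var = np.sum(probs * (values - mean) ** 2)           # expected squared deviation from the mean
var_alt = np.sum(probs * values ** 2) - mean ** 2    # E[X^2] - (E[X])^2, same result
print(mean, var, var_alt)
```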