Analysis of Sequential Data
MSE Module TSM_AnSeqDa
Deck details
Cards | 96 |
---|---|
Language | German |
Category | Mathematics |
Level | University |
Created / Updated | 17.01.2021 / 08.09.2023 |
Web link | https://card2brain.ch/box/20210117_tsmanseqda |
What can we forecast?
- Sales of pills/medicine
- Electricity demand and availability
- Weather
- Sales of products/services
- Customer churn
Which factors affect forecastability?
- Something is easier to forecast if:
  - we have a good understanding of the factors that contribute to it
  - there is lots of data available
  - the forecasts cannot affect the thing we are trying to forecast
  - there is relatively low natural/unexplainable random variation
  - the future is somewhat similar to the past
What are time series data?
- Daily stock prices
- Monthly rainfall
- Annual business profits
- Production, e.g. quarterly Australian beer production
What is forecasting about?
Forecasting is estimating how the sequence of observations will continue into the future
What do we need to add to the forecast?
An uncertainty range
Why is it important to provide the uncertainty of a forecast?
A point forecast alone (e.g. the 50% value) says nothing about how far the actual value may deviate from it. With an uncertainty range you could, for example, produce the 80% quantile of the forecast instead. The worst case is that customers order or demand more than was produced: not being able to deliver the requested product causes a bigger loss (e.g. in reputation) than having produced a bit too much, since the surplus can be stored and sold off at a discount later.
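The 80% example can be made concrete with a minimal sketch. It assumes, hypothetically, that next period's demand is Gaussian with mean 1000 and standard deviation 100. Python's `statistics.NormalDist` stands in for the forecast distribution.

```python
from statistics import NormalDist

# Hypothetical forecast distribution of next period's demand (illustrative numbers only)
demand = NormalDist(mu=1000, sigma=100)

point_forecast = demand.median        # 50% value: equals the mean for a Gaussian
produce_80 = demand.inv_cdf(0.80)     # production level that covers demand with 80% probability
shortage_risk = 1 - demand.cdf(produce_80)

print(f"point forecast: {point_forecast:.0f}")
print(f"produce for 80% coverage: {produce_80:.1f}")
print(f"remaining risk of a shortage: {shortage_risk:.2f}")
```

Choosing a higher quantile lowers the shortage risk at the cost of more leftover stock.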
When is it OK to use the Gaussian distribution in forecasting?
As long as the quantity to be predicted is far away from 0 (e.g. hundreds of thousands, or better, millions). Close to 0, the symmetric Gaussian interval can extend into impossible negative values.
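A short sketch compares a 95% Gaussian interval for a small count with one for a large quantity. Both forecast means and standard deviations are hypothetical.

```python
from statistics import NormalDist

def gaussian_interval(mean, sd, level=0.95):
    """Central prediction interval of a Gaussian forecast distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level=0.95
    return mean - z * sd, mean + z * sd

# Hypothetical forecasts: a small count vs. a large quantity
small = gaussian_interval(mean=5, sd=4)
large = gaussian_interval(mean=2_000_000, sd=150_000)

print(small)   # lower bound is negative: impossible for e.g. units sold
print(large)   # far from 0: the symmetric interval stays plausible
```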
How is a time series stored in R?
In a ts object:
- A list of numbers
- Information about times those numbers were recorded
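Python's standard library has no direct counterpart of R's ts object. The hypothetical class below only mimics the idea of storing the numbers together with their timing information. In R that timing is given as start and frequency, e.g. `ts(x, start = c(2020, 3), frequency = 4)`. The `times()` and `cycle()` methods loosely follow R's `time()` and `cycle()`.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TimeSeries:
    """Hypothetical minimal analogue of an R ts object: values plus timing information."""
    values: List[float]
    start: Tuple[int, int]   # (year, period), e.g. (2020, 3) = third quarter of 2020
    frequency: int           # observations per year: 4 = quarterly, 12 = monthly

    def times(self) -> List[float]:
        """Decimal time of each observation."""
        year, period = self.start
        first = year + (period - 1) / self.frequency
        return [first + i / self.frequency for i in range(len(self.values))]

    def cycle(self) -> List[int]:
        """Season (1..frequency) of each observation."""
        _, period = self.start
        return [(period - 1 + i) % self.frequency + 1 for i in range(len(self.values))]

# Hypothetical quarterly values starting in Q3 2020
y = TimeSeries(values=[10.0, 12.5, 9.8, 11.2, 13.0], start=(2020, 3), frequency=4)
print(y.times())   # [2020.5, 2020.75, 2021.0, 2021.25, 2021.5]
print(y.cycle())   # [3, 4, 1, 2, 3]
```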
Which command loads the packages used for time series (ts objects) in R, and what does it include?
library(fpp2)
Loads:
- some data for use in examples and exercises
- forecast package (for forecasting functions)
- ggplot2 package (for graphics functions)
- fma package (for lots of time series data)
- expsmooth package (for more time series data)
How do you plot seasons in a ts?
With seasonal plots:
- Data plotted against the individual "seasons" in which the data were observed. (in this case a "season" is a month.)
- Something like a time plot except that the data from each season are overlapped
- Enables the underlying seasonal pattern to be seen more clearly, and also allows any substantial departures from the seasonal pattern to be easily identified.
- In R: ggseasonplot()
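In R, ggseasonplot() does the plotting. The underlying idea can be sketched without any plotting: regroup a monthly series so that each year becomes one line indexed by season (month). The helper name and data are hypothetical.

```python
from collections import defaultdict

def seasonal_lines(values, start_year, start_month, frequency=12):
    """Group observations by year; each year's dict maps season (1..frequency) to value.

    Each year would be drawn as one line against the season on the x-axis,
    so the lines of different years overlap as in a seasonal plot.
    """
    lines = defaultdict(dict)
    for i, v in enumerate(values):
        k = start_month - 1 + i          # months elapsed since January of start_year
        year = start_year + k // frequency
        season = k % frequency + 1
        lines[year][season] = v
    return dict(lines)

# Hypothetical monthly data: 18 months starting in July 2019
data = [5, 6, 7, 6, 5, 9, 4, 4, 5, 5, 6, 6, 5, 6, 7, 6, 5, 9]
for year, by_season in seasonal_lines(data, 2019, 7).items():
    print(year, by_season)
```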
What are the different time series patterns? Name and explain them!
- Trend
  - Pattern exists when there is a long-term increase or decrease in the data
- Seasonal
  - Pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week)
- Cyclic
  - Pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years)
What are the differences between seasonal and cyclic patterns?
- seasonal pattern constant length; cyclic pattern variable length
- Average length of cycle longer than length of seasonal pattern
- magnitude of cycle more variable than magnitude of seasonal pattern
The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data.
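What are covariance and correlation, and autocovariance and autocorrelation, about?
- Covariance and correlation measure the extent of a linear relationship between two variables; autocovariance and autocorrelation measure the linear relationship between lagged values of the same time series, e.g. $r_k$ is the correlation between $y_t$ and $y_{t-k}$
- Example (quarterly beer production): $r_4$ is higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 4 quarters apart
- $r_2$ is more negative than for the other lags because troughs tend to be 2 quarters behind peaks
- Together, the autocorrelations at lags 1, 2, ... make up the autocorrelation function (ACF)
- The plot is known as a correlogram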
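The following minimal sketch computes the sample autocorrelation with the usual definition $r_k = \frac{\sum_{t=k+1}^{T}(y_t-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{T}(y_t-\bar{y})^2}$. The quarterly series is hypothetical. It is built so that peaks repeat every 4 quarters and the trough lies 2 quarters behind the peak. In R, ggAcf() computes and plots the correlogram.

```python
def acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the series y."""
    T = len(y)
    mean = sum(y) / T
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
            for k in range(1, max_lag + 1)]

# Hypothetical quarterly data: peak in Q4, trough in Q2, repeated for 6 years
pattern = [10, 6, 9, 14]
y = pattern * 6
r = acf(y, 8)
for k, rk in enumerate(r, start=1):
    print(f"r_{k} = {rk:+.2f}")   # r_4 is the largest, r_2 the most negative
```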
What statements about trend and seasonality in ACF plots can be made?
- When data have a trend, the autocorrelations for small lags tend to be large and positive.
- When data are seasonal, the autocorrelations will be larger at the seasonal lags (i.e. at multiples of the seasonal frequency)
- When data are trended and seasonal, you see a combination of these effects
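To illustrate the trend statement, the sketch below computes the ACF of a hypothetical series with a linear upward trend. The small-lag autocorrelations come out large and positive and slowly decrease with the lag. The acf helper uses the usual sample definition and is defined here so the snippet runs on its own.

```python
def acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the series y."""
    T = len(y)
    mean = sum(y) / T
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
            for k in range(1, max_lag + 1)]

trend = [float(t) for t in range(1, 21)]   # hypothetical series: 1, 2, ..., 20
r = acf(trend, 5)
print([round(rk, 2) for rk in r])
```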
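What do the blue lines in the ACF plots indicate?
The 95% bounds $\pm 1.96/\sqrt{T}$, where $T$ is the length of the series: for white noise, about 95% of the autocorrelations are expected to lie within them.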
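The blue lines in R's ACF plots are drawn at $\pm 1.96/\sqrt{T}$, where $T$ is the length of the series. The minimal sketch below simulates a seeded Gaussian white-noise series and counts how many sample autocorrelations fall outside these bounds. The series length and seed are arbitrary choices.

```python
import math
import random

def acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the series y."""
    T = len(y)
    mean = sum(y) / T
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
            for k in range(1, max_lag + 1)]

random.seed(1)
T = 500
noise = [random.gauss(0, 1) for _ in range(T)]   # simulated white noise

bound = 1.96 / math.sqrt(T)                      # position of the blue lines
r = acf(noise, 20)
outside = [k for k, rk in enumerate(r, start=1) if abs(rk) > bound]
print(f"bound = +/-{bound:.3f}, lags outside: {outside}")
# For white noise roughly 5% of the lags (about 1 of 20) are expected outside
```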
How can you show that the daily changes of a stock price (e.g. Google) are white noise?
- The stock price can be modelled by the random walk model $y_{t+1} = y_t + \varepsilon_t$
- where $\varepsilon_t \sim N(0, \sigma^2)$ i.i.d.: hence $\varepsilon_t$ is independent of $\varepsilon_{t-1}, \varepsilon_{t-2}, \dots$
- By differencing: $y_{t+1} - y_t = \varepsilon_t$
- which is indeed a white noise time series
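A short simulation shows the differencing argument. It generates a random walk from seeded i.i.d. N(0, 1) shocks, starting from a hypothetical price of 100, and checks that the first differences give back exactly those shocks up to floating-point rounding. In R, diff() computes the differences.

```python
import random

random.seed(42)
eps = [random.gauss(0, 1) for _ in range(200)]    # i.i.d. N(0, 1) shocks

# Random walk: y_{t+1} = y_t + eps_t, starting from a hypothetical price of 100
y = [100.0]
for e in eps:
    y.append(y[-1] + e)

# First differences y_{t+1} - y_t recover the shocks, i.e. a white-noise series
diffs = [b - a for a, b in zip(y, y[1:])]
print(max(abs(d - e) for d, e in zip(diffs, eps)))   # only floating-point rounding error
```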
What does $e_t$ mean?
Residuals in forecasting: the difference between an observed value and its fitted value, $e_t = y_t - \hat{y}_t$.
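A small sketch computes residuals for the naive method, whose fitted value is the previous observation ($\hat{y}_t = y_{t-1}$). This method is chosen here only for illustration, and the series is hypothetical. There is no fitted value, and hence no residual, for the first observation.

```python
def naive_residuals(y):
    """Residuals e_t = y_t - yhat_t of the naive method, where yhat_t = y_{t-1}."""
    fitted = [None] + y[:-1]            # no fitted value for the first observation
    return [None if f is None else obs - f for obs, f in zip(y, fitted)]

y = [12.0, 13.5, 13.1, 14.0, 15.2]
e = naive_residuals(y)
print(e)
```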
What are the assumptions and properties of the residuals when forecasting is done well?
- Assumptions
  - Residuals are uncorrelated. If they aren't, there is information left in the residuals that should be used in computing forecasts.
  - Residuals have mean zero. If they don't, then forecasts are biased.
- Useful properties (for prediction intervals)
  - Residuals have constant variance
  - Residuals are normally distributed
Note: $e_t$ are one-step forecast residuals
What is the ACF of the residuals about?
- We assume that the residuals are white noise (uncorrelated, mean zero, constant variance). If they aren't, then there is information left in the residuals that should be used in computing forecasts.
- So a standard residual diagnostic is to check the ACF of the residuals of a forecasting method: the autocorrelations should lie within the blue 95% bounds.
- We expect these to look like white noise
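What is the Ljung-Box test about?
- Consider a whole set of $r_k$ values, and test whether the set is significantly different from a zero set.
- The statistic is $Q^* = T(T+2)\sum_{k=1}^{h} \frac{r_k^2}{T-k}$, where $T$ is the number of observations and $h$ the maximum lag considered.
- If each $r_k$ is close to zero, $Q^*$ will be small
- If some $r_k$ values are large (positive or negative), $Q^*$ will be large
- Note: $r_k$ is the autocorrelation of the residuals at lag $k$ (not the residuals themselves)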
What are the recommended defaults for h (the number of lags in the Ljung-Box test)?
- h = 10 for non-seasonal data
- h = 2m for seasonal data, where m is the length of the season
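The minimal sketch below computes the Ljung-Box statistic $Q^* = T(T+2)\sum_{k=1}^{h} r_k^2/(T-k)$ from residuals, with $h$ chosen by the defaults above. The residual series is hypothetical. In R, `Box.test(res, lag = h, type = "Ljung-Box")` and `checkresiduals()` also report the p-value, which this sketch leaves out. As a rough reference, the 5% critical value of a $\chi^2$ distribution with 10 degrees of freedom is about 18.31.

```python
def acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the series y."""
    T = len(y)
    mean = sum(y) / T
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
            for k in range(1, max_lag + 1)]

def default_h(m=None):
    """Recommended lag count: 10 for non-seasonal data, 2m for seasonal data with period m."""
    return 10 if m is None else 2 * m

def ljung_box(residuals, h):
    """Ljung-Box statistic Q* = T(T+2) * sum_{k=1}^{h} r_k^2 / (T - k)."""
    T = len(residuals)
    r = acf(residuals, h)
    return T * (T + 2) * sum(rk ** 2 / (T - k) for k, rk in enumerate(r, start=1))

# Hypothetical residuals with obvious structure: a strictly alternating sign pattern
res = [1.0, -1.0] * 30
h = default_h()                      # non-seasonal: h = 10
print(ljung_box(res, h))             # far above 18.31: strong autocorrelation remains
```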
How is the Portmanteau test (Box-Ljung) interpreted?
Note: this is done automatically by the checkresiduals() function
- The test checks the null hypothesis that the data (usually the residuals) are white noise
- Small p-values lead to rejecting the null hypothesis: they are evidence of significant autocorrelation
- Large p-values mean the null hypothesis is not rejected
- Typical threshold decision:
  - p-value > 0.05 -> do not reject the null hypothesis (consistent with white noise)
  - p-value < 0.05 -> reject the null hypothesis, concluding that there is significant autocorrelation
Why do you use a training and a test set?
- A model which fits the training data well will not necessarily forecast well
- A perfect fit can always be obtained by using a model with enough parameters
- Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the data
- The test set must not be used for any aspect of model development or calculation of forecasts
- Forecast accuracy is based only on the test set
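A minimal sketch of a chronological split follows. The last observations form the test set, and the model only sees the training part. The naive method is used only for illustration, and accuracy (RMSE) is measured on the test set alone. The series and the split size are hypothetical.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error between actual and forecast values."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

y = [20, 22, 25, 24, 23, 27, 30, 29, 28, 31, 33, 32]
n_test = 3
train, test = y[:-n_test], y[-n_test:]    # split by time, never shuffled

# Naive forecast: repeat the last training observation over the whole test horizon
forecast = [train[-1]] * n_test
print(rmse(test, forecast))
```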
True or false?
What is cross-validation with time series about?
- Forecast accuracy is averaged over a series of test sets, moving the forecast origin forward one time step at a time
- Also known as "evaluation on a rolling forecasting origin"
- A good way to choose the best forecasting model is to find the model with the smallest RMSE computed using time series cross-validation
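The sketch below shows evaluation on a rolling forecasting origin for one-step forecasts. For each origin, the method uses only the data up to that point and forecasts the next value, and the squared errors are averaged. The mean and naive methods, the minimum training size and the series are illustrative choices. In R, the forecast package offers tsCV() for this.

```python
import math

def rolling_origin_rmse(y, method, min_train=3):
    """One-step-ahead RMSE with a rolling forecasting origin."""
    sq_errors = []
    for origin in range(min_train, len(y)):
        train = y[:origin]               # only data observed before the forecast time
        sq_errors.append((y[origin] - method(train)) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def mean_method(train):
    return sum(train) / len(train)

def naive_method(train):
    return train[-1]

y = [20, 22, 25, 24, 23, 27, 30, 29, 28, 31, 33, 32]
scores = {name: rolling_origin_rmse(y, f)
          for name, f in [("mean", mean_method), ("naive", naive_method)]}
print(scores)
print("best:", min(scores, key=scores.get))
```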
What are prediction intervals about?