Determine possibly optimal models for Vector Autoregression

This function generates and tests possible VAR models for the specified variables. The only required arguments are av_state and vars.

var_main(av_state, vars, lag_max = 2, significance = 0.05,
  exogenous_max_iterations = 2, subset = 1,
  log_level = av_state$log_level, small = FALSE, include_model = NULL,
  exogenous_variables = NULL, use_sktest = TRUE,
  restrictions.verify_validity_in_every_step = TRUE,
  restrictions.extensive_search = TRUE, criterion = c("AIC", "BIC"),
  use_varsoc = FALSE, use_pperron = TRUE, include_squared_trend = FALSE,
  normalize_data = FALSE, include_lag_zero = FALSE,
  split_up_outliers = TRUE, format_output_like_stata = FALSE,
  exclude_almost = FALSE, simple_models = FALSE,
  numcores = parallel::detectCores())

Arguments

av_state	an object of class `av_state`
vars	the vector of variables on which to perform vector autoregression. These should be the names of existing columns in the data sets of `av_state`.
lag_max	limits the highest possible number of lags that will be used in a model. This number sets the maximum limit in the search for optimal lags.
significance	the maximum P-value for which results are seen as significant. This argument is used only in the residual tests.
exogenous_max_iterations	determines how many times we try to exclude additional outliers for a variable. This argument should be a number between 1 and 3. `1`: when residual tests fail, the program only tries removing 3.5x std. outliers from the residuals of variables, using exogenous dummy variables. `2`: if residual tests still fail, the program also tries removing 3x std. outliers. `3`: if residual tests still fail, the program also tries removing 2.5x std. outliers (not only from the residuals but also from the squares of the residuals).
subset	specifies which data subset the VAR analysis should run on. The VAR analysis only runs on one data subset at a time. If not specified, the first subset is used (corresponding to `av_state$data[[1]]`).
log_level	sets the minimum level of output that should be shown. It should be a number between 0 and 3. A lower level means more verbosity. `0` = debug, `1` = test detail, `2` = test outcomes, `3` = normal. The default is the value of `av_state$log_level` or, if that doesn't exist, `0`. If this argument is specified, the original value of `av_state$log_level` is restored at the end of `var_main`.
small	corresponds to the `small` argument of Stata's `var` function, and defaults to `FALSE`. This argument affects the outcome of the Granger causality test. When `small = TRUE`, the Granger causality test uses the F-distribution to gauge the statistic. When `small = FALSE`, the Granger causality test uses the Chi-squared distribution to gauge the statistic.
include_model	can be used to forcibly include a model in the evaluation. Included models have to be lists, and can specify the parameters `lag`, `exogenous_variables`, and `apply_log_transform`. For example: `av_state <- var_main(av_state,c('Activity_hours','Depression'), log_level=3, small=TRUE, include_model=list(lag=3, exogenous_variables=data.frame(variable="Depression", iteration=1,stringsAsFactors=FALSE), apply_log_transform=TRUE))` followed by `var_info(av_state$rejected_models[[1]]$varest)`. The above example includes a model with `lag=3` (so lags 1, 2, and 3 are included). The model is run on the log-transformed variables and includes an exogenous dummy variable that has a 1 where values of `log(Depression)` are more than 3.5xstd away from the mean (because `iteration=1`; see the description of the `exogenous_max_iterations` parameter above for the meaning of the iterations) and 0 everywhere else. The included model is added at the start of the list, so it can be retrieved (assuming a valid `lag` was specified) with either `av_state$accepted_models[[1]]` if the model was valid or `av_state$rejected_models[[1]]` if it was invalid. In the above example, some info about the included model is printed (assuming it was invalid).
exogenous_variables	should be a vector of variable names that already exist in the given data set, that will be supplied to every VAR model as exogenous variables.
use_sktest	affects which test is used for skewness and kurtosis testing of the residuals. When `use_sktest = TRUE` (the default), Stata's `sktest` is used. When `use_sktest = FALSE`, Stata's `varnorm` (i.e., the Jarque-Bera test) is used.
restrictions.verify_validity_in_every_step	is an argument that affects how constraints are found for valid models. When this argument is `TRUE` (the default), all intermediate models in the iterative constraint-finding method have to be valid. This ensures that we always find a valid constrained model for every valid model. If this argument is `FALSE`, then only after setting all constraints do we check if the resulting model is valid. If this is not the case, we fail to find a constrained model.
restrictions.extensive_search	is an argument that affects how constraints are found for valid models. When this argument is `TRUE` (the default), when the term with the highest p-value does not provide a model with a lower BIC score, we attempt to constrain the term with the second highest p-value, and so on. When this argument is `FALSE`, we only check the term with the highest p-value. If restricting this term does not give an improvement in BIC score, we stop restricting the model entirely.
criterion	is the information criterion used to sort the models. Valid options are `'AIC'` (the default) or `'BIC'`.
use_varsoc	determines whether VAR lag order selection criteria should be employed to restrict the search space for VAR models. When `use_varsoc` is `FALSE`, all lags from 1 to `lag_max` are searched.
use_pperron	determines whether the Phillips-Perron test should be used to determine whether trend variables should be included in the models. When `use_pperron` is `FALSE`, all models will be evaluated both with and without the trend variable. The trend variable is specified using the `order_by` function.
include_squared_trend	determines whether the square of the trend is included if the trend is included for a model. The trend variable is specified using the `order_by` function.
normalize_data	determines whether the endogenous variables should be normalized.
include_lag_zero	determines whether models at lag order 0 should be considered. These are models at lag 1 with constrained lag-1 parameters in all equations.
split_up_outliers	determines whether each outlier should have its own exogenous variable. Defaults to TRUE. This will make a difference only when there is a variable with multiple outliers.
format_output_like_stata	when `TRUE`, all constraints and exogenous variables are always shown (i.e., it will now show exogenous variables that were included but constrained in all equations), and the constraints are formatted like in Stata.
exclude_almost	when `TRUE`, only Granger causalities with p-value <= 0.05 are included in the results. When `FALSE`, p-values between 0.05 and 0.10 are also included in results as "almost Granger causalities" that have half the weight of actual Granger causalities in the Granger causality summary graph.
simple_models	when `TRUE`, four changes are made in the way Autovar works. First, Autovar searches only for lag 1 and lag 2 models, and the lag 2 models are restricted in the sense that only the autoregressive lag 2 is used, i.e., the cross-lagged parameters for lag 2 are constrained. Second, the normality assumption (sktest) no longer tests for kurtosis (only for skewness). Third, `exogenous_max_iterations` is set to 1, meaning we only search one iteration deep for masking outliers, and in this iteration, points that are 2.5xstd away in the residuals or in the squared residuals are masked as outliers. Fourth, Autovar no longer considers constrained versions of the valid models.
numcores	is the number of cores to use in parallel for evaluating the models. When this argument is `1`, no parallel processing is used and all processing is done serially. This argument has to be an integer between 1 and 16. The default value is the detected number of cores on the system (using `detectCores()`). If the `log_level` is less than 3, the value for `numcores` is forced to 1 because output doesn't show up otherwise.
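
The outlier handling described for `exogenous_max_iterations` and `split_up_outliers` can be illustrated with a small base R sketch. This is not autovar's implementation: the helper names `outlier_threshold` and `outlier_dummies` are hypothetical, the sketch assumes outliers are measured as distance from the mean in units of the standard deviation, and it ignores the additional check on squared residuals mentioned for iteration 3.

```r
# Hypothetical helper (not part of autovar): maps an iteration number to the
# standard-deviation multiplier listed for exogenous_max_iterations.
outlier_threshold <- function(iteration) {
  stopifnot(iteration %in% 1:3)
  c(3.5, 3, 2.5)[iteration]
}

# Hypothetical helper (not part of autovar): builds 0/1 dummy columns marking
# values of x that lie more than `threshold` standard deviations from the mean.
outlier_dummies <- function(x, iteration = 1, split_up_outliers = TRUE) {
  threshold <- outlier_threshold(iteration)
  is_outlier <- abs(x - mean(x)) > threshold * sd(x)
  idx <- which(is_outlier)
  if (length(idx) == 0) {
    # No outliers: return a matrix with zero dummy columns
    return(matrix(0, nrow = length(x), ncol = 0))
  }
  if (split_up_outliers) {
    # One dummy column per outlier
    dummies <- sapply(idx, function(i) as.numeric(seq_along(x) == i))
    dummies <- matrix(dummies, nrow = length(x))
    colnames(dummies) <- paste0("outlier_", idx)
  } else {
    # A single dummy column shared by all outliers
    dummies <- matrix(as.numeric(is_outlier), ncol = 1,
                      dimnames = list(NULL, "outliers"))
  }
  dummies
}

# Made-up residuals with two extreme values at positions 41 and 42
resid_example <- c(rep(c(-1, 1), 20), 12, -12)
split_dummies <- outlier_dummies(resid_example, iteration = 1)
joint_dummy <- outlier_dummies(resid_example, iteration = 1,
                               split_up_outliers = FALSE)
```

With these made-up residuals, `split_dummies` has one column per outlier, while `joint_dummy` has a single column marking both. As the description of `split_up_outliers` notes, the two settings only differ when a variable has more than one outlier.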

Value

This function returns the modified av_state object. The lists of accepted and rejected models can be retrieved through av_state$accepted_models and av_state$rejected_models. To print these, use print_accepted_models(av_state) and print_rejected_models(av_state).

Examples

# NOT RUN {
av_state <- load_file("../data/input/Activity and depression pp5 Angela.dta",log_level=3)
av_state <- group_by(av_state,'id')
av_state <- order_by(av_state,'Day')
av_state <- add_derived_column(av_state,'Activity_hours','Activity',
                               operation='MINUTES_TO_HOURS')
av_state <- var_main(av_state,c('Activity_hours','Depression'),log_level=3)
# }
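
The following base R sketch does not call autovar. It illustrates how the `small` and `exclude_almost` arguments could change the treatment of a single Granger causality test. The helper names `granger_p_value` and `granger_weight` are hypothetical, and the sketch assumes that the F statistic equals the Wald statistic divided by the number of restrictions and that the 0.10 boundary is inclusive; neither assumption is confirmed by the documentation above.

```r
# Hypothetical helper (not part of autovar): p-value of a Wald statistic for
# `df` restrictions, mirroring the small = TRUE / FALSE distinction.
granger_p_value <- function(wald, df, df_resid, small = FALSE) {
  if (small) {
    # Assumption: F statistic = Wald statistic / number of restrictions
    pf(wald / df, df1 = df, df2 = df_resid, lower.tail = FALSE)
  } else {
    pchisq(wald, df = df, lower.tail = FALSE)
  }
}

# Hypothetical helper (not part of autovar): weight of a Granger causality in
# a summary, following the description of exclude_almost.
granger_weight <- function(p_value, exclude_almost = FALSE) {
  if (p_value <= 0.05) return(1)
  if (!exclude_almost && p_value <= 0.10) return(0.5)
  0
}

# Made-up test statistic: Wald = 5 with 2 restrictions, 20 residual df
p_chi2 <- granger_p_value(wald = 5, df = 2, df_resid = 20, small = FALSE)
p_f <- granger_p_value(wald = 5, df = 2, df_resid = 20, small = TRUE)

weight_chi2 <- granger_weight(p_chi2)
weight_chi2_strict <- granger_weight(p_chi2, exclude_almost = TRUE)
weight_f <- granger_weight(p_f)
```

For this made-up statistic, the Chi-squared p-value is about 0.08 and the F-based p-value is about 0.11. The same test would therefore count as an "almost Granger causality" with half weight under `small = FALSE`, and would be dropped under `small = TRUE` or `exclude_almost = TRUE`.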
