This function generates and tests possible VAR models for the specified variables. The only required arguments are av_state and vars.

var_main(av_state, vars, lag_max = 2, significance = 0.05,
  exogenous_max_iterations = 2, subset = 1,
  log_level = av_state$log_level, small = FALSE, include_model = NULL,
  exogenous_variables = NULL, use_sktest = TRUE,
  restrictions.verify_validity_in_every_step = TRUE,
  restrictions.extensive_search = TRUE, criterion = c("AIC", "BIC"),
  use_varsoc = FALSE, use_pperron = TRUE, include_squared_trend = FALSE,
  normalize_data = FALSE, include_lag_zero = FALSE,
  split_up_outliers = TRUE, format_output_like_stata = FALSE,
  exclude_almost = FALSE, simple_models = FALSE,
  numcores = parallel::detectCores())

Arguments

av_state

an object of class av_state

vars

the vector of variables on which to perform vector autoregression. These should be the names of existing columns in the data sets of av_state.

lag_max

the highest number of lags that will be used in a model. This number is the upper limit in the search for optimal lags.

significance

the maximum P-value for which results are seen as significant. This argument is used only in the residual tests.

exogenous_max_iterations

determines how many times we should try to exclude additional outliers for a variable. This argument should be a number between 1 and 3:

  • 1 - When exogenous_max_iterations = 1 and residual tests fail, the program only tries removing 3.5x std. outliers from the residuals of variables, using exogenous dummy variables.

  • 2 - When exogenous_max_iterations = 2, the program will also try with removing 3x std. outliers if residual tests still fail.

  • 3 - When exogenous_max_iterations = 3, the program will also try with removing 2.5x std. outliers (not only from the residuals but also from the squares of the residuals) if residual tests still fail.
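The thresholds in the list above can be illustrated with a minimal sketch in base R. The helper names outlier_threshold and outlier_dummy are hypothetical and are not part of autovar. The sketch only flags values relative to the mean of a single residual series and does not reproduce the additional check on squared residuals that applies at iteration 3.

```r
# Hypothetical helper, not part of autovar: threshold (in standard
# deviations) used at each outlier-removal iteration, per the list above.
outlier_threshold <- function(iteration) {
  c(3.5, 3, 2.5)[iteration]
}

# Flag values lying more than `threshold` standard deviations from the mean.
# Returns a 0/1 dummy vector of the same length as `x`.
outlier_dummy <- function(x, iteration) {
  threshold <- outlier_threshold(iteration)
  as.integer(abs(x - mean(x)) > threshold * sd(x))
}

# Toy residuals (made up for illustration): 29 zeros plus one extreme value
res <- c(rep(0, 29), 10)
dummy <- outlier_dummy(res, iteration = 1)
```

In this toy series only the last observation is flagged, so the resulting dummy variable has a 1 at that position and 0 everywhere else.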

subset

specifies which data subset the VAR analysis should run on. The VAR analysis only runs on one data subset at a time. If not specified, the first subset is used (corresponding to av_state$data[[1]]).

log_level

sets the minimum level of output that should be shown. It should be a number between 0 and 3. A lower level means more verbosity. 0 = debug, 1 = test detail, 2 = test outcomes, 3 = normal. The default is the value of av_state$log_level or, if that doesn't exist, 0. If this argument is specified, the original value of av_state$log_level is restored at the end of var_main.

small

corresponds to the small argument of Stata's var function, and defaults to FALSE. This argument affects the outcome of the Granger causality test. When small = TRUE, the Granger causality test uses the F-distribution to gauge the statistic. When small = FALSE, the Granger causality test uses the Chi-squared distribution to gauge the statistic.
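The difference between the two reference distributions can be sketched in base R as follows. This is an illustration under assumptions, not autovar's implementation: granger_p_value is a hypothetical helper, and it assumes the statistic is a Wald-type statistic for q restrictions that is divided by q to obtain the F statistic.

```r
# Hypothetical helper, not part of autovar.
# wald     : Wald-type test statistic on the chi-squared scale
# q        : number of restrictions tested
# df_resid : residual degrees of freedom (used only when small = TRUE)
granger_p_value <- function(wald, q, df_resid, small = FALSE) {
  if (small) {
    # Small-sample version: compare wald / q against F(q, df_resid)
    pf(wald / q, df1 = q, df2 = df_resid, lower.tail = FALSE)
  } else {
    # Large-sample version: compare wald against chi-squared(q)
    pchisq(wald, df = q, lower.tail = FALSE)
  }
}

# Made-up numbers for illustration
p_chi <- granger_p_value(6, q = 2, df_resid = 30, small = FALSE)
p_f   <- granger_p_value(6, q = 2, df_resid = 30, small = TRUE)
```

For the same statistic, the F-based p-value here is larger than the chi-squared one, so a result can be significant with small = FALSE but not with small = TRUE.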

include_model

can be used to forcibly include a model in the evaluation. Included models have to be lists, and can specify the parameters lag, exogenous_variables, and apply_log_transform. For example:

  av_state <- var_main(av_state, c('Activity_hours','Depression'),
    log_level=3, small=TRUE,
    include_model=list(lag=3,
      exogenous_variables=data.frame(variable="Depression",
        iteration=1, stringsAsFactors=FALSE),
      apply_log_transform=TRUE))
  var_info(av_state$rejected_models[[1]]$varest)

The above example includes a model with lag=3 (so lags 1, 2, and 3 are included). The model is run on the log-transformed variables and includes an exogenous dummy variable that has a 1 where values of log(Depression) are more than 3.5xstd away from the mean (because iteration=1; see the description of the exogenous_max_iterations parameter above for the meaning of the iterations) and 0 everywhere else. The included model is added at the start of the list, so it can be retrieved (assuming a valid lag was specified) with either av_state$accepted_models[[1]] if the model was valid or av_state$rejected_models[[1]] if it was invalid. In the above example, some info about the included model is printed (assuming it was invalid).

exogenous_variables

should be a vector of names of variables that already exist in the given data set. These variables are supplied to every VAR model as exogenous variables.

use_sktest

affects which test is used for skewness and kurtosis testing of the residuals. When use_sktest = TRUE (the default), Stata's sktest is used. When use_sktest = FALSE, Stata's varnorm (i.e., the Jarque-Bera test) is used.

restrictions.verify_validity_in_every_step

is an argument that affects how constraints are found for valid models. When this argument is TRUE (the default), all intermediate models in the iterative constraint-finding method have to be valid. This ensures that we always find a valid constrained model for every valid model. If this argument is FALSE, then only after setting all constraints do we check if the resulting model is valid. If this is not the case, we fail to find a constrained model.

restrictions.extensive_search

is an argument that affects how constraints are found for valid models. When this argument is TRUE (the default), when the term with the highest p-value does not provide a model with a lower BIC score, we attempt to constrain the term with the second highest p-value, and so on. When this argument is FALSE, we only check the term with the highest p-value. If restricting this term does not give an improvement in BIC score, we stop restricting the model entirely.
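The search order described above can be sketched as a greedy loop in base R. Everything here is hypothetical and simplified: restrict_model, the terms data frame, and the score function are illustrative names rather than autovar internals, the p-values are held fixed instead of being re-estimated after each restriction, and the validity check controlled by restrictions.verify_validity_in_every_step is not modeled.

```r
# Hypothetical sketch, not autovar's implementation.
# terms : data frame with columns `name` and `p` (p-value per term)
# score : function returning the information-criterion score of the model
#         in which the given character vector of terms is constrained
restrict_model <- function(terms, score, extensive_search = TRUE) {
  restricted <- character(0)
  best <- score(restricted)
  repeat {
    remaining <- terms[!(terms$name %in% restricted), ]
    remaining <- remaining[order(remaining$p, decreasing = TRUE), ]
    # Without the extensive search, only the highest p-value term is tried
    if (!extensive_search) remaining <- head(remaining, 1)
    improved <- FALSE
    for (name in remaining$name) {
      candidate <- score(c(restricted, name))
      if (candidate < best) {  # lower score = better model
        restricted <- c(restricted, name)
        best <- candidate
        improved <- TRUE
        break
      }
    }
    if (!improved) break  # no candidate lowered the score: stop
  }
  restricted
}

# Toy inputs (made up): constraining "b" helps, "a" and "c" hurt
terms <- data.frame(name = c("a", "b", "c"), p = c(0.9, 0.5, 0.01),
                    stringsAsFactors = FALSE)
score <- function(r) 10 + ("a" %in% r) - 2 * ("b" %in% r) + 5 * ("c" %in% r)

with_search    <- restrict_model(terms, score, extensive_search = TRUE)
without_search <- restrict_model(terms, score, extensive_search = FALSE)
```

In the toy case, the extensive search moves on from "a" to "b" and constrains it, whereas the non-extensive search stops as soon as constraining "a" fails to improve the score.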

criterion

is the information criterion used to sort the models. Valid options are 'AIC' (the default) or 'BIC'.

use_varsoc

determines whether VAR lag order selection criteria should be employed to restrict the search space for VAR models. When use_varsoc is FALSE, all lags from 1 to lag_max are searched.

use_pperron

determines whether the Phillips-Perron test should be used to determine whether trend variables should be included in the models. When use_pperron is FALSE, all models will be evaluated both with and without the trend variable. The trend variable is specified using the order_by function.
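To see how use_varsoc = FALSE and use_pperron = FALSE enlarge the search space, the candidate models can be enumerated with a short base R sketch. This is only an illustration: the column names are hypothetical, and other dimensions of the search (for example log transformation or outlier iterations) are left out.

```r
# With use_varsoc = FALSE, every lag from 1 to lag_max is searched.
# With use_pperron = FALSE, each model is evaluated with and without trend.
lag_max <- 2
search_space <- expand.grid(lag = 1:lag_max,
                            include_trend = c(FALSE, TRUE))
n_candidates <- nrow(search_space)
```

With lag_max = 2 this yields four lag and trend combinations; each additional lag adds two more.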

include_squared_trend

determines whether the square of the trend is included if the trend is included for a model. The trend variable is specified using the order_by function.

normalize_data

determines whether the endogenous variables should be normalized.

include_lag_zero

determines whether models at lag order 0 should be considered. These are models at lag 1 with constrained lag-1 parameters in all equations.

split_up_outliers

determines whether each outlier should have its own exogenous variable. Defaults to TRUE. This will make a difference only when there is a variable with multiple outliers.

format_output_like_stata

when TRUE, all constraints and exogenous variables are always shown (i.e., it will now show exogenous variables that were included but constrained in all equations), and the constraints are formatted like in Stata.

exclude_almost

when TRUE, only Granger causalities with p-value <= 0.05 are included in the results. When FALSE, p-values between 0.05 and 0.10 are also included in results as "almost Granger causalities" that have half the weight of actual Granger causalities in the Granger causality summary graph.
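The weighting rule described above can be written as a small base R function. The name granger_weight is hypothetical and only mirrors the thresholds stated in the text; it is not a function exported by autovar.

```r
# Hypothetical helper, not part of autovar.
# Returns the weight of a Granger causality in the summary graph:
# 1 for p <= 0.05, 0.5 for 0.05 < p <= 0.10 (unless exclude_almost), else 0.
granger_weight <- function(p, exclude_almost = FALSE) {
  ifelse(p <= 0.05, 1,
         ifelse(!exclude_almost & p <= 0.10, 0.5, 0))
}

weights_default <- granger_weight(c(0.01, 0.07, 0.20))
weights_strict  <- granger_weight(c(0.01, 0.07, 0.20), exclude_almost = TRUE)
```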

simple_models

when TRUE, four changes are made in the way Autovar works.

  1. Sets autovar to search only for lag 1 and lag 2 models. Additionally, the lag 2 models are restricted in the sense that only the autoregressive lag 2 is used, i.e., the cross-lagged parameters for lag 2 are constrained.

  2. The normality assumption (sktest) no longer tests for kurtosis (only for skewness).

  3. exogenous_max_iterations is set to 1, meaning we only search one iteration deep for masking outliers, and in this iteration, points that are 2.5xstd away in the residuals or in the squared residuals are masked as outliers.

  4. Autovar no longer considers constrained versions of the valid models.

numcores

is the number of cores to use in parallel for evaluating the models. When this argument is 1, no parallel processing is used and all processing is done serially. This argument has to be an integer between 1 and 16. The default value is the detected number of cores on the system (using detectCores()). If log_level is less than 3, the value of numcores is forced to 1 because output doesn't show up otherwise.
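The interaction between numcores and log_level can be summarized with a minimal base R sketch. The function effective_numcores is hypothetical and only restates the rules given in the text; how autovar actually validates the argument is an assumption.

```r
# Hypothetical helper, not part of autovar.
# Returns the number of cores that would actually be used.
effective_numcores <- function(numcores, log_level) {
  # numcores has to be an integer between 1 and 16
  stopifnot(numcores == round(numcores), numcores >= 1, numcores <= 16)
  # Verbose log levels force serial processing so that output shows up
  if (log_level < 3) 1 else numcores
}

cores_quiet   <- effective_numcores(4, log_level = 3)
cores_verbose <- effective_numcores(4, log_level = 1)
```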

Value

This function returns the modified av_state object. The lists of accepted and rejected models can be retrieved through av_state$accepted_models and av_state$rejected_models. To print these, use print_accepted_models(av_state) and print_rejected_models(av_state).

Examples

# NOT RUN {
av_state <- load_file("../data/input/Activity and depression pp5 Angela.dta",log_level=3)
av_state <- group_by(av_state,'id')
av_state <- order_by(av_state,'Day')
av_state <- add_derived_column(av_state,'Activity_hours','Activity',
                               operation='MINUTES_TO_HOURS')
av_state <- var_main(av_state,c('Activity_hours','Depression'),log_level=3)
# }