This function generates and tests possible VAR models for the specified variables. The only required arguments are av_state
and vars
.
var_main(av_state, vars, lag_max = 2, significance = 0.05,
exogenous_max_iterations = 2, subset = 1,
log_level = av_state$log_level, small = FALSE, include_model = NULL,
exogenous_variables = NULL, use_sktest = TRUE,
restrictions.verify_validity_in_every_step = TRUE,
restrictions.extensive_search = TRUE, criterion = c("AIC", "BIC"),
use_varsoc = FALSE, use_pperron = TRUE, include_squared_trend = FALSE,
normalize_data = FALSE, include_lag_zero = FALSE,
split_up_outliers = TRUE, format_output_like_stata = FALSE,
exclude_almost = FALSE, simple_models = FALSE,
numcores = parallel::detectCores())
Arguments
av_state 
an object of class av_state 
vars 
the vector of variables on which to perform vector autoregression. These should be the names of existing columns in the data sets of av_state . 
lag_max 
limits the highest possible number of lags that will be used in a model. This number sets the maximum limit in the search for optimal lags. 
significance 
the maximum Pvalue for which results are seen as significant. This argument is used only in the residual tests. 
exogenous_max_iterations 
determines how many times we should try to exclude additional outliers for a variable. This argument should be a number between 1 and 3:
1  When residual tests fail, having exogenous_max_iterations = 1 will only try with removing 3.5x std. outliers for the residuals of variables using exogenous dummy variables.
2  When exogenous_max_iterations = 2 , the program will also try with removing 3x std. outliers if residual tests still fail.
3  When exogenous_max_iterations = 3 , the program will also try with removing 2.5x std. outliers (not only from the residuals but also from the squares of the residuals) if residual tests still fail.

subset 
specifies which data subset the VAR analysis should run on. The VAR analysis only runs on one data subset at a time. If not specified, the first subset is used (corresponding to av_state$data[[1]] ). 
log_level 
sets the minimum level of output that should be shown. It should be a number between 0 and 3. A lower level means more verbosity. 0 = debug, 1 = test detail, 2 = test outcomes, 3 = normal. The default is set to the value of av_state$log_level or if that doesn't exist, to 0 . If this argument was specified, the original value of av_state$log_level is be restored at the end of var_main . 
small 
corresponds to the small argument of Stata's var function, and defaults to FALSE . This argument affects the outcome of the Granger causality test. When small = TRUE , the Granger causality test uses the Fdistribution to gauge the statistic. When small = FALSE , the Granger causality test uses the Chisquared distribution to gauge the statistic. 
include_model 
can be used to forcibly include a model in the evaluation. Included models have to be lists, and can specify the parameters lag , exogenous_variables , and apply_log_transform . For example:
av_state < var_main(av_state,c('Activity_hours','Depression'),
log_level=3,
small=TRUE,
include_model=list(lag=3,
exogenous_variables=data.frame(variable="Depression",
iteration=1,stringsAsFactors=FALSE),
apply_log_transform=TRUE))
var_info(av_state$rejected_models[[1]]$varest)
The above example includes a model with lag=3 (so lags 1, 2, and 3 are included), the model is ran on the logtransformed variables, and includes an exogenous dummy variable that has a 1 where values of log(Depression) are more than 3.5xstd away from the mean (because iteration=1 , see the description of the exogenous_max_iterations parameter above for the meaning of the iterations) and 0 everywhere else. The included model is added at the start of the list, so it can be retrieved (assuming a valid lag was specified) with either av_state$accepted_models[[1]] if the model was valid or av_state$rejected_models[[1]] if it was invalid. In the above example, some info about the included model is printed (assuming it was invalid). 
exogenous_variables 
should be a vector of variable names that already exist in the given data set, that will be supplied to every VAR model as exogenous variables. 
use_sktest 
affects which test is used for Skewness and Kurtosis testing of the residuals. When use_sktest = TRUE (the default), STATA's sktest is used. When use_sktest = FALSE , STATA's varnorm (i.e., the JarqueBera test) is used. 
restrictions.verify_validity_in_every_step 
is an argument that affects how constraints are found for valid models. When this argument is TRUE (the default), all intermediate models in the iterative constraintfinding method have to be valid. This ensures that we always find a valid constrained model for every valid model. If this argument is FALSE , then only after setting all constraints do we check if the resulting model is valid. If this is not the case, we fail to find a constrained model. 
restrictions.extensive_search 
is an argument that affects how constraints are found for valid models. When this argument is TRUE (the default), when the term with the highest pvalue does not provide a model with a lower BIC score, we attempt to constrain the term with the second highest pvalue, and so on. When this argument is FALSE , we only check the term with the highest pvalue. If restricting this term does not give an improvement in BIC score, we stop restricting the model entirely. 
criterion 
is the information criterion used to sort the models. Valid options are 'AIC' (the default) or 'BIC' . 
use_varsoc 
determines whether VAR lag order selection criteria should be employed to restrict the search space for VAR models. When use_varsoc is FALSE , all lags from 1 to lag_max are searched. 
use_pperron 
determines whether the PhillipsPerron test should be used to determine whether trend variables should be included in the models. When use_pperron is FALSE , all models will be evaluated both with and without the trend variable. The trend variable is specified using the order_by function. 
include_squared_trend 
determines whether the square of the trend is included if the trend is included for a model. The trend variable is specified using the order_by function. 
normalize_data 
determines whether the endogenous variables should be normalized. 
include_lag_zero 
determines whether models at lag order 0 are should be considered. These are models at lag 1 with constrained lag1 parameters in all equations. 
split_up_outliers 
determines whether each outlier should have its own exogenous variable. Defaults to TRUE. This will make a difference only when there is a variable with multiple outliers. 
format_output_like_stata 
when TRUE , all constraints and exogenous variables are always shown (i.e., it will now show exogenous variables that were included but constrained in all equations), and the constraints are formatted like in Stata. 
exclude_almost 
when TRUE , only Granger causalities with pvalue <= 0.05 are included in the results. When FALSE , pvalues between 0.05 and 0.10 are also included in results as "almost Granger causalities" that have half the weight of actual Granger causalities in the Granger causality summary graph. 
simple_models 
when TRUE , four changes are made in the way Autovar works.
Sets autovar to search only for lag 1 and lag 2 models. Additionally, the lag 2 models are restricted in the sense that only the autoregressive lag 2 is used, i.e., the crosslagged parameters for lag 2 are constrained.
The normality assumption (sktest) no longer tests for kurtosis (only for skewness).
exogenous_max_iterations is set to 1, meaning we only search one iteration deep for masking outliers, and in this iteration, points that are 2.5xstd away in the residuals or in the squared residuals are masked as outliers.
Autovar no longer considers constrained versions of the valid models.

numcores 
is the number of cores to use in parallel for evaluation the model. When this variable is 1 , no parallel processing is used and all processing is done serially. This variable has to be an integer between 1 and 16. The default value is the detected number of cores on the system (using detectCores() ). If the log_level is less than 3, the value for numcores is forced to 1 because output doesn't show up otherwise. 
Value
This function returns the modified av_state
object. The lists of accepted and rejected models can be retrieved through av_state$accepted_models
and av_state$rejected_models
. To print these, use print_accepted_models(av_state)
and print_rejected_models(av_state)
.
Examples
# NOT RUN {
av_state < load_file("../data/input/Activity and depression pp5 Angela.dta",log_level=3)
av_state < group_by(av_state,'id')
av_state < order_by(av_state,'Day')
av_state < add_derived_column(av_state,'Activity_hours','Activity',
operation='MINUTES_TO_HOURS')
av_state < var_main(av_state,c('Activity_hours','Depression'),log_level=3)
# }