This function imputes missing values (i.e., values that are NA). For every subset, it reports how many values were imputed (in total, across all columns) along with a percentage (following <=). This percentage is that of the column with the highest share of imputed values, i.e., if multiple columns were specified, it is the percentage of imputed values in the column that had relatively the most NA values.
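The reported count and percentage can be illustrated with a small sketch in base R. This is not the package's code: the data frame, its column names, and the message format are made up for illustration, and the sketch assumes that every NA in the selected columns ends up imputed.

```r
# Toy data standing in for one subset; the column names are hypothetical.
subset_data <- data.frame(
  steps   = c(10, NA, 12, NA, 15, 11, 14, 13, 12, 10),
  commute = c(30, 35, NA, 32, 31, 30, 29, 33, 34, 30)
)
columns <- c("steps", "commute")

# Number of NA values per selected column.
na_counts <- colSums(is.na(subset_data[, columns, drop = FALSE]))

# Total number of values to impute, summed over all selected columns.
total_imputed <- sum(na_counts)

# Percentage for the column with the relatively most NA values.
max_percentage <- max(na_counts / nrow(subset_data)) * 100

# Illustrative message only; the real function's wording may differ.
cat(sprintf("imputed %d values (<= %.0f%%)\n", total_imputed, max_percentage))
```

With this toy data, three values are missing in total, and the column with the most missing values (steps) has 2 of 10 missing, so the percentage is 20.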

impute_missing_values(av_state, columns, subset_ids = "ALL",
  type = c("SIMPLE", "EM"))

Arguments

av_state

an object of class av_state

columns

the columns whose missing values should be imputed. This argument can be a single column name or a vector of column names.

subset_ids

identifies which data subsets to impute data for. This argument can be a single subset or a range of subsets (both identified by their indices), or it can be the word 'ALL' (default). In the last case, the selected columns of all data subsets are processed.

type

this argument has two possible values:

  • 'SIMPLE' - Each missing value is determined from up to five values surrounding it (2 before, 3 after, fewer at the start or end of the range). For numeric (scl) columns, the mean of these values is used. For factor (nom) columns, the mode of these values is used.

  • 'EM' - EM imputation. Currently not implemented.
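The 'SIMPLE' rule described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `impute_simple_sketch` is a hypothetical helper, and it assumes that NA neighbours are skipped, that only originally observed values (not previously imputed ones) are used, and that ties for the mode go to the first value in table order.

```r
# Hypothetical helper, not part of the package: fills each NA using up to
# 2 values before it and 3 values after it.
impute_simple_sketch <- function(x) {
  n <- length(x)
  result <- x
  for (i in which(is.na(x))) {
    # Clip the window at the start and end of the series, excluding i itself.
    idx <- setdiff(max(1, i - 2):min(n, i + 3), i)
    neighbours <- x[idx]
    neighbours <- neighbours[!is.na(neighbours)]  # assumption: skip NA neighbours
    if (length(neighbours) == 0) next             # nothing to impute from
    if (is.numeric(x)) {
      # Numeric column: mean of the surrounding values.
      result[i] <- mean(neighbours)
    } else {
      # Factor column: most frequent surrounding value (the mode).
      counts <- table(as.character(neighbours))
      result[i] <- names(counts)[which.max(counts)]
    }
  }
  result
}

numeric_filled <- impute_simple_sketch(c(1, 2, NA, 4, 5, 6))
factor_filled  <- impute_simple_sketch(factor(c("a", "b", NA, "a", "a")))
```

In the numeric example, the missing third value is replaced by the mean of 1, 2, 4, 5 and 6; in the factor example, it is replaced by "a", the most frequent neighbour.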

Value

This function returns the modified av_state object.

Examples

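# NOT RUN {
# Load the data set and group it into subsets by the 'id' column
av_state <- load_file("../data/input/RuwedataAngela.sav", log_level = 3)
av_state <- group_by(av_state, 'id')
print(av_state)
# Impute one column in all subsets (subset_ids defaults to "ALL")
av_statea <- impute_missing_values(av_state, 'norm_bewegen')
print(av_statea)
# Impute two columns, but only in the first subset
av_stateb <- impute_missing_values(av_state, c('norm_bewegen',
                                   'minuten_woonwerk'), subset_ids = 1)
print(av_stateb)
# }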