impute_missing_values.Rd
This function imputes missing values (i.e., values that are NA). For every subset, it reports how many values were imputed (totaled over all columns) along with a percentage (shown after <=). This percentage is that of the column with the highest share of imputed values: if multiple columns were specified, it is the percentage of imputed values in the column that had relatively the most NA values.
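The reported count and percentage can be illustrated with a small base R sketch. This is not the package's implementation; the toy data frame and the variable names `subset_data`, `na_per_column`, `total_imputed`, and `max_percentage` are assumptions made for illustration, and the printed message format is hypothetical.

```r
# Hypothetical toy subset; column names are borrowed from the examples.
subset_data <- data.frame(
  norm_bewegen     = c(1, NA, 3, NA),
  minuten_woonwerk = c(10, 20, NA, 40)
)
columns <- c("norm_bewegen", "minuten_woonwerk")

# Number of missing values in each requested column.
na_per_column <- colSums(is.na(subset_data[, columns, drop = FALSE]))

# Total number of values that would need imputing, over all columns.
total_imputed <- as.integer(sum(na_per_column))

# Percentage for the column with relatively the most NA values.
max_percentage <- max(na_per_column / nrow(subset_data)) * 100

# Illustrative message only; the real function's wording may differ.
cat(sprintf("imputed %d values (<= %.0f%%)\n", total_imputed, max_percentage))
```

Here `norm_bewegen` has two missing values out of four and `minuten_woonwerk` has one, so the total is three while the percentage reflects only the worse column.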
impute_missing_values(av_state, columns, subset_ids = "ALL", type = c("SIMPLE", "EM"))
av_state | an object of class |
---|---|
columns | the columns of which missing values should be imputed. This argument can be a single column or a vector of column names. |
subset_ids | identifies which data subsets to impute data for. This argument can be a single subset, a range of subsets (both of which are identified by their indices), or it can be the word |
type | this argument has two possible values: "SIMPLE" and "EM". |
This function returns the modified av_state object.
# NOT RUN {
av_state <- load_file("../data/input/RuwedataAngela.sav", log_level = 3)
av_state <- group_by(av_state, 'id')
print(av_state)
av_statea <- impute_missing_values(av_state, 'norm_bewegen')
print(av_statea)
av_stateb <- impute_missing_values(av_state, c('norm_bewegen', 'minuten_woonwerk'), subset_ids = 1)
print(av_stateb)
# }