Constructing mungepieces and mungebits by hand is a little tedious.
To simplify the process, we introduce a tiny DSL that allows for
easier construction of mungebits. The intention is for this function
to be used in conjunction with a list passed to the munge
helper.
parse_mungepiece(args)
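args: list. A list of arguments. This can be one of the following formats,
each of which is illustrated in the examples:
  1. list(train_fn, ...): the same function and the same arguments are
     used for both training and prediction.
  2. list(list(train_fn, predict_fn), ...): different train and predict
     functions (either may be NULL) that share the same arguments.
  3. list(train = list(train_fn, ...), predict = list(predict_fn, ...)):
     separate functions and separate arguments for training and prediction.
Note that the above trichotomy is exhaustive: any mungepiece can be
constructed using this helper, regardless of its mungebit's
train or predict function or its own train or predict arguments.
In the first two formats, the first unnamed list element is always
reserved and will never belong to the train or predict arguments of the
mungepiece. Also note that in the first two formats, the first list
element must be unnamed.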
The constructed mungepiece.
To understand the documentation of this helper, please read
the documentation on mungebit and mungepiece first.
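As a rough illustration of how the accepted formats differ, the sketch below classifies an argument list and splits it into functions and arguments. It covers three cases: a function followed by arguments, a pair of train and predict functions followed by shared arguments, and named train and predict lists. It uses base R only. The helper describe_mungepiece_format and its return structure are hypothetical names chosen for this example; it is not the implementation of parse_mungepiece and does not build a mungepiece.

```r
# Hypothetical helper (an assumption for illustration, not part of the package):
# reports which argument format a list uses and splits it into the train and
# predict functions and their arguments.
describe_mungepiece_format <- function(args) {
  stopifnot(is.list(args), length(args) > 0L)
  arg_names <- names(args)
  if (is.null(arg_names)) arg_names <- rep("", length(args))

  # Third format: list(train = list(fn, ...), predict = list(fn, ...)).
  if (length(args) == 2L && setequal(arg_names, c("train", "predict"))) {
    train   <- args[["train"]]
    predict <- args[["predict"]]
    return(list(
      format       = "separate functions and arguments",
      train_fn     = train[[1L]],
      predict_fn   = predict[[1L]],
      train_args   = train[-1L],
      predict_args = predict[-1L]
    ))
  }

  # The first two formats require the first list element to be unnamed.
  if (nzchar(arg_names[1L])) {
    stop("The first list element must be unnamed.")
  }
  first  <- args[[1L]]
  shared <- args[-1L]  # The first element is reserved; the rest are arguments.

  if (is.function(first)) {
    # First format: list(fn, ...).
    list(format = "same function and arguments",
         train_fn = first, predict_fn = first,
         train_args = shared, predict_args = shared)
  } else if (is.list(first) && length(first) == 2L) {
    # Second format: list(list(train_fn, predict_fn), ...); either may be NULL.
    list(format = "different functions, shared arguments",
         train_fn = first[[1L]], predict_fn = first[[2L]],
         train_args = shared, predict_args = shared)
  } else {
    stop("Unrecognized argument format.")
  }
}

fn <- function(x, ...) { x }
describe_mungepiece_format(list(fn, TRUE, train_arg2 = "blah"))$format
describe_mungepiece_format(list(list(fn, NULL), TRUE))$format
```

In this sketch the first unnamed element is removed before the remaining elements are treated as arguments, which mirrors the rule that the first element never belongs to the train or predict arguments.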
# First, we show off the various formats that the parse_mungepiece
# helper accepts. For this exercise, we can use dummy train and
# predict functions and arguments.
train_fn <- predict_fn <- function(x, ...) { x }
train_arg1 <- predict_arg1 <- dual_arg1 <- TRUE # Can be any parameter value.

# If the train function with train args is the same as the predict function
# with predict args.
piece <- parse_mungepiece(list(train_fn, train_arg1, train_arg2 = "blah"))

# If the train and predict arguments to the mungepiece match, but we
# wish to use a different train versus predict function for the mungebit.
piece <- parse_mungepiece(list(list(train_fn, predict_fn), dual_arg1, dual_arg2 = "blah"))

# If we wish to only run this mungepiece during training.
piece <- parse_mungepiece(list(list(train_fn, NULL), train_arg1, train_arg2 = "blah"))

# If we wish to only run this mungepiece during prediction.
piece <- parse_mungepiece(list(list(NULL, predict_fn), predict_arg1, predict_arg2 = "blah"))

# If we wish to run different arguments but the same function during
# training versus prediction.
piece <- parse_mungepiece(list(train = list(train_fn, train_arg1),
                               predict = list(train_fn, predict_arg1)))

# If we wish to run different arguments with different functions during
# training versus prediction.
piece <- parse_mungepiece(list(train = list(train_fn, train_arg1),
                               predict = list(predict_fn, predict_arg1)))

# The munge function uses the format defined in parse_mungepiece to create
# and execute a list of mungepieces on a dataset.
not_run({
  munged_data <- munge(raw_data, list(
    "Drop useless vars" = list(list(drop_vars, vector_of_variables),
                               list(drop_vars, c(vector_of_variables, "dep_var"))),
    "Impute variables"  = list(imputer, imputed_vars),
    "Discretize vars"   = list(list(discretize, restore_levels), discretized_vars)
  ))

  # Here, we have requested to munge the raw_data by dropping useless variables
  # (including the dependent variable dep_var after model training),
  # imputing a static list of imputed_vars, and discretizing a static list
  # of discretized_vars, being careful to use separate logic when merely
  # using the computed discretization cuts to bin the numeric features into
  # categorical features. The end result is a munged_data set with an
  # attribute "mungepieces" that holds the list of mungepieces used for
  # munging the data, and can be used to perform the exact same set of
  # operations on a single row dataset coming through in a real-time production
  # system.
  munged_single_row_of_data <- munge(single_row_raw_data, munged_data)
})
# The munge function uses the attached "mungepieces" attribute, a list of
# trained mungepieces.
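The idea of attaching trained steps to the munged data and replaying them on a single row can be sketched with a toy in base R. Everything here is an assumption made for illustration: toy_munge, the step layout (a list with train and predict functions), and the impute_mean step are hypothetical and are not how munge or mungepieces are implemented. Only the use of an attribute named "mungepieces" follows the description above.

```r
# Toy illustration only: toy_munge and the step layout are hypothetical.
# A step is list(train = function(data), predict = function(data, state)),
# where train returns list(data = <munged data>, state = <what was learned>).
impute_mean <- list(
  train = function(data) {
    m <- mean(data$x, na.rm = TRUE)
    data$x[is.na(data$x)] <- m
    list(data = data, state = m)
  },
  predict = function(data, state) {
    # Reuse the mean learned during training instead of recomputing it.
    data$x[is.na(data$x)] <- state
    data
  }
)

toy_munge <- function(data, steps) {
  pieces <- attr(steps, "mungepieces", exact = TRUE)
  if (is.null(pieces)) {
    # Training run: `steps` is a plain list of untrained steps.
    pieces <- steps
    for (i in seq_along(pieces)) {
      result <- pieces[[i]]$train(data)
      data <- result$data
      pieces[[i]]$state <- result$state
    }
  } else {
    # Prediction run: `steps` is previously munged data carrying trained steps.
    for (piece in pieces) {
      data <- piece$predict(data, piece$state)
    }
  }
  attr(data, "mungepieces") <- pieces
  data
}

training_data <- data.frame(x = c(1, NA, 3))
munged <- toy_munge(training_data, list(impute_mean))
single_row <- toy_munge(data.frame(x = NA_real_), munged)
```

Passing the munged data as the second argument, rather than the list of steps, is what triggers the prediction path in this toy, in the same spirit as the munge(single_row_raw_data, munged_data) call above.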