Mungebits are atomic transformations of a data.frame, each meant to modify "one thing" about a variable or collection of variables. The definition is informal; examples include dropping variables, mapping values, and discretization.

Usage

mungebit_initialize(train_function = base::identity,
  predict_function = train_function, enforce_train = TRUE, nse = FALSE)

Arguments

train_function

function. This specifies the behavior to perform on the dataset when preparing for model training. A value of NULL specifies that there should be no training step, i.e., the data should remain untouched.

predict_function

function. This specifies the behavior to perform on the dataset when preparing for model prediction. A value of NULL specifies that there should be no prediction step, i.e., the data should remain untouched.

enforce_train

logical. Whether or not to flip the trained flag during runtime. Set this to FALSE if you are experimenting with or debugging the mungebit.

nse

logical. Whether or not we expect to use non-standard evaluation with this mungebit. Non-standard evaluation allows us to obtain the correct R expression when using substitute from within the body of a train or predict function for the mungebit. Defaults to FALSE, meaning non-standard evaluation is not available to the train and predict functions; it can be switched on at a cost in speed (a 2-3x prediction slowdown for the fastest functions, close to negligible for slower ones).
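
To make the nse argument more concrete, the base-R sketch below shows why substitute can return the "wrong" expression when a function is called through a wrapper, and how rebuilding the call recovers it. This is not how mungebits implements nse; the helpers describe_arg, forward_plain and forward_nse are hypothetical names used only for illustration.

```r
# Hypothetical helper (not part of mungebits): reports the expression
# the caller wrote for `x`, using base R's substitute().
describe_arg <- function(x) {
  deparse(substitute(x))
}

# A wrapper that forwards its argument in the ordinary way loses the
# caller's expression: substitute() now sees the wrapper's own symbol.
forward_plain <- function(y) {
  describe_arg(y)
}

# A wrapper that rebuilds the call around the original expression keeps it,
# at the price of extra work on every call.
forward_nse <- function(y) {
  eval(bquote(describe_arg(.(substitute(y)))), parent.frame())
}

direct   <- describe_arg(iris$Sepal.Length)
plain    <- forward_plain(iris$Sepal.Length)
with_nse <- forward_nse(iris$Sepal.Length)
```

Here direct and with_nse both hold the text of the original expression, while plain holds only the wrapper's argument name. The extra call construction in forward_nse is the kind of overhead that plausibly accounts for a slowdown on very fast functions.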

Examples

mb <- mungebit$new(column_transformation(function(column, scale = NULL) {
  # `trained` is a helper provided by mungebits indicating TRUE or FALSE
  # according to whether the mungebit has been run on a dataset.
  if (!trained) {
    # `input` is a helper provided by mungebits. We remember the
    # `scale` so we can re-use it during prediction.
    input$scale <- scale
  } else {
    cat("Column scaled by ", input$scale, "\n")
  }
  column * input$scale
}))

# We make a lightweight wrapper to keep track of our data so
# the mungebit can perform side effects (i.e., modify the data without an
# explicit assignment <- operator).
irisp <- list2env(list(data = iris))
#mb$run(irisp, 'Sepal.Length', 2)
#head(irisp$data[[1]] / iris[[1]])
# > [1] 2 2 2 2 2 2
#mb$run(irisp, 'Sepal.Length')
# > Column scaled by 2
#head(irisp$data[[1]] / iris[[1]])
# > [1] 4 4 4 4 4 4
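
The split between training and prediction, and the role of enforce_train, can also be sketched in base R alone. The code below is a simplified stand-in and not the mungebits implementation: make_bit and scale_bit are hypothetical names, the state is passed explicitly as an `input` environment, and the data is returned rather than modified by side effect.

```r
# Hypothetical stand-in for a mungebit, written in base R only.
make_bit <- function(train_function, predict_function = train_function,
                     enforce_train = TRUE) {
  trained <- FALSE
  input <- new.env()  # state remembered between training and prediction
  list(
    run = function(data, ...) {
      # Use the train function until the bit has been trained.
      fn <- if (trained) predict_function else train_function
      out <- fn(data, input, ...)
      # Flip the trained flag only when enforce_train is TRUE.
      if (!trained && enforce_train) trained <<- TRUE
      out
    },
    trained = function() trained
  )
}

scale_bit <- make_bit(
  train_function = function(data, input, column, scale) {
    input$scale <- scale  # remember the scale for prediction
    data[[column]] <- data[[column]] * input$scale
    data
  },
  predict_function = function(data, input, column) {
    data[[column]] <- data[[column]] * input$scale  # re-use the stored scale
    data
  }
)

# First run trains (and stores the scale); second run predicts with it.
trained_data   <- scale_bit$run(iris, "Sepal.Length", scale = 2)
predicted_data <- scale_bit$run(iris, "Sepal.Length")
```

With enforce_train = FALSE in this sketch, the flag never flips, so every call goes through the train function again, which is convenient while experimenting.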