Mungebits are atomic data transformations of a data.frame that, loosely speaking, aim to modify "one thing" about a variable or collection of variables. The definition is deliberately informal; examples include dropping variables, mapping values, and discretization.
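To make the "one thing" idea concrete, here is a minimal base-R sketch, independent of the mungebits package, of discretization split into a training step that learns quantile break points and a prediction step that reuses them on new rows. The helper names `train_discretize` and `predict_discretize` and the choice of quantile breaks are illustrative assumptions, not part of the mungebits API.

```r
# Hypothetical helpers for illustration only; they are not part of mungebits.
# Training: learn quantile break points for one column and discretize it.
train_discretize <- function(data, column, bins = 3) {
  breaks <- quantile(data[[column]],
                     probs = seq(0, 1, length.out = bins + 1), names = FALSE)
  # Open the outer bins so unseen extreme values still fall into a bin.
  breaks[1] <- -Inf
  breaks[length(breaks)] <- Inf
  data[[column]] <- cut(data[[column]], breaks = breaks)
  list(data = data, breaks = breaks)
}

# Prediction: reuse the stored breaks instead of recomputing them.
predict_discretize <- function(data, column, breaks) {
  data[[column]] <- cut(data[[column]], breaks = breaks)
  data
}

fit <- train_discretize(iris, "Sepal.Length", bins = 3)
new_rows <- iris[c(1, 51, 101), ]
predicted <- predict_discretize(new_rows, "Sepal.Length", fit$breaks)
predicted$Sepal.Length
```

Because prediction reuses `fit$breaks`, new rows land in the same bins as the training data; recomputing quantiles on the new rows would silently change the encoding.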
mungebit_initialize(train_function = base::identity, predict_function = train_function, enforce_train = TRUE, nse = FALSE)
Argument | Description
---|---
train_function | function. This specifies the behavior to perform on the dataset when preparing for model training. A value of NULL specifies that there should be no training step, i.e., the data should remain untouched.
predict_function | function. This specifies the behavior to perform on the dataset when preparing for model prediction. A value of NULL specifies that there should be no prediction step, i.e., the data should remain untouched.
enforce_train | logical. Whether or not to flip the trained flag during runtime. Set this to FALSE if you are experimenting with or debugging the mungebit.
nse | logical. Whether or not we expect to use non-standard evaluation with this mungebit. Non-standard evaluation allows us to obtain the correct R expression when using …
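The interplay of `train_function`, `predict_function`, `NULL`, and `enforce_train` can be modelled with a toy closure. This is a hypothetical sketch, not the mungebits implementation: `toy_mungebit`, the explicit `input` environment argument, and the use of `NULL` as the default (the real default is `base::identity`) are assumptions made to keep the example self-contained.

```r
# Hypothetical toy model of the train/predict contract; NOT the mungebits code.
toy_mungebit <- function(train_function = NULL,
                         predict_function = train_function,
                         enforce_train = TRUE) {
  input <- new.env()                 # state shared by train and predict steps
  trained <- FALSE
  noop <- function(data, ...) data   # NULL means "leave the data untouched"
  if (is.null(train_function)) train_function <- noop
  if (is.null(predict_function)) predict_function <- noop

  run <- function(data, ...) {
    if (!trained) {
      out <- train_function(data, input, ...)
      # With enforce_train = FALSE the flag never flips, so every call
      # re-runs the training step (handy while debugging).
      if (enforce_train) trained <<- TRUE
      out
    } else {
      predict_function(data, input, ...)
    }
  }
  list(run = run, is_trained = function() trained)
}

# Example: center a column on its training mean, then reuse that mean.
centerer <- toy_mungebit(
  train_function = function(data, input, column) {
    input$center <- mean(data[[column]])
    data[[column]] <- data[[column]] - input$center
    data
  },
  predict_function = function(data, input, column) {
    data[[column]] <- data[[column]] - input$center
    data
  }
)
train_out <- centerer$run(iris, column = "Sepal.Length")
test_out  <- centerer$run(iris[1:5, ], column = "Sepal.Length")
```

As in the usage line, `predict_function` defaults to `train_function`; R's lazy evaluation resolves that default only when it is first used, so it picks up the no-op when `train_function` is NULL.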
    mb <- mungebit$new(column_transformation(function(column, scale = NULL) {
      # `trained` is a helper provided by mungebits indicating whether
      # the mungebit has already been run on a dataset.
      if (!trained) {
        # `input` is a helper provided by mungebits. We remember the
        # `scale` so we can reuse it during prediction.
        input$scale <- scale
      } else {
        cat("Column scaled by", input$scale, "\n")
      }
      column * input$scale
    }))

    # We make a lightweight wrapper to keep track of our data so
    # the mungebit can perform side effects (i.e., modify the data without
    # an explicit `<-` assignment).
    irisp <- list2env(list(data = iris))

    mb$run(irisp, 'Sepal.Length', 2)
    head(irisp$data[[1]] / iris[[1]])
    # > [1] 2 2 2 2 2 2
    mb$run(irisp, 'Sepal.Length')
    # > Column scaled by 2
    head(irisp$data[[1]] / iris[[1]])
    # > [1] 4 4 4 4 4 4
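The example wraps `iris` in an environment so that the mungebit can modify the data in place. The following base-R sketch, which does not use mungebits, shows the underlying mechanism: environments are mutable, so a function can replace `env$data` and the caller sees the change, whereas a data.frame argument is copied on modification. The function names `scale_column_in_place` and `scale_df` are hypothetical.

```r
# Hypothetical helper: modifies the data stored inside an environment.
# Environments are passed by reference, so no `<-` is needed by the caller.
scale_column_in_place <- function(env, column, scale) {
  env$data[[column]] <- env$data[[column]] * scale
  invisible(env)
}

irisp <- list2env(list(data = iris))
scale_column_in_place(irisp, "Sepal.Length", 2)
head(irisp$data[[1]] / iris[[1]])
# [1] 2 2 2 2 2 2

# Hypothetical contrast: a data.frame argument is copied on modification,
# so the caller's object is left unchanged unless the result is assigned.
scale_df <- function(d, column, scale) {
  d[[column]] <- d[[column]] * scale
  d
}
df <- iris
invisible(scale_df(df, "Sepal.Length", 2))
identical(df, iris)   # TRUE: `df` itself was not modified
```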