Changes to workflows

Instructions

  1. Download the HTML notebooks and supporting files with learndrake::save_notebooks().
  2. Open 3-changes/3-changes.Rmd in a fresh R session.
  3. Make sure your working directory is 3-changes/.
  4. Step through these exercises in order. Unless otherwise indicated, run the code chunks in your R console. Check the answers as you go along.
  5. If something breaks along the way, restart your R session and get a fresh copy of the notebooks and supporting files with learndrake::save_notebooks().

Source the scripts

Verify that you are in the right working directory.

basename(getwd()) # Should be "3-changes"
#> [1] "3-changes"

Set some configuration options.

source("../config/options.R")

There should be an R/ folder with 3 scripts.

list.files("R")
#> [1] "functions.R" "packages.R"  "plan.R"

Run them to get your R session ready for drake.

source("R/packages.R")
source("R/functions.R")
source("R/plan.R")

Plan

You now have a plan.

print(plan)
#> # A tibble: 7 x 3
#>   target      command                                                     format
#>   <chr>       <expr>                                                      <chr> 
#> 1 churn_reci… prepare_recipe(churn_data)                                … <NA>  
#> 2 churn_data  split_data(file_in("../data/customer_churn.csv"))         … <NA>  
#> 3 run_relu    test_model(act1 = "relu", churn_data, churn_recipe)       … <NA>  
#> 4 run_sigmoid test_model(act1 = "sigmoid", churn_data, churn_recipe)    … <NA>  
#> 5 run_softmax test_model(act1 = "softmax", churn_data, churn_recipe)    … <NA>  
#> 6 best_run    rbind(run_relu, run_sigmoid, run_softmax) %>% top_n(1, acc… <NA>  
#> 7 best_model  train_best_model(best_run, churn_recipe)                  … keras

Did creating the plan actually run the workflow? Hint: cached().

cached()
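
To see why defining a plan and running it are separate steps, it can help to look at how R stores unevaluated code. The base R sketch below is only an analogy, not drake code: the names state and toy_command are hypothetical, and eval() stands in for make().

```r
# Hypothetical flag that records whether the command has been executed.
state <- new.env()
state$ran <- FALSE

# quote() captures the expression without evaluating it,
# much like a plan stores commands without running them.
toy_command <- quote({
  state$ran <- TRUE
  "result"
})

ran_after_planning <- state$ran # still FALSE: nothing has run yet

# Only an explicit eval() (the analogue of make()) runs the command.
toy_result <- eval(toy_command)
ran_after_running <- state$ran # now TRUE
```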

First target

Let’s look at the row order of the plan.

print(plan)
#> # A tibble: 7 x 3
#>   target      command                                                     format
#>   <chr>       <expr>                                                      <chr> 
#> 1 churn_reci… prepare_recipe(churn_data)                                … <NA>  
#> 2 churn_data  split_data(file_in("../data/customer_churn.csv"))         … <NA>  
#> 3 run_relu    test_model(act1 = "relu", churn_data, churn_recipe)       … <NA>  
#> 4 run_sigmoid test_model(act1 = "sigmoid", churn_data, churn_recipe)    … <NA>  
#> 5 run_softmax test_model(act1 = "softmax", churn_data, churn_recipe)    … <NA>  
#> 6 best_run    rbind(run_relu, run_sigmoid, run_softmax) %>% top_n(1, acc… <NA>  
#> 7 best_model  train_best_model(best_run, churn_recipe)                  … keras

When you call make(plan) in your R console, which target runs first? Why?

make(plan)
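
One way to reason about build order is to treat the targets as a dependency graph rather than a list of rows. The base R sketch below is a plain topological sort over three hypothetical toy targets (raw, clean, report), listed deliberately out of order. It is not drake's scheduler, only an illustration of ordering by dependencies instead of by position.

```r
# Hypothetical toy targets, deliberately listed out of dependency order.
deps <- list(
  report = c("clean"),
  clean = c("raw"),
  raw = character(0)
)

# Repeatedly pick the targets whose dependencies have all been built.
build_order <- function(deps) {
  done <- character(0)
  while (length(done) < length(deps)) {
    is_ready <- vapply(
      names(deps),
      function(target) !(target %in% done) && all(deps[[target]] %in% done),
      logical(1)
    )
    ready <- names(deps)[is_ready]
    if (length(ready) == 0) stop("Circular dependency detected.")
    done <- c(done, ready)
  }
  done
}

order_found <- build_order(deps)
print(order_found)
```

In this toy graph, raw is listed last but has no dependencies, so it is the only target that can go first.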

Last target

When you called make(plan) in the previous exercise, which target ran last? Why?

loadd()

Load best_model from the hidden .drake/ cache and print it out. What do you see?

loadd(best_model) # See also readd().
print(best_model)

Repeated make()

Run make(plan) again. What happens?

make(plan)

Restart R

Restart your R session, load the supporting scripts with source(), and then run make(plan) again. Which targets rebuild, if any?

# Restart R. (Look under "Session" in RStudio's top menu bar.)
source("R/packages.R")
source("R/functions.R")
source("R/plan.R")
make(plan)

Data change

Delete the last line (row) inside ../data/customer_churn.csv and run make(plan) again. (Go back up a level in RStudio’s file viewer, then navigate to the data folder, and then edit the CSV file in RStudio.) Which targets rebuild? Why?

# Open ../data/customer_churn.csv in the RStudio IDE ("View File").
# Delete the last record.
# Save the file.
make(plan)
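
A general way to detect that a data file has changed is to compare fingerprints of its contents. The sketch below uses tools::md5sum() from base R on a hypothetical temporary CSV file. It shows the idea only and is not a description of how drake fingerprints file_in() files internally.

```r
# Hypothetical stand-in for a tracked data file.
csv <- tempfile(fileext = ".csv")
writeLines(c("id,churn", "1,yes", "2,no", "3,no"), csv)
hash_before <- unname(tools::md5sum(csv))

# Fingerprinting an untouched file gives the same result.
hash_unchanged <- unname(tools::md5sum(csv))

# Drop the last record, as in the exercise, and fingerprint again.
rows <- readLines(csv)
writeLines(head(rows, -1), csv)
hash_after <- unname(tools::md5sum(csv))

file_changed <- !identical(hash_before, hash_after)
```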

Command change

Open the R/plan.R script in 3-changes/ and change the command of best_run: replace rbind() with the preferable bind_rows(). Save R/plan.R to disk.

# R/plan.R
plan <- drake_plan(
  # Keep other targets the same...
  best_run = bind_rows(run_relu, run_sigmoid, run_softmax) %>% # use bind_rows()
    top_n(1, accuracy) %>%
    head(1),
  # Keep other targets the same...
)

Then run source("R/plan.R") followed by make(plan).

source("R/plan.R")
make(plan)

What happens?

Function change

Open R/functions.R for editing (in the 3-changes/ folder). In define_model(), change the dropout rate of the first dropout layer from 0.1 to 0.2. Save the file but do not run it or source() it into your R session.

define_model <- function(churn_recipe, units1, units2, act1, act2, act3) {
  # keep the same...
  layer_dropout(rate = 0.2) %>% # previously 0.1
  # keep the same...
}

Then, run make(plan).

# Edit and save R/functions.R,
# but do not source("R/functions.R").
make(plan)

What happens?

Reload functions

Now, run source("R/functions.R") before you run make(plan) again.

# Edit and save R/functions.R.
source("R/functions.R")
make(plan)

Which targets rebuild, if any?
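
The difference between the last two exercises comes down to what your R session holds in memory versus what is saved on disk. The base R sketch below uses a hypothetical temporary script and a hypothetical function dropout_rate() as a stand-in for R/functions.R. It assumes nothing about drake itself.

```r
# Hypothetical script file standing in for R/functions.R.
script <- tempfile(fileext = ".R")
writeLines("dropout_rate <- function() 0.1", script)
source(script)
rate_loaded <- dropout_rate()

# Edit and save the file, but do not source() it.
writeLines("dropout_rate <- function() 0.2", script)
rate_without_source <- dropout_rate() # session still holds the old definition

# Only source() brings the new definition into the session.
source(script)
rate_after_source <- dropout_rate()
```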

Undo change

Undo the change you made to define_model() in the previous exercise. Then, save R/functions.R to disk. Both dropout rates should now be 0.1 again.

define_model <- function(churn_recipe, units1, units2, act1, act2, act3) {
  # keep the same...
  layer_dropout(rate = 0.1) %>% # changed back to 0.1
  # keep the same...
}

Source the functions and run make(plan, recover = TRUE).

# Edit and save R/functions.R.
source("R/functions.R")
make(plan, recover = TRUE)

What happens?
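
As a rough mental model for recovering old results, imagine a store that keeps every past result keyed by the inputs that produced it. The base R sketch below is a toy version of that idea. The names cache, counter, and toy_build() are hypothetical, and the real behavior of recover = TRUE in drake depends on details not shown here, such as whether the old value still exists in the cache.

```r
# Hypothetical toy cache: old results are kept, keyed by their inputs.
cache <- new.env()
counter <- new.env()
counter$builds <- 0

toy_build <- function(rate) {
  key <- paste0("rate=", rate)
  if (exists(key, envir = cache, inherits = FALSE)) {
    return(get(key, envir = cache)) # recover the stored result
  }
  counter$builds <- counter$builds + 1 # count genuine (expensive) builds
  value <- rate * 100 # stand-in for model training
  assign(key, value, envir = cache)
  value
}

first <- toy_build(0.1) # built
second <- toy_build(0.2) # built: the input changed
third <- toy_build(0.1) # recovered: this input was seen before
total_builds <- counter$builds
```

Three calls trigger only two genuine builds, because the third call finds a stored result for the same input.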

Trivial changes

Add a comment to define_model(). Also, add new line breaks around the argument list.

define_model <- function( # new line break
  churn_recipe, units1, units2, act1, act2, act3 # new line break
) {
  # Write a comment anywhere in the body.
  # Keep the rest the same...
}

Then, load the functions again and run make(plan).

# Edit and save R/functions.R.
source("R/functions.R")
make(plan)

What happens?
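
To build intuition for which edits count as meaningful, you can compare functions after R has parsed them, since comments and line breaks do not survive into the parsed code. The base R sketch below uses a hypothetical helper, make_fn(), and compares deparse() output. It illustrates the general idea and is not a claim about the exact comparison drake performs.

```r
# Hypothetical helper: turn source text into a function, discarding
# the original source references so only the parsed code remains.
make_fn <- function(code) eval(parse(text = code, keep.source = FALSE))

original <- make_fn("function(x, y) {\n  x + y\n}")

# Same code with an added comment and new line breaks.
cosmetic <- make_fn(
  "function( # new line break\n  x, y\n) {\n  # A new comment.\n  x + y\n}"
)

# A genuine change to the body.
changed <- make_fn("function(x, y) {\n  x - y\n}")

same_after_cosmetic <- identical(deparse(original), deparse(cosmetic))
same_after_real_change <- identical(deparse(original), deparse(changed))
```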

History

View the history of all your past targets.

history <- drake_history()
print(history)
View(history)

What do you see?

Historical target

Run the following.

drake_history() %>%
  filter(target == "run_relu") %>%
  filter(built == min(built)) %>%
  pull(hash) %>%
  drake_cache()$get_value(.)

What does it return?

And how drake responds