---
title: "05 Simulation Experiments"
author:
  - "Eliot J. B. McIntire"
date: "`r strftime(Sys.Date(), '%B %d %Y')`"
output:
  rmarkdown::html_vignette:
    number_sections: yes
    self_contained: yes
    toc: yes
vignette: >
  %\VignetteIndexEntry{05 Simulation Experiments}
  %\VignetteDepends{NLMR, SpaDES.core, SpaDES.tools}
  %\VignetteKeyword{experiment}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r setup, include=FALSE}
hasSuggests <- all(
  require("NLMR", quietly = TRUE),
  require("SpaDES.tools", quietly = TRUE)
)
knitr::opts_chunk$set(eval = hasSuggests)
```

## Running a group of simulations: the `experiment` function

Once a simulation is properly initialized and runs correctly with the `spades` function, it is likely time to do things like run replicates.
There are two functions for this, `experiment` and `experiment2`.
Both create simulation experiments, for example by varying parameters and modules.
`experiment` is for simpler "fully factorial" cases, and `experiment2` can take any `simList` class objects for the simulation experiment.

With `experiment`, for example, you can pass alternative values to parameters in modules, and `experiment` will build a fully factorial experiment, with replication, and run it.
This function (as with `POM` and `splitRaster`) is parallel-aware and can proceed either with a cluster object (`cl` argument) or with a cluster running in the background, started with `raster::beginCluster`.
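To make "fully factorial with replication" concrete, the following base R sketch builds the kind of design table described above. It is only an illustration of the idea, not how `experiment` constructs its design internally; the parameter names `spreadprob` and `N` and their values are assumptions chosen for the example.

```r
# Illustrative only: parameter names and values are assumed, not taken
# from the vignette. Every combination of the parameter values is
# crossed with every replicate number.
design <- expand.grid(
  spreadprob = c(0.1, 0.2, 0.3),  # three alternative values
  N = c(50, 100),                 # two alternative values
  replicate = seq_len(2),         # two replicates of each combination
  KEEP.OUT.ATTRS = FALSE
)

nrow(design)  # 3 x 2 combinations x 2 replicates = 12 runs
head(design)
```

The number of runs grows multiplicatively with each added parameter, which is why the caching strategies discussed later in this vignette matter for larger experiments. The package example that follows uses replication only, with no parameter variation.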
```{r experiment, echo=TRUE}
tmpdir <- file.path(tempdir(), "experiment")

library("SpaDES.core")
library("SpaDES.experiment")

# Copy Example 5 here:
mySim <- simInit(
  times = list(start = 0.0, end = 2.0, timeunit = "year"),
  params = list(.globals = list(stackName = "landscape", burnStats = "nPixelsBurned")),
  modules = list("randomLandscapes", "fireSpread", "caribouMovement"),
  paths = list(modulePath = system.file("sampleModules", package = "SpaDES.core"),
               outputPath = tmpdir),
  # Save final state of landscape and caribou
  outputs = data.frame(objectName = c("landscape", "caribou"), stringsAsFactors = FALSE)
)

sims <- experiment(mySim, replicates = 2, .plotInitialTime = NA) # no plotting
attr(sims, "experiment")$expDesign # shows 2 replicates of same experiment
```

## Caching at the experiment level

As with `spades`, an `experiment` can be cached.

## At the `experiment` level

Caching can be achieved within an `experiment` call.
This can be done in two ways: "internally", through the `cache` argument, which will cache *each `spades` call*; or "externally", by wrapping the call in `Cache`, which will *cache the entire experiment*.
If there are lots of `spades` calls, then the former will be slow, as the `simList` will be digested once per `spades` call.

### Using cache argument

```{r experiment-cache}
system.time({
  sims1 <- experiment(mySim, replicates = 2, cache = TRUE, .plotInitialTime = NA)
})

# internal -- second time faster
system.time({
  sims2 <- experiment(mySim, replicates = 2, cache = TRUE, .plotInitialTime = NA)
})
all.equal(sims1, sims2)
```

### Wrapping `experiment` with `Cache`

Here, the `simList` (and the other arguments to `experiment`) are hashed once, and if they are found to be the same as in a previous call, the returned list of `simList` objects is recovered.
This means that even a very large experiment, with many replicates and combinations of parameters and modules, can be recovered very quickly.
Here we show that you can output objects to disk, so the list of `simList` objects doesn't get too big.
Then, when we recover it in the Cached version, all the files are still there and the list of `simList` objects is small, so it is very fast to recover.

```{r Cache-experiment}
# External
outputs(mySim) <- data.frame(objectName = "landscape")
system.time({
  sims3 <- Cache(experiment, mySim, replicates = 3, .plotInitialTime = NA,
                 clearSimEnv = TRUE)
})
```

The second time is much faster.
We see the output files in the same location.

```{r Cache-experiment-2}
system.time({
  sims4 <- Cache(experiment, mySim, replicates = 3, .plotInitialTime = NA,
                 clearSimEnv = TRUE)
})

# test they are all equal (one comparison per replicate)
lapply(seq_along(sims3), function(x) all.equal(sims3[[x]], sims4[[x]]))

dir(outputPath(mySim), recursive = TRUE)
```

Notice that the speed-up can be enormous; in this case ~100 times faster.

## Nested Caching

This is a continuation of the `Nested Caching` section in the `iv-caching` vignette in the `SpaDES.core` package.

- Imagine we have a large model, with many modules, with replication and alternative module collections (e.g., alternative fire models).
- Running this would involve a nested structure with the following functions:

```{r, eval=FALSE, echo=TRUE}
simInit --> many .inputObjects calls

experiment --> many spades calls --> many module calls --> many event calls --> many function calls
```

Let's say we start to introduce caching to this structure.
We start from the innermost functions where we could imagine that caching would be useful.
Let's say there are some GIS operations, like `raster::projectRaster`, which operates on an input raster.
We can Cache the `projectRaster` call to make this much faster, since it will always give the same result for a given input raster.

If we look back at our structure above, we see that we still have LOTS of places that are not Cached.
That means that the `experiment` call will still spawn many `spades` calls, which will still spawn many module calls and many event calls, just to get to the one `Cache(projectRaster)` call that is Cached.
This function will likely be called hundreds of times (for example, because `experiment` runs the `spades` call 100 times due to replication).
Caching it helps, but **`Cache` does take some time**.
So, even if `Cache(projectRaster)` takes only 0.02 seconds, calling it 200 times adds about 4 seconds.
If we are doing this for many functions, then this will be too slow.

We can start putting `Cache` all the way up the sequence of calls.
Unfortunately, the way we use `Cache` at each of these levels is a bit different, so we need a slightly different approach for each.

#### Cache the `experiment` call

`Cache(experiment)`

This will assess the `simList` (the objects, times, modules, etc.) and, if they are all the same, it will return the final list of `simList`s that came from the first `experiment` call.
**NOTE:** because this list can be large, it is likely that you want `clearSimEnv = TRUE`, with all objects that are needed after the `experiment` call saved to disk.
Any stochasticity/randomness inside modules will be frozen.
This is likely ok if the objective is to show results in a web app (via shiny or otherwise) or another visualization of the experiment outputs, e.g., comparing treatments, once sufficient stochasticity has been achieved.

`mySimListOut <- Cache(experiment, mySim, clearSimEnv = TRUE)`

#### Cache the `spades` calls inside `experiment`

`experiment(cache = TRUE)`

This will cache each of the `spades` calls inside the `experiment` call.
That means that there are as many cache events as there are replicates and experimental treatments, which, again, could be a lot.
As with caching the `experiment` call, stochasticity/randomness will be frozen.
Note, one good use of this is when you are making iterative, incremental replication, e.g.,

`mySimOut <- experiment(mySim, replicates = 5, cache = TRUE)`

You decide, after waiting 10 minutes for it to finish, that you need more replication.
Rather than start from zero replicates, you can just pick up where you left off:

`mySimOut <- experiment(mySim, replicates = 10, cache = TRUE)`

This will only add 5 more replicates.
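The incremental behaviour described above can be sketched with a toy per-replicate cache in base R. This is a minimal illustration under stated assumptions, not the mechanism used by `Cache` or `experiment`: the environment `cacheEnv`, the counter `nRuns` and the function `runReplicate` are hypothetical names, and the "simulation" is just a seeded random draw.

```r
# Toy per-replicate cache keyed by replicate number (hypothetical names;
# this is not the real Cache implementation).
cacheEnv <- new.env()
nRuns <- 0  # counts how often the "expensive" run is actually executed

runReplicate <- function(repId) {
  key <- as.character(repId)
  if (!is.null(cacheEnv[[key]])) {
    return(cacheEnv[[key]])  # cache hit: nothing is recomputed
  }
  nRuns <<- nRuns + 1        # cache miss: do the work and store it
  set.seed(repId)            # stand-in for a stochastic simulation
  result <- mean(runif(10))
  cacheEnv[[key]] <- result
  result
}

first <- vapply(1:5, runReplicate, numeric(1))    # runs replicates 1 to 5
second <- vapply(1:10, runReplicate, numeric(1))  # reuses 1 to 5, runs 6 to 10

nRuns                          # number of real runs across both calls
identical(first, second[1:5])  # cached replicates are returned unchanged
```

Only the five new replicates are computed in the second call, and the first five come back exactly as stored, which is the sense in which stochasticity is "frozen" for cached runs.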