| Title: | Project Templates Using 'SpaDES' |
|---|---|
| Description: | Quickly set up 'SpaDES' project directories and add modules using templates. |
| Authors: | Eliot J B McIntire [aut, cre] (ORCID: <https://orcid.org/0000-0002-6914-8316>), Alex M Chubaty [ctb] (ORCID: <https://orcid.org/0000-0001-7146-8135>), Ian Eddy [ctb] (ORCID: <https://orcid.org/0000-0001-7397-2116>), Ceres Barros [ctb] |
| Maintainer: | Eliot J B McIntire |
| License: | GPL-3 |
| Version: | 1.0.1.9342 |
| Built: | 2026-06-04 03:03:33 UTC |
| Source: | https://github.com/PredictiveEcology/SpaDES.project |
SpaDES
Quickly set up 'SpaDES' project directories and add modules using templates.
Maintainer: Eliot J B McIntire (ORCID)
Other contributors:
Alex M Chubaty (ORCID) [contributor]
Ian Eddy (ORCID) [contributor]
Ceres Barros [contributor]
Useful links:
Report bugs at https://github.com/PredictiveEcology/SpaDES.project/issues
For a given name, this will return the default library for packages.
.libPathDefault(name)
| Argument | Description |
|---|---|
| name | A text string. When used in |
A path where the packages will be installed.
Render a scenario (or list of them) as the canonical output path.
as_path(x, pre = "outputs", withFieldLabel = .scenario_env$withFieldLabel)
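| Argument | Description |
|---|---|
| x | A scenario, or anything coercible via as_scenario(). |
| pre | Path prefix (default "outputs"). |
| withFieldLabel | Character vector of field names whose values should be prefixed with the field name in the path. The default is .scenario_env$withFieldLabel. |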
Method-specific arguments (mapping, name_col, fields,
withFieldLabel) are forwarded via ... to the dispatched method;
see the method definitions in R/scenario.R.
as_scenario(x, ...)
| Argument | Description |
|---|---|
| x | A scenario, a character path/tarname, a named list, or a data.frame / tibble / dribble. |
| ... | Method-specific arguments (see Details). |
A scenario (single input) or list of scenarios.
Render a scenario as an upload tar filename.
as_tarname( x, ext = ".tar.gz", pre = "outputs", withFieldLabel = .scenario_env$withFieldLabel )
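| Argument | Description |
|---|---|
| x | A scenario, or anything coercible via as_scenario(). |
| ext | File extension (default ".tar.gz"). |
| pre | Path prefix (default "outputs"). |
| withFieldLabel | Character vector of field names whose values should be prefixed with the field name in the path. The default is .scenario_env$withFieldLabel. |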
simLists object to a data.table
This is particularly useful to build plots using the tidyverse, e.g., ggplot2.
Ported here from the now-unmaintained SpaDES.experiment package.
## S3 method for class 'simLists'
as.data.table( x, keep.rownames = FALSE, ..., vals, objectsFromSim = NULL, objectsFromOutputs = NULL )
| Argument | Description |
|---|---|
| x | An R object. |
| keep.rownames | Default is FALSE. |
| ... | Additional arguments. Currently unused. |
| vals | A (named) list of object names to extract from each |
| objectsFromSim | Character vector of objects to extract from the simLists. If omitted, it will extract all objects from each simList in order to calculate the |
| objectsFromOutputs | List of (named) character vectors of objects to load from the |
See examples.
This returns a data.table class object.
Assess simulation status from PNG outputs
assessDoneInFigure( runName, timeout_min = 20, statusCalculate = getOption("spades.statusCalculate") )
| Argument | Description |
|---|---|
| runName | Directory containing the figures/hists. |
| timeout_min | Inactivity threshold in minutes (default 20). |
| statusCalculate | A quoted expression to compute job status from output files. Defaults to getOption("spades.statusCalculate"). |
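The inactivity check that timeout_min controls can be pictured with a small base-R sketch. This is not the package's assessDoneInFigure(): the helper png_activity_status() and its three return values are hypothetical, and the sketch simply assumes that "active" means the newest PNG under runName was written within the last timeout_min minutes.

```r
# Hypothetical sketch, NOT the package's assessDoneInFigure():
# "activity" is assumed to mean the newest PNG under runName changed recently.
png_activity_status <- function(runName, timeout_min = 20, now = Sys.time()) {
  pngs <- list.files(runName, pattern = "\\.png$", recursive = TRUE,
                     full.names = TRUE, ignore.case = TRUE)
  if (length(pngs) == 0L) return("NO_OUTPUT")      # nothing written yet
  newest <- max(file.mtime(pngs))                  # most recent PNG write
  idle_min <- as.numeric(difftime(now, newest, units = "mins"))
  if (idle_min > timeout_min) "INACTIVE" else "ACTIVE"
}

# Usage: a freshly written PNG is "ACTIVE"; 30 minutes later it is "INACTIVE"
run_dir <- file.path(tempdir(), "trial1", "figures")
dir.create(run_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(run_dir, "hist_1.png")))
png_activity_status(run_dir)
png_activity_status(run_dir, now = Sys.time() + 30 * 60)
```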
Blocks the calling R session until every worker has completed. Optionally prints a summary of final queue statuses.
awaitExperimentFuture(ef, verbose = TRUE)
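| Argument | Description |
|---|---|
| ef | An object returned by experimentFuture(). |
| verbose | If TRUE (the default), prints a summary of final queue statuses. |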
The ef object, invisibly.
Polls squeue -j <ids> every interval_s seconds until every
job ID has left the queue. Optionally prints a final queue-status summary.
awaitExperimentSBATCH(es, interval_s = 30, verbose = TRUE)
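| Argument | Description |
|---|---|
| es | An object returned by experimentSBATCH(). |
| interval_s | Polling interval in seconds. Default 30. |
| verbose | If TRUE (the default), prints a final queue-status summary. |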
The es object, invisibly.
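A minimal sketch of that polling loop follows, with squeue replaced by an injectable function so it can run off-cluster. await_jobs() and fake_squeue() are hypothetical names. On a real Slurm host, list_queued might wrap system2("squeue", c("-h", "-o", "%i", "-j", paste(ids, collapse = ",")), stdout = TRUE), but treat that as an untested assumption (squeue can, for example, reject job IDs it no longer knows about).

```r
# Hypothetical sketch of "poll until every job ID has left the queue".
# list_queued(ids) must return the IDs still queued; on Slurm it would wrap
# squeue, here it is injected so the loop can run anywhere.
await_jobs <- function(job_ids, list_queued, interval_s = 30, verbose = TRUE) {
  repeat {
    still_queued <- intersect(job_ids, list_queued(job_ids))
    if (length(still_queued) == 0L) break          # all jobs have left
    if (verbose) message(length(still_queued), " job(s) still in the queue")
    Sys.sleep(interval_s)
  }
  invisible(job_ids)
}

# Usage with a fake queue that drains one job per poll
polls <- 0L
fake_squeue <- function(ids) {
  polls <<- polls + 1L
  ids[seq_len(max(0L, length(ids) - polls + 1L))]
}
await_jobs(c("101", "102"), fake_squeue, interval_s = 0)
```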
SpaDES.core::spades()
A wrapper around experiment2() that builds a fully-factorial set of
simLists from a single base simList plus alternative params / modules
/ inputs / objects, then runs them via experiment2()'s future
backend. The factorial design is built with factorialDesign().
experiment( sim, replicates = 1, params, modules, objects = list(), inputs, dirPrefix = "simNum", substrLength = 3, saveExperiment = TRUE, experimentFile = "experiment.RData", clearSimEnv = FALSE, notOlderThan, cl, ... )
| Argument | Description |
|---|---|
| sim | A simList. |
| replicates | The number of replicates to run of the same simList. |
| params | Like for |
| modules | Like for |
| objects | Like for |
| inputs | Like for |
| dirPrefix | String vector. This will be concatenated as a prefix on the directory names. |
| substrLength | Numeric. While making |
| saveExperiment | Logical. Should the resulting experimental design be saved to a file? Default TRUE. |
| experimentFile | String. Filename if |
| clearSimEnv | Logical. If TRUE, then the envir(sim) of each simList in the return list is emptied, to reduce the RAM load of a large return object. Default FALSE. |
| notOlderThan | Currently unused (kept for back-compatibility). |
| cl | Deprecated and ignored; control parallelism with future::plan(). |
| ... | Passed to |
This function (and the simLists class it produces) was moved here from the
now-unmaintained SpaDES.experiment package. Two behavioural notes versus the
historical version: parallelism is now controlled by future::plan() rather
than a cl cluster object (the cl argument is accepted but ignored, with a
message), and the return value is a simLists object (as from
experiment2()) rather than a plain list. The experimental design table is
still saved to experimentFile and is attached to the result (see Value).
Invisibly, a simLists object. The experimental design list
(expDesign + expVals) is attached as an attribute named "experiment"
on the object's underlying data environment, i.e. attr(x@.xData, "experiment")
for a returned object x, and is also written to experimentFile.
Eliot McIntire
experiment2(), factorialDesign(), as.data.table.simLists(),
experiment_family
A SpaDES "experiment" is a way of running a simulation many times with
varying inputs, parameters, paths, scenarios, or replicates. This lets
you run, for example, replication of stochastic models, hypothesis testing
with different data inputs, scenario analysis of different human decisions,
building large datasets of alternative mechanisms to enable ensemble
modeling, and other possibilities.
There are five functions to choose from, in two groups. The first group
(experiment() / experiment2()) is conceptually simpler: it works on
in-memory simList objects directly. The second group
(experimentTmux() / experimentFuture() / experimentSBATCH()) is built
around a project global.R script (typically where setupProject() is run)
and a shared job queue. This second group becomes more useful as runs
(e.g., scenarios, replicates) become numerous or long, are spread across machines,
or are run on a High Performance Computing cluster.
simLists
These take simList object(s) directly and are analogous to running
SpaDES.core::spades(). They return results as a
simLists ("plural") object that you can post-process, e.g., with
as.data.table.simLists() or other custom methods.
These functions are best when the run set is modest, fits in RAM, and
you want the result objects back in your session. They are not built for
resume-after-crash, cross-machine pulls, or HPC. (Moved here from the
now-unmaintained SpaDES.experiment package.)
experiment2() – The core in-memory runner: give it one or more
simLists (and optionally replicates) and it runs them all and returns
a simLists. You build the variation yourself, e.g. with
several SpaDES.core::simInit() calls.
experiment() – A light wrapper around experiment2() that builds
the variation for you: give it one base simList plus alternative
params / modules / inputs / objects and it constructs the
fully-factorial set of simLists (via factorialDesign()) and runs
them. factorialDesign() is exported separately, so the same design can
also seed the df of the second group below.
global.R)
Here, the user describes the experiment using a data.frame (or data.table)
in which each column name and row value defines the set of object-value pairs
that will be assigned to variables in the .GlobalEnv.
When the user runs one of these functions, the data.frame is translated into a
queue data.frame that has all the same columns and rows, plus a few more
(status, claimed_by, etc.) to coordinate the run. After creating the
queue, the function spawns a number of independent R "worker" sessions
(according to n_workers or cores). Each worker selects a single row,
assigns the values in each user-specified column to an object in the
.GlobalEnv whose name is the column name, then source()s global.R. For
example, if the data.frame has 2 rows and a column named runName with
values "trial1" and "trial2", the first worker runs
runName <- "trial1"; source(global_path) and the second runs
runName <- "trial2"; source(global_path). The status column starts as
"PENDING" for all rows; workers take the next "PENDING" (or "INTERRUPTED")
row, skipping "DONE" rows, and mark a row "DONE" when it finishes without
error before moving to the next.
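The claim, assign, source cycle can be reduced to a single-process sketch. It assumes one worker and no file lock, uses a throwaway global.R written to tempdir(), and marks a failing row "INTERRUPTED" (an assumption; the text above only says what happens on success). The tried flag is there so a row that errors is not retried forever within one launch.

```r
# Single-process sketch of the worker loop described above (no locking, one
# worker). Not the package's implementation; names here are illustrative.
global_path <- file.path(tempdir(), "global.R")
writeLines('results[[runName]] <- paste("ran", runName)', global_path)
results <- list()

queue <- data.frame(runName = c("trial1", "trial2"), status = "PENDING",
                    stringsAsFactors = FALSE)
tried <- rep(FALSE, nrow(queue))   # avoid retrying a failing row forever

repeat {
  i <- which(queue$status %in% c("PENDING", "INTERRUPTED") & !tried)[1]
  if (is.na(i)) break                              # nothing left to claim
  tried[i] <- TRUE
  queue$status[i] <- "RUNNING"                     # claim the row
  for (col in setdiff(names(queue), "status"))     # user columns only
    assign(col, queue[[col]][i], envir = .GlobalEnv)
  ok <- tryCatch({ source(global_path); TRUE }, error = function(e) FALSE)
  queue$status[i] <- if (ok) "DONE" else "INTERRUPTED"
}
queue
```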
These three share the queue and the run-naming convention and differ only in how parallel workers are spawned:
experimentTmux() – Allows the most interactivity, so it
is most helpful while there is still debugging to do. This will
only work on a computer that has tmux installed. The function
spawns one tmux pane per worker, optionally across
ssh-reachable machines. Best for interactive use where you want to
watch workers live (tmux attach). Workers can be stopped with
tmuxKillPanes().
experimentFuture() – When little or no debugging is
needed, this function runs workers as background R processes, using
callr::r_bg() if all workers are local,
or future::cluster if some of the workers are on different machines.
Best for stable scripts. Workers can be stopped with killExperimentFuture().
experimentSBATCH() – One Slurm batch job per worker. Best for
HPC clusters with sbatch / squeue / scancel. Block with
awaitExperimentSBATCH() (polls squeue) or stop with
killExperimentSBATCH() (graceful via stop files; force = TRUE
issues scancel). Inspect generated job scripts with
dry_run = TRUE.
All three of these accept the same core arguments:
df – The parameter grid; one row = one job. Column names become
R variables in the worker's .GlobalEnv before global.R is sourced.
global_path – Path to the R script each worker sources per job.
Must be on a filesystem visible to all workers (matters for
experimentSBATCH() and remote-host modes of the other two).
queue_path – Path to the local RDS queue file. Workers coordinate through file-based locks on this file; remove it (or point to a fresh path) to start over, or leave it in place to resume.
runNameLabel – Quoted expression evaluated against each row to
derive a human-readable identifier (used in log messages, sentinel
filenames, and tmuxListPanes() output). Default is the first two
non-meta columns of the queue.
statusCalculate – Optional quoted expression that inspects the job's outputs and returns up-to-date status / heartbeat metadata. statusCalculate_LandR and statusCalculate_FireSenseFit are pre-built blocks for the most common SpaDES module outputs.
ss_id – Optional Google Sheets / Drive folder ID. When provided,
workers mirror queue state to a sheet so a remote stakeholder can
watch progress in a browser. With ss_id = NULL (default) the
queue is purely local – no Google APIs are touched.
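A sketch of how a quoted runNameLabel could be evaluated against one row. The meta-column names and the "_" separator used by the fallback are assumptions made for the illustration; only the idea (evaluate the expression with the row's columns in scope, or fall back to the first two non-meta columns) comes from the description above.

```r
# Sketch only: evaluating a runNameLabel-style quoted expression against a row.
# The meta-column names and the "_" separator in the fallback are assumptions.
queue <- data.frame(.scenario = c("A", "B"), .rep = c(1L, 2L),
                    status = "PENDING", claimed_by = NA_character_,
                    stringsAsFactors = FALSE)
meta_cols <- c("status", "claimed_by")

label_row <- function(queue, i, runNameLabel = NULL) {
  # One row of user columns, as a named list usable as an evaluation env
  row <- as.list(queue[i, setdiff(names(queue), meta_cols), drop = FALSE])
  if (is.null(runNameLabel)) {
    # Fallback: join the first two non-meta columns
    return(paste(unlist(head(row, 2)), collapse = "_"))
  }
  eval(runNameLabel, envir = row)
}

label_row(queue, 1)
label_row(queue, 2, quote(paste0(.scenario, "_rep", .rep)))
```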
A typical usage pattern:
df <- expand.grid(.scenario = c("A", "B"), .rep = 1:2,
stringsAsFactors = FALSE)
ef <- experimentFuture(df = df, global_path = "global.R",
n_workers = 2L, log_dir = "logs")
Swap experimentFuture() for experimentTmux() or experimentSBATCH()
(adjusting cores / n_workers / sbatch_opts) and the rest of the
driver script is unchanged.
Rscript -e ... per row?
At its core, that is exactly what each worker does. A worker assigns the
row's columns into .GlobalEnv and calls source("global.R"), which is
equivalent to:
Rscript -e '.ELFind <- "6.3.1"; .rep <- 1; source("global.R")'
Rscript -e '.ELFind <- "6.3.1"; .rep <- 2; source("global.R")'
When the number of runs is small, this works. As you add scenarios,
machines, authentication, race conditions, etc., the bookkeeping grows past what is
comfortable to maintain by hand. The experimentXXX functions are just that
bookkeeping.
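For comparison, the hand-rolled commands can be generated from df with base R. This only illustrates the shell approach (it is not something the experimentXXX functions do), and it assumes no value contains a single quote, which would break the shell quoting.

```r
# Illustration of the hand-rolled approach: one Rscript command per row of df.
# Assumes no value contains a single quote (it would break the shell quoting).
df <- data.frame(.ELFind = "6.3.1", .rep = c(1, 2), stringsAsFactors = FALSE)

rscript_cmds <- vapply(seq_len(nrow(df)), function(i) {
  # "name <- value" for every column of row i
  assigns <- vapply(names(df), function(col)
    paste0(col, " <- ", deparse(df[[col]][i])), character(1))
  sprintf("Rscript -e '%s; source(\"global.R\")'",
          paste(assigns, collapse = "; "))
}, character(1))
writeLines(rscript_cmds)
```

Every problem listed next (duplicate claims, orphaned rows, oversubscription) starts from commands like these being launched independently.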
The experimentXXX functions deal with several issues that arise when running "parallel"
scripts, including:
Two shells launched at the same second can
both pick the same row. The experimentXXX functions take an exclusive lock (via filelock)
on the queue between read and write, so each row is claimed at most
once across all workers and machines.
If a worker dies mid-job, the row is
stuck "in progress" with no record. The experimentXXX functions mark the row RUNNING
when claimed and DONE / INTERRUPTED when finished, so the next launch
skips DONE rows and (optionally, via tmuxRefreshQueueStatus() or
experimentFutureList(kill = TRUE)) demotes orphaned RUNNING
rows back to PENDING for re-claim.
Rscript &; Rscript &; Rscript & scales as
"one process per row", which thrashes the box once you exceed the
core count. The experimentXXX functions take n_workers and let each worker pull
rows in sequence, so you cap parallelism explicitly.
Spawning N rows on each of M machines means either replicating the parameter grid by hand (and risking duplicate work) or sharding it (and losing dynamic load-balancing). With the experimentXXX functions, every worker on every machine pulls from the same queue, so a slow machine just claims fewer rows.
Rscript -e writes nothing structured – you
scrape PIDs and tail logs. The experimentXXX functions maintain a queue with
status / claimed_by / started_at / process_id / machine_name
so queueRead() gives a full snapshot, and experimentFutureList()
can enumerate live workers (and kill them) cluster-wide.
When ss_id is supplied, the
queue is mirrored to a Google Sheet a collaborator can open in a
browser; without that, "how is the run going?" requires SSH
access to the runner machine.
queueUploadMissing() / outScenarios()
anti-join the queue against the Drive upload folder so you can see
which DONE rows still need to be packaged and uploaded.
runNameLabel and statusCalculate
give one place to derive directory names and inspect output
artifacts – both per-runner and in tmuxRefreshQueueStatus() for
post-hoc rescans – without each global.R re-implementing them.
If you only ever run two rows on one machine and never restart, the two-line shell version is fine. The experimentXXX functions exist for the cases past that.
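The orphan-demotion step mentioned in the list above can be sketched as a pure function over the queue table. demote_orphans() is hypothetical, not the code behind tmuxRefreshQueueStatus() or experimentFutureList(); the is_alive() probe is injected so the logic can be tested without real processes.

```r
# Sketch of demoting orphaned RUNNING rows back to PENDING (not package code).
# is_alive(machine, pid) stands in for the real /proc or SSH probe.
demote_orphans <- function(queue, is_alive) {
  running <- which(queue$status == "RUNNING")
  alive <- vapply(running, function(i)
    is_alive(queue$machine_name[i], queue$process_id[i]), logical(1))
  queue$status[running[!alive]] <- "PENDING"       # re-claimable next launch
  queue
}

queue <- data.frame(status = c("DONE", "RUNNING", "RUNNING"),
                    machine_name = c("mega", "mega", "camas"),
                    process_id = c(101L, 102L, 201L),
                    stringsAsFactors = FALSE)
# Pretend only PID 102 is still alive
demote_orphans(queue, function(machine, pid) pid == 102L)$status
```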
When you launch on more than one machine – experimentTmux(cores =
c("mega", "birds")) or experimentFuture(cores = c("localhost",
"camas")) – .setup_remote_machine() runs once per unique
remote host before any worker starts. It tries to make the remote R
session look enough like the local one that global.R runs the
same way. What it propagates / sets up:
SpaDES.project itself is rsynced from the local
.libPaths()[1] to the remote (or, if loaded via
devtools::load_all(), the source tree is rsynced and
R CMD INSTALL-ed). Require is version- and
RemoteSha-checked and rsynced if it's older or comes from a
different source than locally. Then Require::Install()
installs every package in SpaDES.project's Imports /
Depends / LinkingTo, plus any Suggests
installed locally (so optional runtime dependencies like
googlesheets4, cli, etc. follow along but the dev
toolchain doesn't).
terra, sf, rgdal, rgeos, lwgeom
are forced to compile from source on the remote (so they link
against the remote's libgdal.so etc., which may be a
different soversion than localhost's).
A best-effort sudo -n apt-get install -y --no-install-recommends
of the dev headers needed for the source-compiled packages
(libgdal-dev, libssl-dev, libcurl4-openssl-dev,
libxml2-dev, fonts/graphics, ...). Runs non-interactively;
if passwordless sudo isn't configured the failure is logged and
setup continues, expecting the libraries to be there already.
The remote ~/.Rprofile gets refreshed with
.libPaths(c(<local_lib>, .libPaths())),
options(repos = c("https://predictiveecology.r-universe.dev",
<local repos>)), options(defaultPackages = ...) (so the
remote uses the same minimal default-attached set as a fresh
Rscript), and Sys.setenv(CURL_CA_BUNDLE, SSL_CERT_FILE)
pointing at the system CA bundle (so libcurl can do HTTPS
even when /etc/profile.d/ isn't sourced under non-login
SSH). The remote $BASH_ENV, if set, is wrapped in a
subshell guard so a misbehaving sleep $UNSET can't kill
the SSH command shell before R starts.
The local GITHUB_PAT (read from gitcreds::gitcreds_get()
or a caller-supplied local_pat_file) is written to the
remote ~/.Renviron (chmod 0600) and to a per-lib file
<local_lib>/.spades_github_pat that's read at the top of
~/.Rprofile. git credential approve is also called
so command-line git on the remote authenticates the same
way. Required for pak to install private modules / dev
packages from GitHub.
The experimentXXX functions pass email + cache_path into each
worker; the worker calls
googlesheets4::gs4_auth(email = email, cache = cache_path)
non-interactively against the same cached OAuth token directory
the local session uses. The token directory itself isn't pushed
(it's expected to already exist via NFS or a prior login on the
remote); only the gargle_oauth_email /
gargle_oauth_cache options are forwarded so the same
identity is selected. If the cache isn't there, the worker prints
a gs4_auth warning and continues without GS access.
R/ folder + modules – The directory next to global.R called R/ (where
project-specific helper functions live) is rsync -a --delete-ed
to the remote, so anything global.R source()s from
R/ works there too. With copyModules = TRUE the
SpaDES module path (getOption("spades.modulePath")) is
also rsynced, so module code stays in step.
global.R and the queue .rds are scp'd into the
same path on the remote (or, if the path is already on NFS such
as /mnt/shared_cache/..., they're effectively no-ops –
same absolute path on both ends).
Net effect: global.R on camas sees the same packages at
the same versions, the same GITHUB_PAT, the same R/
helpers, the same SSL trust store, and the same Google identity as
global.R on mega. Hand-rolling all of that for each
remote machine before each run is the bulk of what makes
"Rscript -e ... on N hosts" miserable in practice; the experimentXXX functions
do it once per unique host per call.
Once experimentFuture(cores = c("localhost", "camas", "dougfir"))
is launched, the workers on camas and dougfir are no
longer reachable via local ps / tools::pskill() – they
are R processes on other machines. experimentFutureList()
is the cluster-wide handle for them. Pass it the ef object
and it will:
Read the queue file (which is the authoritative record:
every claim writes machine_name + process_id under
a filelock, so workers on every machine appear there even
when they didn't redirect their stdout to a discoverable
worker_NN.log).
Probe each entry in ef$cores once with
ssh <core> hostname -s to build a map from OS hostname
(which is what Sys.info()[["nodename"]] writes to the
queue) to the SSH alias the master used to reach it
(e.g. A159604 -> dougfir). This is needed because
ssh A159604 typically fails – only ssh dougfir
resolves via ~/.ssh/config / /etc/hosts.
For every status == "RUNNING" row, verify the worker
is actually alive: file.exists("/proc/<pid>") for the
local machine, batched ssh <alias> "[ -d /proc/<pid> ]"
for each remote machine (one SSH connection per machine).
Return a data.frame with pid, machine,
started_at, queue_path, runName for every
live worker – local and remote, in one table.
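Steps 2 and 3 can be sketched in base R, assuming Linux /proc on every node and OpenSSH on the calling machine. hostname_alias_map(), pid_alive_local() and remote_alive_cmd() are hypothetical helpers, not package functions. The shQuote() call matters: system2() passes its arguments through the local shell, so the remote command must reach ssh as one quoted argument.

```r
# Sketch only (not the package's code). Assumes Linux /proc and OpenSSH.

# Step 2: OS hostname (as written to the queue) -> SSH alias used to reach it
ssh_hostname <- function(alias) {
  out <- suppressWarnings(system2("ssh", c(alias, "hostname", "-s"),
                                  stdout = TRUE, stderr = FALSE))
  if (length(out)) out[1] else NA_character_       # NA if unreachable
}
hostname_alias_map <- function(cores, probe = ssh_hostname) {
  cores <- setdiff(unique(cores), "localhost")
  hosts <- vapply(cores, probe, character(1))
  keep <- !is.na(hosts)
  stats::setNames(cores[keep], hosts[keep])        # names = OS hostnames
}

# Step 3: liveness via /proc/<pid>, locally and with one SSH call per machine
pid_alive_local <- function(pids) file.exists(file.path("/proc", pids))

remote_alive_cmd <- function(pids) {
  # echo each live PID; the trailing `true` keeps the exit status at 0
  paste(c(sprintf("[ -d /proc/%d ] && echo %d", pids, pids), "true"),
        collapse = "; ")
}
pids_alive_remote <- function(alias, pids) {
  out <- suppressWarnings(system2("ssh", c(alias, shQuote(remote_alive_cmd(pids))),
                                  stdout = TRUE, stderr = FALSE))
  pids %in% as.integer(out)
}

# Usage (no SSH needed): a fake probe standing in for `ssh <core> hostname -s`
fake_probe <- function(alias) c(camas = "camas", dougfir = "A159604")[[alias]]
hostname_alias_map(c("localhost", "camas", "dougfir"), probe = fake_probe)
remote_alive_cmd(c(12L, 34L))
```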
kill = TRUE uses the same map to send the chosen signal
(TERM default, INT or KILL on request):
tools::pskill() for local PIDs and a single batched
ssh <alias> "kill -<sig> p1 p2 ..." per remote machine.
After signalling, it polls (locally via /proc, remotely via
SSH) for up to 10 s until the workers actually exit, then runs
tmuxRefreshQueueStatus() on each unique queue file to demote the
now-orphaned RUNNING rows back to PENDING. When
ss_id was supplied to the original experimentFuture()
call, a <queue_path>.ss_id sidecar is left behind;
kill = TRUE reads it and pushes the same demotion to the
Google Sheet via .gs_demote_after_kill(), so the GS view
converges with the local queue without a separate cleanup step.
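The per-machine signalling can be sketched the same way. kill_cmd() and signal_workers() are hypothetical; the local branch uses tools::pskill() with the signal constants exported by tools, and run_remote is injectable so the grouping logic can be checked without SSH. The sketch assumes the local machine is labelled "localhost" in the workers table.

```r
# Sketch: one signal to many workers, grouped by machine (not package code).
kill_cmd <- function(pids, signal = c("TERM", "INT", "KILL")) {
  signal <- match.arg(signal)
  paste0("kill -", signal, " ", paste(pids, collapse = " "))
}

signal_workers <- function(workers, signal = "TERM", local_host = "localhost",
                           run_remote = function(alias, cmd)
                             system2("ssh", c(alias, shQuote(cmd)))) {
  sig_num <- c(TERM = tools::SIGTERM, INT = tools::SIGINT,
               KILL = tools::SIGKILL)[[signal]]
  by_host <- split(workers$pid, workers$machine)   # PIDs per machine
  for (host in names(by_host)) {
    if (host == local_host) {
      tools::pskill(by_host[[host]], sig_num)      # local: direct signal
    } else {
      run_remote(host, kill_cmd(by_host[[host]], signal))  # one SSH per host
    }
  }
  invisible(by_host)
}

# Usage with a recording stub instead of SSH
sent <- character()
workers <- data.frame(pid = c(11L, 12L, 21L),
                      machine = c("camas", "camas", "dougfir"),
                      stringsAsFactors = FALSE)
signal_workers(workers, run_remote = function(alias, cmd)
  sent <<- c(sent, paste(alias, cmd)))
sent
```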
Three usage shapes:
experimentFutureList(ef)                                 # list everything live
experimentFutureList(ef, kill = TRUE)                    # graceful TERM + queue refresh
experimentFutureList(ef, kill = TRUE, signal = "KILL")   # immediate
Across R sessions, when ef is gone, drive discovery off the
queue path directly:
experimentFutureList(queue_paths = "/mnt/shared_cache/.../future_queue.rds")
Without ef, the hostname-to-alias probe is skipped, so the
SSH check uses machine_name verbatim – which only works if
the OS hostname is itself reachable via SSH on the calling node
(i.e. it appears in ~/.ssh/config or /etc/hosts as a
Host entry). If not, you'll need to either keep ef in scope
or add the OS hostnames to your SSH config.
Concretely, the things you can do post-launch from the calling
machine without ever opening a terminal on camas /
dougfir:
See which row each remote worker is currently on.
Confirm that a remote worker actually died after a crash /
network blip (otherwise the queue would stay stuck at
RUNNING and no one would re-claim).
Send SIGTERM cluster-wide to abort an experiment
mid-run, then immediately re-launch a fixed global.R
against the same queue (any DONE rows are skipped, demoted
RUNNING rows are re-claimed).
Mirror that demotion to the Google Sheet so a stakeholder watching in a browser sees the change without needing to be told.
experimentMonitor() is the read-only entry point. Discovery
depends on what you pass:
experimentMonitor() (no args) – enumerates every
tmux pane on the calling machine across all tmux servers, same
as the historical tmuxListPanes().
experimentMonitor(ef) – queue-driven discovery
across all machines in ef$cores (with the
hostname-to-SSH-alias probe described above).
experimentMonitor(queue_paths = "...") – same as
ef mode, but for cross-session use when the ef
handle is gone.
stats = TRUE batches ps -o pid=,%cpu=,rss=,state=
(locally and via one SSH connection per remote node) to append:
state – R (running on CPU), S
(sleeping / waiting), D (uninterruptible sleep, often disk
I/O – persistent D = hang), T (stopped),
Z (zombie), Closed (R session exited but tmux pane
still open).
cpuAvg – percent CPU averaged over the process's
lifetime (note: not the instantaneous rate htop
shows).
RAM (GB) – resident memory (RSS), 1 decimal place.
availableCores – total CPUs on the node, from
nproc.
total RAM (GB) – total RAM on the node, from
/proc/meminfo.
availableCores and total RAM (GB) are constant across
all rows on the same node, so each pane's resource use is visible
relative to its node capacity. Unreachable nodes get NA for
all their rows; titles missing a parseable <node>-<pid> get
NA too – one bad pane / unreachable host doesn't poison the
rest of the table.
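Parsing those ps columns is mostly string splitting. The sketch below is not the package's parser; it assumes the field order pid, %cpu, rss, state given above, that rss is in KiB (the ps default on Linux), and reads MemTotal (in kB) from /proc/meminfo for the node-level RAM column.

```r
# Sketch: turn `ps -o pid=,%cpu=,rss=,state=` lines into the per-worker
# columns described above. Not the package's parser; ps reports rss in KiB.
parse_ps <- function(lines) {
  f <- strsplit(trimws(lines), "[[:space:]]+")
  field <- function(k) vapply(f, `[`, character(1), k)
  data.frame(
    pid        = as.integer(field(1)),
    cpuAvg     = as.numeric(field(2)),                 # lifetime-average %CPU
    `RAM (GB)` = round(as.numeric(field(3)) / 1024^2, 1),
    state      = substr(field(4), 1, 1),               # e.g. "Sl" -> "S"
    check.names = FALSE
  )
}

# Node total RAM from /proc/meminfo (MemTotal is reported in kB)
total_ram_gb <- function(meminfo = readLines("/proc/meminfo")) {
  kb <- as.numeric(sub("^MemTotal:[[:space:]]+([0-9]+).*$", "\\1",
                       grep("^MemTotal:", meminfo, value = TRUE)))
  round(kb / 1024^2, 1)
}

parse_ps(c("  1234  97.5 2097152 R", "5678 0.1 524288 Sl"))
total_ram_gb(c("MemTotal:       16777216 kB", "MemFree:  1024 kB"))
```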
Single function, three sources, same stats columns either
way – so a stakeholder running experimentMonitor(ef, stats =
TRUE) on a laptop sees the same per-worker CPU / RAM picture that
experimentMonitor(stats = TRUE) (legacy tmux mode) gives on
the master node. tmuxListPanes() is preserved as a thin alias
that calls experimentMonitor() with no ef, so older
code keeps working unchanged.
Related families:
scenario_family – canonical record for one row of df,
reversibly convertible between field values, an output directory
path, and an upload tarball filename.
queueRead() / queueUploadMissing() / outList() /
outScenarios() – helpers for queues persisted to a Google
Sheet plus a Drive upload folder, including the
queue-vs-uploads anti-join.
experimentMonitor() – read-only worker / pane lister.
With no args, scans tmux panes; with ef or
queue_paths, scans the queue file's RUNNING rows and
verifies each PID is alive (local /proc, batched SSH for
remotes). stats = TRUE adds per-worker CPU / RSS /
state and per-node nproc / total RAM via batched
ps. tmuxListPanes() is a thin alias for the no-args
form.
tmuxRefreshQueueStatus() / tmuxFindDuplicates() /
tmuxKillPanes() – operational tools that work regardless of
which runner produced the queue.
experimentFutureList() – experimentFuture-side
equivalent of tmuxListPanes(): discovers live workers
across the cluster (driven off the queue file's RUNNING rows
plus an ssh <core> hostname -s alias probe), and with
kill = TRUE sends TERM / INT / KILL
to all of them in one call (local via tools::pskill(),
remote batched per machine via SSH), then refreshes the queue
and demotes the matching Google-Sheet rows when a
<queue_path>.ss_id sidecar is present.
All of these honour spades()'s events argument, which restricts the
events executed for each module (see SpaDES.core::spades()):
experiment() / experiment2(): pass events as a named argument;
it is forwarded to every spades() call, e.g.
experiment2(sim1, sim2, events = list(fireSpread = "init")). The
same events apply to all simulations / replicates.
experimentTmux() / experimentFuture() / experimentSBATCH():
there is no events argument because these functions do not call
spades() – your global.R does. To control events, add an events
column to df (each cell is the events spec for that row) and, inside
global.R, call spades(sim, events = events). Because each row carries
its own value, this gives per-scenario control of which events run
for any particular module – something the single shared events of the
in-memory family cannot do.
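A sketch of an events list-column in df and of what a worker effectively does with it before global.R runs. The scenario names and events specs are illustrative, and it assumes list-columns survive the round trip through the queue file; the spades() call is shown only as a comment.

```r
# Sketch: a per-row `events` list-column in df (names and specs illustrative).
df <- data.frame(.scenario = c("fireInitOnly", "caribouInitOnly"),
                 stringsAsFactors = FALSE)
df$events <- I(list(list(fireSpread = "init"),
                    list(caribouMovement = "init")))

# What a worker effectively does for row 2 before sourcing global.R:
i <- 2
for (col in names(df)) assign(col, df[[col]][[i]], envir = .GlobalEnv)

# ...and global.R would then use it (not run here):
#   sim <- SpaDES.core::spades(sim, events = events)
str(events)
```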
SpaDES.core::spades()
Given one or more simList objects, run a series of spades calls
in a structured, organized way. Methods are available to deal with outputs,
such as as.data.table.simLists() which can pull out simple to complex
values from every resulting simList or object saved by outputs
in every simList run. This uses future internally, allowing
for various backends and parallelism.
experiment2( ..., replicates = 1, clearSimEnv = FALSE, createUniquePaths = c("outputPath"), useCache = FALSE, debug = getOption("spades.debug"), drive_auth_account = NULL, meanStaggerIntervalInSecs = 1 )
| Argument | Description |
|---|---|
| ... | One or more simList objects. |
| replicates | The number of replicates to run of the same simList. |
| clearSimEnv | Logical. If TRUE, then the envir(sim) of each simList in the return list is emptied, to reduce the RAM load of a large return object. Default FALSE. |
| createUniquePaths | A character vector of the |
| useCache | Logical. Passed to |
| debug | Passed to |
| drive_auth_account | Optional character string. If provided, it will be passed to each worker and run as |
| meanStaggerIntervalInSecs | If used, this will use |
This function was moved here from the now-unmaintained SpaDES.experiment
package. See also the file-queue based experiment_family (e.g.
experimentFuture()) for a different, script-oriented approach.
This function, because of its class formalism, allows methods to be used. For example,
as.data.table.simLists() allows the user to pull out specific objects (in
the simList objects, or saved on disk in outputPath(sim)).
The outputPath is changed so that every simulation puts outputs in a
sub-directory of the original outputPath of each simList (unless
createUniquePaths is character(0)/NULL).
Invisibly returns a simLists object. This class
extends the environment class and contains simList objects.
Any named argument in ... that is not consumed by experiment2 is passed
straight to SpaDES.core::spades(). In particular, spades()'s events
argument is honoured, so
experiment2(sim1, sim2, events = list(...)) restricts the events that run
for every simulation. Note this applies the same events specification to
all simLists / replicates. For per-scenario control of events, use the
file-queue experiment_family with an events column in df (see
experiment_family).
A simLists object can also be made manually via new("simLists"), e.g., if many
spades calls have already been run by hand.
Eliot McIntire
as.data.table.simLists(), SpaDES.core::spades(), experiment(),
experiment_family
## Not run:
if (require("ggplot2", quietly = TRUE) &&
    require("NLMR", quietly = TRUE) &&
    require("RColorBrewer", quietly = TRUE)) {
  library(SpaDES.core)
  library(SpaDES.project)
  library(grid) # for pushViewport(), viewport(), grid.layout() below
  tmpdir <- file.path(tempdir(), "examples")

  # Make 3 simLists -- set up scenarios
  endTime <- 2

  # Example of changing parameter values:
  # make 3 simLists that differ only in the number of fires
  mySim <- lapply(c(10, 20, 30), function(nFires) {
    simInit(
      times = list(start = 0.0, end = endTime, timeunit = "year"),
      params = list(
        .globals = list(stackName = "landscape", burnStats = "nPixelsBurned"),
        # Turn off interactive plotting
        fireSpread = list(.plotInitialTime = NA, spreadprob = c(0.2), nFires = nFires),
        caribouMovement = list(.plotInitialTime = NA),
        randomLandscapes = list(.plotInitialTime = NA, .useCache = "init")
      ),
      modules = list("randomLandscapes", "fireSpread", "caribouMovement"),
      paths = list(modulePath = system.file("sampleModules", package = "SpaDES.core"),
                   outputPath = tmpdir),
      # Save final state of landscape and caribou
      outputs = data.frame(
        objectName = c(rep("landscape", endTime), "caribou", "caribou"),
        saveTimes = c(seq_len(endTime), unique(c(ceiling(endTime / 2), endTime))),
        stringsAsFactors = FALSE
      )
    )
  })

  planTypes <- c("sequential") # try others! See ?future::plan
  future::plan(planTypes)
  sims <- experiment2(sim1 = mySim[[1]], sim2 = mySim[[2]], sim3 = mySim[[3]],
                      replicates = 3)

  # Try pulling out values from simulation experiments
  # 2 variables
  df1 <- as.data.table(sims, vals = c("nPixelsBurned", NCaribou = quote(length(caribou$x1))))

  # Now use objects that were saved to disk at different times during spades call
  df1 <- as.data.table(sims,
                       vals = c("nPixelsBurned", NCaribou = quote(length(caribou$x1))),
                       objectsFromOutputs = list(nPixelsBurned = NA, NCaribou = "caribou"))

  # Now calculate 4 different values, some from data saved at different times
  # Define new function -- this calculates perimeter to area ratio
  fn <- quote({
    landscape$Fires[landscape$Fires[] == 0] <- NA
    a <- boundaries(landscape$Fires, type = "inner")
    a[landscape$Fires[] > 0 & a[] == 1] <- landscape$Fires[landscape$Fires[] > 0 & a[] == 1]
    peri <- table(a[])
    area <- table(landscape$Fires[])
    keep <- match(names(area), names(peri))
    mean(peri[keep] / area)
  })

  df1 <- as.data.table(sims,
                       vals = c("nPixelsBurned",
                                perimToArea = fn,
                                meanFireSize = quote(mean(table(landscape$Fires[])[-1])),
                                caribouPerHaFire = quote({
                                  NROW(caribou) / mean(table(landscape$Fires[])[-1])
                                })),
                       objectsFromOutputs = list(NA, c("landscape"), c("landscape"),
                                                 c("landscape", "caribou")),
                       objectsFromSim = "nPixelsBurned")

  if (interactive()) {
    # with an unevaluated string
    library(ggplot2)
    p <- lapply(unique(df1$vals), function(var) {
      ggplot(df1[vals == var, ],
             aes(x = saveTime, y = value, group = simList, color = simList)) +
        stat_summary(geom = "point", fun = mean) +
        stat_summary(geom = "line", fun = mean) +
        stat_summary(geom = "errorbar", fun.data = mean_se, width = 0.2) +
        ylab(var)
    })

    # Arrange all 4 -- could use gridExtra::grid.arrange -- easier
    pushViewport(viewport(layout = grid.layout(2, 2)))
    vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)
    print(p[[1]], vp = vplayout(1, 1))
    print(p[[2]], vp = vplayout(1, 2))
    print(p[[3]], vp = vplayout(2, 1))
    print(p[[4]], vp = vplayout(2, 2))
  }
}
## End(Not run)
vplayout(2, 2)) } } ## End(Not run)
A tmux-free alternative to experimentTmux that dispatches
parallel workers as background R processes via callr. Workers claim
jobs from a Google Sheets or local-RDS queue, source global_path for
each job, and write all console output to per-worker log files on localhost.
The function returns immediately (non-blocking) with a handle object of
class "experimentFuture". Use awaitExperimentFuture to
block until all workers finish, or print(ef) to check live status.
Because workers write to files via callr::r_bg()'s stdout /
stderr arguments, logs appear in real time and can be followed with
tail -f.
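The claim-run-mark cycle that each worker performs against a local-RDS queue can be pictured with a minimal base-R sketch. It assumes the queue is a data.frame with a status column ("PENDING", "RUNNING", "DONE", as used elsewhere on this page) and a process_id column; claim_next_job() and mark_done() are hypothetical helpers, not the package's implementation, and they do no file locking, which a real multi-worker queue needs.

```r
# Hypothetical sketch of a file-backed job queue (not the package's code).
# Assumes a data.frame with `status` ("PENDING"/"RUNNING"/"DONE") and
# `process_id` columns. There is no file locking here, so two workers
# could race for the same row; a real multi-worker queue must lock.
claim_next_job <- function(queue_path, pid = Sys.getpid()) {
  q <- readRDS(queue_path)
  pending <- which(q$status == "PENDING")
  if (length(pending) == 0L) return(NULL)  # queue exhausted
  i <- pending[1L]
  q$status[i] <- "RUNNING"
  q$process_id[i] <- pid
  saveRDS(q, queue_path)
  list(row = i, job = q[i, , drop = FALSE])
}

mark_done <- function(queue_path, row) {
  q <- readRDS(queue_path)
  q$status[row] <- "DONE"
  saveRDS(q, queue_path)
  invisible(NULL)
}

# A 2 x 2 queue shaped like the `expt` data.frame in the examples
qpath <- tempfile(fileext = ".rds")
jobs <- expand.grid(.scenario = c("A", "B"), .rep = 1:2,
                    stringsAsFactors = FALSE)
jobs$status <- "PENDING"
jobs$process_id <- NA_integer_
saveRDS(jobs, qpath)

# Drain the queue the way a single worker would
n_run <- 0L
while (!is.null(claim <- claim_next_job(qpath))) {
  # a real worker would source(global_path) with claim$job's values here
  n_run <- n_run + 1L
  mark_done(qpath, claim$row)
}
```

Re-reading the file on every claim is what lets a restarted worker resume from whatever state the queue was left in.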
experimentFuture(
  df,
  global_path = "global.R",
  cores = NULL,
  n_workers = if (is.null(cores)) 4L else length(cores),
  queue_path = NULL,
  on_interrupt = c("requeue", "fail"),
  ss_id = NULL,
  forceLocalQueueToGS = FALSE,
  email = getOption("gargle_oauth_email"),
  cache_path = getOption("gargle_oauth_cache"),
  runNameLabel = quote(colnames(q)[1:2]),
  log_dir = "logs",
  activeRunningPath = getOption("spades.activeRunningPath"),
  sp_dev_path = NULL,
  local_pat_file = NULL,
  copyModules = FALSE,
  ...
)
| Argument | Description |
|---|---|
| df | data.frame of parameter combinations. Each row is one job. |
| global_path | Path to the R script each worker sources per job. Defaults to "global.R". |
| cores | |
| n_workers | Number of parallel workers. Defaults to 4L when cores is NULL, otherwise length(cores). |
| queue_path | Path to the local RDS queue file. Created automatically if it does not yet exist. Defaults to |
| on_interrupt | |
| ss_id | Google Sheets ID (or Drive folder ID) for the shared queue. When provided, workers use the GS backend instead of the local RDS file. |
| forceLocalQueueToGS | If |
| email | Gargle OAuth e-mail for Google Sheets auth. |
| cache_path | Gargle OAuth cache directory. |
| runNameLabel | Quoted expression evaluated in the job environment to derive a human-readable run name (used in log messages and queue metadata). |
| log_dir | Directory for per-worker log files. Created if needed. Defaults to "logs". |
| activeRunningPath | Directory for |
| sp_dev_path | Local path to the SpaDES.project source tree to sync to remote workers (optional; uses the installed binary if |
| local_pat_file | Path to a file containing a GitHub PAT to copy to remote workers. |
| copyModules | Logical. If |
| ... | Additional named arguments stored in |
An object of class "experimentFuture" (a list) containing:
procs: list of callr::r_bg process objects, one per local worker (or future objects for remote cluster workers).
log_files: character vector of log file paths.
log_dir: absolute path to the log directory.
queue_path: absolute path to the queue RDS file.
cores: the cores argument as supplied.
experimentTmux, awaitExperimentFuture,
tmuxRunWorkerLoop
## Not run:
## -- Minimal: build a tiny global.R, then run a 2 x 2 experiment ---------
tdir <- file.path(tempdir(), "experimentFuture-demo")
dir.create(tdir, showWarnings = FALSE, recursive = TRUE)
writeLines(
  'message("scenario=", .scenario, " rep=", .rep); Sys.sleep(2)',
  file.path(tdir, "global.R")
)
expt <- expand.grid(.scenario = c("A", "B"), .rep = 1:2,
                    stringsAsFactors = FALSE)

ef <- experimentFuture(
  df = expt,
  global_path = file.path(tdir, "global.R"),
  n_workers = 2L,
  queue_path = file.path(tdir, "future_queue.rds"),
  log_dir = file.path(tdir, "logs")
)

## -- Live inspection while workers run -----------------------------------
print(ef)                                       # alive/done per worker
experimentMonitor(ef)                           # pid + machine + runName
experimentMonitor(ef, stats = TRUE)             # adds CPU / RAM / state
queueRead(ef$queue_path)                        # full queue snapshot
experimentFutureList(ef)                        # cluster-wide pid list
cat(readLines(ef$log_files[[1L]]), sep = "\n")  # tail one log
awaitExperimentFuture(ef)                       # blocks until both workers exit

## -- Killing workers ----------------------------------------------------
# Graceful stop: workers finish their CURRENT job, then exit.
# Any remaining PENDING jobs stay in the queue and can be resumed later
# by calling experimentFuture() again with the same queue_path.
killExperimentFuture(ef)

# Immediate stop (force): workers are killed immediately.
# Jobs that were mid-execution may remain as RUNNING in the queue; reset them with:
#   tmuxRefreshQueueStatus(ef$queue_path)   # file-based backend
# The GS backend reclaims stale RUNNING entries automatically before each new claim.
killExperimentFuture(ef, force = TRUE)
tmuxRefreshQueueStatus(ef$queue_path)  # clean up stale RUNNING entries

# Cluster-wide kill (works for `cores = c(...)` clusters too):
# sends SIGTERM to every worker on every machine, waits for exit, runs
# tmuxRefreshQueueStatus(), and pushes the demotion to the Google Sheet
# if `ss_id` was used (via the <queue_path>.ss_id sidecar).
experimentFutureList(ef, kill = TRUE)

## -- Resuming after a kill ----------------------------------------------
# Jobs left as PENDING (or INTERRUPTED with on_interrupt = "requeue") are
# automatically picked up when you call experimentFuture() again with the
# same queue_path -- no need to re-specify df.
ef2 <- experimentFuture(
  df = expt,  # ignored if queue_path already exists
  global_path = file.path(tdir, "global.R"),
  n_workers = 2L,
  queue_path = file.path(tdir, "future_queue.rds"),
  log_dir = file.path(tdir, "logs")
)
awaitExperimentFuture(ef2)                      # wait for remaining jobs to finish
queueRead(ef2$queue_path)                       # full snapshot (data.table)
table(queueRead(ef2$queue_path)$status)         # all DONE
cat(readLines(ef2$log_files[[1]]), sep = "\n")  # inspect worker 1 log

## -- Remote workers (pre-setup required) -------------------------------
ef <- experimentFuture(
  df = expt,
  global_path = file.path(tdir, "global.R"),
  cores = c("node01", "node02"),
  n_workers = 2L,
  ss_id = "YOUR_GOOGLE_SHEET_ID",
  email = "[email protected]",
  cache_path = "~/.cache/gargle",
  local_pat_file = "~/.github_pat"
)
killExperimentFuture(ef)  # graceful stop on remote workers too
## End(Not run)
Cross-session worker discovery for experimentFuture. Scans
/proc for R processes whose redirected stdout points to a
worker_<NN>.log file (the convention written by
callr::r_bg(stdout = log_files[[i]]) in
experimentFuture), regardless of which R session originally
spawned them. This is the right tool when:
you re-ran the experimentFuture example in a new R session and
a fresh tail -f is silent because the previous run's workers
are still claiming queue rows;
you want to clean up orphans without remembering each
ef handle;
you want a one-glance view of which row each worker is
currently running (joined against the queue's
status == "RUNNING" process_id).
Linux-only (uses /proc/<pid>/fd/1 to find the log file each
worker is writing). For other Unixes use lsof -p <pid> or
ps -ef | grep tmuxRunWorkerLoop as a manual substitute.
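The /proc scan described above can be sketched in base R with Sys.readlink(), since /proc/<pid>/fd/1 is a symbolic link to whatever file the process's stdout is redirected to. This is a hypothetical illustration (the helper names are made up), not the package's implementation; processes owned by other users usually cannot be read and are silently skipped, and on systems without /proc it simply returns an empty result.

```r
# Hypothetical helper: does a path look like a worker log (worker_<NN>.log)?
is_worker_log <- function(path) {
  !is.na(path) & grepl("worker_[0-9]+\\.log$", path)
}

# Hypothetical sketch of /proc-based discovery (Linux only).
list_worker_pids <- function(proc_dir = "/proc") {
  empty <- data.frame(pid = integer(0), log_file = character(0),
                      stringsAsFactors = FALSE)
  if (!dir.exists(proc_dir)) return(empty)  # no /proc: nothing to scan
  pids <- list.files(proc_dir, pattern = "^[0-9]+$")
  if (length(pids) == 0L) return(empty)
  # fd/1 (stdout) is a symlink; unreadable or vanished pids give "" or NA
  targets <- suppressWarnings(Sys.readlink(file.path(proc_dir, pids, "fd", "1")))
  keep <- is_worker_log(targets)
  if (!any(keep)) return(empty)
  data.frame(pid = as.integer(pids[keep]), log_file = targets[keep],
             stringsAsFactors = FALSE)
}

workers <- list_worker_pids()
```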
experimentFutureList(ef = NULL, kill = FALSE, signal = c("TERM", "INT", "KILL"), queue_paths = NULL)
| Argument | Description |
|---|---|
| ef | Optional shorthand: an |
| kill | If |
| signal | One of "TERM", "INT", or "KILL". |
| queue_paths | Optional character vector of queue |
A data.frame (one row per live worker) with columns:
pid: worker process ID.
started_at: approximate process start time (ctime of /proc/<pid>).
log_file: path the worker is writing stdout/stderr to.
queue_path: the first *_queue.rds found in the log directory's parent (where experimentFuture puts it by default), or NA if not located.
runName: hyphen-joined data column values of the row this worker is currently running, derived from the queue's status == "RUNNING" entry whose process_id matches; NA if the worker is between jobs.
When kill = TRUE, the same data.frame is returned (invisibly), describing the workers that were signalled.
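How a runName can be derived for each live pid is sketched below in base R. It assumes the queue carries status and process_id as its only meta columns and treats every other column as a data column; derive_run_names() is a hypothetical helper written for illustration, not a package function.

```r
# Hypothetical sketch: attach a runName to each live worker pid by matching
# it against the queue's RUNNING rows. Assumes `status` and `process_id`
# are the only meta columns; all other columns are data columns.
derive_run_names <- function(workers, queue,
                             meta_cols = c("status", "process_id")) {
  running <- queue[queue$status == "RUNNING", , drop = FALSE]
  data_cols <- setdiff(names(queue), meta_cols)
  # paste the data columns row-wise, e.g. "B" and 1 -> "B-1"
  labels <- do.call(paste, c(lapply(running[data_cols], as.character),
                             sep = "-"))
  # workers between jobs have no RUNNING row, so match() gives NA
  workers$runName <- labels[match(workers$pid, running$process_id)]
  workers
}

queue <- data.frame(.scenario = c("A", "B", "A"), .rep = c(1L, 1L, 2L),
                    status = c("DONE", "RUNNING", "RUNNING"),
                    process_id = c(NA, 101L, 202L),
                    stringsAsFactors = FALSE)
workers <- data.frame(pid = c(101L, 202L, 303L))
res <- derive_run_names(workers, queue)
```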
experimentFuture, killExperimentFuture,
tmuxRefreshQueueStatus
## Not run:
# Just list everything that's running (auto-discovery via /proc only)
experimentFutureList()

# Pass the ef handle to also pick up PSOCK cluster workers and remote
# workers (anything in the queue, on any machine in `cores`).
ef <- experimentFuture(df = df, global_path = "global.R",
                       cores = c("localhost", "camas"), ...)
experimentFutureList(ef)
experimentFutureList(ef, kill = TRUE)

# Across R sessions, when ef is gone, drive discovery off the queue path:
experimentFutureList(queue_paths = "/mnt/shared_cache/.../future_queue.rds")

# Hard kill (SIGKILL, no chance to update queue meta on the worker side --
# but the post-kill tmuxRefreshQueueStatus() still demotes the rows).
experimentFutureList(ef, kill = TRUE, signal = "KILL")
## End(Not run)
Single read-only entry point for inspecting workers regardless of which runner spawned them. Discovery is driven by what you pass:
experimentMonitor(ef = NULL, queue_paths = NULL, stats = FALSE)
| Argument | Description |
|---|---|
| ef | Optional |
| queue_paths | Optional character vector of queue |
| stats | Logical. When |
Default (ef = NULL, queue_paths = NULL) – enumerates tmux
panes via tmux -S <socket> list-panes -a across every tmux server
under $TMUX_TMPDIR/tmux-<uid>/. Same behaviour the historical
tmuxListPanes() had. Per-socket failures are swallowed so one
broken socket cannot poison the rest; works outside a tmux pane
and across multiple tmux servers (e.g. sessions started under
different -L names). Cluster_Monitor panes are filtered out.
ef supplied (or queue_paths) – reads each queue file's
status == "RUNNING" rows, probes ssh <core> hostname -s once
per non-local entry in ef$cores to map OS hostnames (which is
what Sys.info()[["nodename"]] writes to the queue) back to SSH
aliases (~/.ssh/config / /etc/hosts entries), and verifies
each PID is alive (/proc/<pid> locally, batched
ssh <alias> "[ -d /proc/<pid> ]" remotely). This is the
experimentFuture() / experimentSBATCH() equivalent of the
tmux pane scan – workers there don't necessarily live in a
tmux pane, so the queue file is the authoritative record.
Either way, stats = TRUE runs the same ps -o pid=,%cpu=,rss=,state=
batch (locally and via one SSH connection per remote node) to append
CPU / RSS / state plus per-node nproc / total RAM.
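A minimal sketch of turning that ps batch into per-pid stats is shown below. It assumes ps reports RSS in KiB, as procps ps does on Linux, and converts to GB by dividing by 1024^2; parse_ps_stats() is a hypothetical helper, and its ram_gb column only stands in for the RAM (GB) column described under the return value.

```r
# Hypothetical parser for `ps -o pid=,%cpu=,rss=,state=` output.
# Assumes RSS is reported in KiB (as procps `ps` does on Linux).
parse_ps_stats <- function(lines) {
  lines <- trimws(lines[nzchar(trimws(lines))])
  empty <- data.frame(pid = integer(0), cpuAvg = numeric(0),
                      ram_gb = numeric(0), state = character(0),
                      stringsAsFactors = FALSE)
  if (length(lines) == 0L) return(empty)
  fields <- do.call(rbind, strsplit(lines, "[[:space:]]+"))
  data.frame(pid = as.integer(fields[, 1]),
             cpuAvg = as.numeric(fields[, 2]),
             ram_gb = as.numeric(fields[, 3]) / 1024^2,  # KiB -> GB
             state = substr(fields[, 4], 1L, 1L),         # single state letter
             stringsAsFactors = FALSE)
}

out <- c("  1234 98.5 2097152 R",
         "  5678  0.0  524288 S")
stats <- parse_ps_stats(out)
```

On a Linux node the input lines could come from system2("ps", c("-o", "pid=,%cpu=,rss=,state=", "-p", paste(pids, collapse = ",")), stdout = TRUE); the trailing = in each field name suppresses the header line.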
Data.frame whose columns depend on the discovery mode:
tmux mode – session, window, pane, pane_id,
pane_ref (the "session:window.pane" string), title,
node (first dash-separated token in title that matches a
cluster alias from /etc/hosts; falls back to
localHostLabel() when the title contains only the raw local
hostname; NA if no match).
queue mode – pid, machine, started_at, log_file
(NA when the worker isn't a callr::r_bg writer), queue_path,
runName.
With stats = TRUE, five additional columns appear in either
mode: state, cpuAvg, RAM (GB), availableCores,
total RAM (GB). Returns an empty data.frame (0 rows, same
columns) if no workers are found.
The state column is the best single signal for hang-detection because
it is a snapshot (no time window needed). Values:
| State | Meaning |
|---|---|
| R | running on CPU right now |
| S | sleeping (waiting on I/O, timer, or lock) |
| D | uninterruptible sleep (usually disk I/O; a persistent D can indicate a hang) |
| T | stopped (SIGSTOP or similar) |
| Z | zombie (dead but not yet reaped) |
| Closed | worker process has exited; the PID no longer exists |
| NA | could not determine (machine unreachable, or no parseable <node>-<pid> in the title) |
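Because a single D reading can just be a slow disk write, one way to use the snapshot is to compare several of them taken some time apart. The sketch below flags pids that are in D in every snapshot; it assumes each snapshot is a data.frame with pid and state columns (as stats = TRUE would provide), and persistent_d() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: pids whose state is "D" in every snapshot.
# Each snapshot is a data.frame with `pid` and `state` columns.
persistent_d <- function(snapshots) {
  if (length(snapshots) == 0L) return(integer(0))
  d_sets <- lapply(snapshots, function(s) {
    s$pid[!is.na(s$state) & s$state == "D"]
  })
  sort(Reduce(intersect, d_sets))
}

snap1 <- data.frame(pid = c(11L, 22L, 33L), state = c("D", "D", "R"))
snap2 <- data.frame(pid = c(11L, 22L, 33L), state = c("D", "S", "D"))
suspects <- persistent_d(list(snap1, snap2))  # only pid 11 stays in D
```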
experimentFutureList() for the same queue-mode discovery
plus cluster-wide kill / queue refresh / GS demotion.
tmuxListPanes() is preserved as a thin alias that calls this
function with no ef.
A Slurm-native sibling of experimentTmux and
experimentFuture. Submits n_workers long-lived SBATCH
jobs that each call tmuxRunWorkerLoop against the shared
queue, claiming and running rows until the queue is empty (or the
worker's stop file appears). Same queue / global.R / runNameLabel
/ statusCalculate semantics as the other two runners.
Returns a non-blocking handle of class "experimentSBATCH" carrying
the Slurm job IDs. Use awaitExperimentSBATCH to poll
squeue until all jobs leave the queue, or
killExperimentSBATCH to stop them (gracefully via stop files,
or immediately via scancel).
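The graceful path relies on each worker checking for its stop file between jobs. The base-R sketch below shows that control flow with an in-memory queue; worker_loop() and the claim/run stand-ins are hypothetical, and the check-before-claim ordering is an assumption consistent with "workers exit between jobs".

```r
# Hypothetical sketch of a worker loop that honours a stop file.
# `claim` returns the next job (or NULL when the queue is empty) and
# `run` executes it; both stand in for the real queue/global.R machinery.
worker_loop <- function(claim, run, stop_file) {
  n <- 0L
  repeat {
    if (file.exists(stop_file)) break  # graceful stop, checked between jobs
    job <- claim()
    if (is.null(job)) break            # queue exhausted
    run(job)
    n <- n + 1L
  }
  n
}

# In-memory stand-in for the shared queue
queue_env <- new.env()
queue_env$pending <- c("A-1", "B-1", "A-2", "B-2")
queue_env$done <- character(0)
claim <- function() {
  if (length(queue_env$pending) == 0L) return(NULL)
  job <- queue_env$pending[1L]
  queue_env$pending <- queue_env$pending[-1L]
  job
}
stop_file <- tempfile("stop_")
run_job <- function(job) {
  queue_env$done <- c(queue_env$done, job)
  # simulate a graceful kill arriving while the second job runs
  if (length(queue_env$done) == 2L) file.create(stop_file)
}
n_done <- worker_loop(claim, run_job, stop_file)
unlink(stop_file)
```

The job that was running when the stop file appeared still finishes; only unclaimed rows are left behind for a later resume.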
experimentSBATCH(
  df,
  global_path = "global.R",
  n_workers = 4L,
  queue_path = NULL,
  on_interrupt = c("requeue", "fail"),
  ss_id = NULL,
  forceLocalQueueToGS = FALSE,
  email = getOption("gargle_oauth_email"),
  cache_path = getOption("gargle_oauth_cache"),
  runNameLabel = quote(colnames(q)[1:2]),
  log_dir = "logs",
  activeRunningPath = getOption("spades.activeRunningPath"),
  sbatch_opts = list(),
  sbatch_cmd = "sbatch",
  r_cmd = file.path(R.home("bin"), "Rscript"),
  r_libs = .libPaths(),
  dry_run = FALSE,
  ...
)
| Argument | Description |
|---|---|
| df | data.frame of parameter combinations. Each row is one job. Ignored if |
| global_path | Path to the R script each worker sources per job. Defaults to "global.R". |
| n_workers | Number of SBATCH jobs to submit. Defaults to 4L. |
| queue_path | Path to the local RDS queue file. Created automatically if it does not yet exist. Defaults to |
| on_interrupt | |
| ss_id | Google Sheets ID (or Drive folder ID) for the shared queue. When provided, workers use the GS backend in addition to the local RDS file (mirroring |
| forceLocalQueueToGS | If |
| email | Gargle OAuth e-mail for Google Sheets auth (only used when |
| cache_path | Gargle OAuth cache directory. |
| runNameLabel | Quoted expression evaluated in the job environment to derive a human-readable run name (used in log messages and queue metadata). Defaults to quote(colnames(q)[1:2]). |
| log_dir | Directory for per-worker log files, generated job scripts, and stop files. Created if needed. Defaults to "logs". |
| activeRunningPath | Directory for |
| sbatch_opts | Named list of SBATCH directives. Each |
| sbatch_cmd | Path to the |
| r_cmd | Path to the R interpreter to invoke on compute nodes. Defaults to file.path(R.home("bin"), "Rscript"). |
| r_libs | Character vector of library paths to set via |
| dry_run | If |
| ... | Additional named arguments stored in |
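For orientation, the sketch below shows one way a named list like sbatch_opts could be rendered into the #SBATCH header of a job script. It assumes underscores in names map to hyphens (cpus_per_task to --cpus-per-task, matching sbatch's long option names), which is a guess based on the cpus_per_task entry in the examples; sbatch_header() is hypothetical and not the function the package uses to write its job scripts.

```r
# Hypothetical sketch: render a named list like `sbatch_opts` as #SBATCH lines.
# Assumes underscores map to hyphens (cpus_per_task -> --cpus-per-task).
sbatch_header <- function(opts) {
  flags <- gsub("_", "-", names(opts), fixed = TRUE)
  values <- vapply(opts, as.character, character(1), USE.NAMES = FALSE)
  c("#!/bin/bash", sprintf("#SBATCH --%s=%s", flags, values))
}

hdr <- sbatch_header(list(partition = "compute", time = "00:30:00",
                          mem = "1G", cpus_per_task = 4))
writeLines(hdr)
```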
An object of class "experimentSBATCH" (a list) containing:
job_ids: integer vector of Slurm job IDs (or NA under dry_run = TRUE).
job_scripts: character vector of generated SBATCH script paths.
log_files: character vector of log file paths.
stop_files: character vector of stop-file paths.
log_dir: absolute path to the log directory.
queue_path: absolute path to the queue RDS file.
experimentTmux, experimentFuture,
awaitExperimentSBATCH, killExperimentSBATCH
## Not run:
## -- Minimal: build a tiny global.R, then run a 2 x 2 experiment ---------
# Use a directory on your shared HPC filesystem (NFS / Lustre / BeeGFS);
# replace tempdir() below, which is usually node-local, with such a path.
tdir <- file.path(tempdir(), "experimentSBATCH-demo")
dir.create(tdir, showWarnings = FALSE, recursive = TRUE)
writeLines(
  'message("scenario=", .scenario, " rep=", .rep); Sys.sleep(2)',
  file.path(tdir, "global.R")
)
expt <- expand.grid(.scenario = c("A", "B"), .rep = 1:2,
                    stringsAsFactors = FALSE)

es <- experimentSBATCH(
  df = expt,
  global_path = file.path(tdir, "global.R"),
  n_workers = 2L,
  queue_path = file.path(tdir, "sbatch_queue.rds"),
  log_dir = file.path(tdir, "logs"),
  sbatch_opts = list(partition = "compute", time = "00:30:00", mem = "1G")
)

## -- Live inspection while jobs run --------------------------------------
print(es)                                                     # job IDs + squeue status
queueRead(es$queue_path)                                      # full queue snapshot
experimentMonitor(queue_paths = es$queue_path)                # cluster-wide pid + machine
experimentMonitor(queue_paths = es$queue_path, stats = TRUE)  # + CPU/RAM
awaitExperimentSBATCH(es)                                     # block until squeue empty

## -- Larger experiment with full sbatch_opts ------------------------------
es <- experimentSBATCH(
  df = expt,
  global_path = file.path(tdir, "global.R"),
  n_workers = 4L,
  queue_path = file.path(tdir, "sbatch_queue.rds"),
  log_dir = file.path(tdir, "logs"),
  sbatch_opts = list(
    partition = "compute", time = "24:00:00", mem = "16G",
    cpus_per_task = 4, account = "my_alloc"
  )
)
print(es)                  # job IDs + squeue status per worker
awaitExperimentSBATCH(es)  # blocks until every job ID leaves squeue

# Graceful stop (workers exit between jobs, queue rows stay PENDING):
killExperimentSBATCH(es)

# Immediate stop (scancel; stale RUNNING entries can be cleaned up via
# tmuxRefreshQueueStatus(es$queue_path)):
killExperimentSBATCH(es, force = TRUE)
tmuxRefreshQueueStatus(es$queue_path)

## -- Resume after stop ---------------------------------------------------
# Same `queue_path` -> DONE rows are skipped, demoted PENDING rows are
# re-claimed by the new sbatch jobs.
es2 <- experimentSBATCH(
  df = expt,  # ignored if queue exists
  global_path = file.path(tdir, "global.R"),
  n_workers = 2L,
  queue_path = file.path(tdir, "sbatch_queue.rds"),
  log_dir = file.path(tdir, "logs"),
  sbatch_opts = list(partition = "compute", time = "00:30:00", mem = "1G")
)
awaitExperimentSBATCH(es2)
table(queueRead(es2$queue_path)$status)  # all DONE
## End(Not run)
Creates n_workers tmux panes in the current window, tiles them, and starts
a worker loop in each one that claims and runs jobs from a file-backed queue
(queue_path). Control returns immediately to the master pane; all work
happens asynchronously inside the worker panes.
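The pane-creation step can be pictured as a list of tmux argument vectors for system2("tmux", ...). The sketch assumes one split per worker and re-tiles after every split so later splits still have room; split-window -d (create the pane without switching focus, so the master pane stays active) and select-layout tiled are standard tmux commands, but tmux_layout_cmds() is hypothetical and omits sending the worker command to each pane.

```r
# Hypothetical sketch: tmux argument vectors that add `n_workers` panes to the
# current window and re-tile after each split. Sending the worker command
# into each pane (which the real function does) is not shown.
tmux_layout_cmds <- function(n_workers) {
  one_worker <- list(c("split-window", "-d"),    # new pane, focus stays put
                     c("select-layout", "tiled"))
  rep(one_worker, n_workers)
}

cmds <- tmux_layout_cmds(3)
```

Inside a tmux session these could be run with for (a in cmds) system2("tmux", a).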
pane_mode options:
"killAndNewPane" (default): each worker runs one job per R session, then exits. A fresh R session starts automatically for the next job, freeing all memory between runs.
localhost panes: After each job, tmuxRunWorkerLoop() calls
tmux respawn-pane -k, which replaces the current pane's process
in-place with a new Rscript invocation. No retiling needed.
Remote panes (cores = "hostname"): The local pane runs a bash
while-loop that repeatedly calls
ssh -t host bash -c 'exec env R_PROFILE_USER=<script> R --interactive'.
ssh -t allocates a PTY so R runs interactively (readline, OSC 2 title
updates, Ctrl+C propagation). A startup script injected via
R_PROFILE_USER runs one job then exits; q(status = 1L) (job done or
queue empty) lets the while-loop start a fresh R session, while q() (status 0)
stops the loop. R_PROFILE_USER is unset inside R immediately
after startup so workers spawned by makeClusterPSOCK() do not inherit
it and inadvertently re-run the startup script.
"reuse": each worker loops inside a single R session (repeat { tmuxRunNextWorker() }).
Memory accumulates across jobs, so this mode suits lightweight simulations.
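The exit-status protocol of the remote while-loop (non-zero status: start a fresh session; zero: stop) can be mimicked in R to make the control flow explicit. supervise() and the fake sessions are hypothetical stand-ins; the real loop is a shell loop around ssh, not R code.

```r
# Hypothetical R analogue of the remote while-loop: keep starting fresh
# "sessions" while the previous one exited with a non-zero status.
supervise <- function(run_session, max_sessions = Inf) {
  n <- 0L
  repeat {
    status <- run_session()
    n <- n + 1L
    if (status == 0L || n >= max_sessions) break  # q() (status 0) stops the loop
  }
  n
}

# Fake sessions: two end with q(status = 1L), the third with q()
make_fake_session <- function(statuses) {
  i <- 0L
  function() {
    i <<- i + 1L
    statuses[[i]]
  }
}
n_sessions <- supervise(make_fake_session(c(1L, 1L, 0L)))
```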
Remote machines (cores): supplying a hostname in cores triggers .setup_remote_machine() once per unique host before any workers start. The steps run in this order:
1. Guard BASH_ENV – wraps the remote $BASH_ENV file's existing content in a subshell (( ... ) 2>/dev/null || true) so that any exit or failing command inside it cannot abort the non-interactive SSH shell that carries the setup commands.
2. Create remote directory; copy files – mkdir -p the remote working directory (same relative path from ~ as on localhost), then scp global_path, queue_path, and dots_path (if supplied) into it.
3. Rsync project R/ folder – syncs the R/ subdirectory next to global_path to the remote with rsync --delete so user-defined helper functions sourced by global.R are up to date.
4. Write ~/.Rprofile on remote – injects three lines (replacing any previous versions): .libPaths(c(local_lib, ...)) so the project library takes precedence over system libraries; options(repos = ...) including the PredictiveEcology r-universe; and an SSL block that sets CURL_CA_BUNDLE/SSL_CERT_FILE so HTTPS downloads work in non-login SSH sessions where /etc/profile.d/ is not sourced.
5. Verify/install Require – compares the remote Require version and git commit SHA to the local installation. If they differ, rsyncs the installed directory (GitHub source) or runs install.packages("Require") (CRAN source).
6. Install usethis on the remote via Require::Install().
7. Propagate GitHub credentials – reads the local token via gitcreds::gitcreds_get() and pipes it into git credential approve on the remote so private GitHub packages can be installed without interactive setup. If no local token is available, checks whether the remote already has credentials, and errors if neither source provides them.
8. Install system libraries via sudo -n apt-get install -y --no-install-recommends (non-interactive; fails gracefully if passwordless sudo is not configured). Libraries installed: spatial (libgdal-dev, libgeos-dev, libproj-dev, libsqlite3-dev, libudunits2-dev), HTTP/TLS (libssl-dev, libcurl4-openssl-dev), XML (libxml2-dev), archive (libarchive-dev), git (libgit2-dev), fonts/graphics (libfontconfig1-dev, libharfbuzz-dev, libfribidi-dev, libpng-dev, libjpeg-dev, libtiff-dev, libfreetype6-dev), protobuf (libabsl-dev), and R compilation headers (r-base-dev).
9. Ensure remote lib path exists – mkdir -p the project library path on the remote (must match localhost exactly so installed file paths are identical).
10. Rsync SpaDES.project – copies the locally installed SpaDES.project directory to the same path on the remote. Both machines must share the same platform and R version so compiled lazy-load databases are compatible.
11. Install SpaDES.project dependencies via Require::Install(). Spatial packages (terra, sf, rgdal, rgeos, lwgeom) are compiled from source so they link against the remote's actual GDAL/GEOS/PROJ versions. All other hard dependencies (Imports/Depends/LinkingTo) plus any Suggests packages installed locally are installed as binaries via Require::setLinuxBinaryRepo(). Common packages with strict version requirements (purrr >= 1.2.1, rlang >= 1.1.7, cli >= 3.6.0, vctrs >= 0.6.0) are pre-installed to the project library to avoid stale system-library versions being picked up during compilation.
12. Rsync the Require package cache (Require::cachePkgDir()) to the remote to accelerate future package installations.
13. Rsync the gargle OAuth cache (cache_path or getOption("gargle_oauth_cache")) to the remote so the worker can authenticate with Google APIs (Sheets, Drive) without a browser prompt.
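Step 4 "replaces any previous versions" of the injected lines, which makes the setup safe to re-run. A base-R sketch of that idempotent injection follows. It assumes previously injected lines can be recognised by a trailing marker comment; the marker, the inject_lines() helper, and the library path and repository values are all illustrative assumptions, and the real function may identify its lines differently.

```r
# Hypothetical sketch of idempotent line injection into an .Rprofile.
# Previously injected lines are recognised by a trailing marker comment
# (an assumption; the real function may identify them differently).
inject_lines <- function(existing, new_lines, marker = "# <added by setup>") {
  keep <- !endsWith(existing, marker)      # drop earlier injected versions
  c(existing[keep], paste(new_lines, marker))
}

rprofile <- c("options(warn = 1)")         # a user's own line, left untouched
new <- c('.libPaths(c("~/R/projlib", .libPaths()))',               # illustrative
         'options(repos = c(CRAN = "https://cloud.r-project.org"))')  # illustrative
once    <- inject_lines(rprofile, new)
twice   <- inject_lines(once, new)         # re-running changes nothing
updated <- inject_lines(twice, new[1])     # a new version replaces the old one
```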
Pane 1 starts immediately. Pane i > 1 waits
delay_before_source + (i - 2) * stagger_by seconds inside R before
claiming its first job, avoiding simultaneous queue contention at startup.
For remote workers in killAndNewPane mode the stagger only applies to the
first R session; subsequent while-loop iterations start immediately.
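The stagger rule above is simple arithmetic. A minimal sketch that computes each pane's startup delay from n_workers, delay_before_source and stagger_by; pane_start_delay() is an illustrative helper, not a package function.

```r
# Startup delay (seconds) before pane i claims its first job:
# pane 1 -> 0; pane i > 1 -> delay_before_source + (i - 2) * stagger_by
pane_start_delay <- function(n_workers, delay_before_source = 60,
                             stagger_by = delay_before_source) {
  i <- seq_len(n_workers)
  ifelse(i == 1L, 0, delay_before_source + (i - 2L) * stagger_by)
}

pane_start_delay(4)                                            # 0 60 120 180
pane_start_delay(4, delay_before_source = 30, stagger_by = 10) # 0 30 40 50
```

With the defaults (stagger_by = delay_before_source), the panes are spaced evenly one delay_before_source apart after pane 1.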
If a worker pane is manually interrupted (e.g. Ctrl+C) and drops to a shell
prompt, restart it by pressing (up-arrow) (up-arrow) in that pane and hitting Enter.
The full command is always in the pane's bash history:
localhost: Rscript -e "..." (re-enters tmuxRunWorkerLoop; in
killAndNewPane mode respawn-pane takes over from the first job onward).
remote: if setup && scp; then first_run; _st=$?; while [ $_st -ne 0 ]; do sleep 2; loop_run; _st=$?; done; fi (restarts the sh loop from scratch; plain POSIX – works in bash, dash, and sh).
At startup, experimentTmux sets the tmux session option
default-terminal = "tmux-256color". This ensures that all subsequently
created panes advertise a full-colour ANSI terminal, which is required for
R packages such as cli and crayon to render coloured/dynamic output
correctly. Without this, connections that arrive via Windows PowerShell
-> SSH -> tmux often inherit TERM=screen or no TERM at all, causing
R to fall back to plain-text output. The setting is applied globally to the
session (-g) and persists for the session's lifetime; it does not modify
~/.tmux.conf.
experimentTmux( df, global_path = "global.R", cores = NULL, n_workers = if (is.null(cores)) 4L else length(cores), delay_after_split = 0.4, delay_after_layout = 0.4, delay_between_R_start = 0, delay_before_source = 60, stagger_by = delay_before_source, set_mouse = TRUE, statusCalculate = getOption("spades.statusCalculate"), folderWithIterInFilename = getOption("spades.folderWithIterInFilename"), activeRunningPath = getOption("spades.activeRunningPath"), continue = TRUE, queue_path = NULL, on_interrupt = c("requeue", "fail"), pane_mode = c("killAndNewPane", "reuse"), ss_id = NULL, forceLocalQueueToGS = FALSE, enableGSSync = FALSE, email = getOption("gargle_oauth_email"), cache_path = getOption("gargle_oauth_cache"), workersToMonitor = unique(if (is.null(cores)) "localhost" else cores), runNameLabel = quote(colnames(q)[1:2]), copyModules = FALSE, ... )
df |
A |
global_path |
Character scalar. Absolute path to the script sourced for each job. |
cores |
Character vector of machine hostnames, recycled to |
n_workers |
Integer. Number of worker panes to spawn. Defaults to |
delay_after_split |
Numeric. Seconds to wait after each |
delay_after_layout |
Numeric. Seconds to wait after |
delay_between_R_start |
Numeric. Seconds to wait after starting R in each pane.
Default |
delay_before_source |
Numeric. Seconds panes 2..n wait before claiming their first
job. Default |
stagger_by |
Numeric. Additional seconds per pane beyond pane 2:
pane |
set_mouse |
Logical. Enable tmux mouse support (pane selection, scroll). Default |
statusCalculate |
A quoted expression (optionally using |
folderWithIterInFilename |
A quoted expression (optionally using |
activeRunningPath |
Directory for "running" flag files written while a job is
active. Must be cleaned up manually if a job crashes without removing its flag.
Default: |
continue |
Logical. Reserved for future single-shot mode; currently ignored. |
queue_path |
Character. Path to the |
on_interrupt |
|
pane_mode |
|
ss_id |
Optional Google Drive spreadsheet/folder ID for live status syncing via
|
forceLocalQueueToGS |
Logical. If |
enableGSSync |
Logical. If |
email |
Optional email address for gargle/Google OAuth authentication. |
cache_path |
Optional path to the gargle OAuth token cache directory. |
workersToMonitor |
Character vector of pane titles to monitor (currently unused). |
runNameLabel |
A quoted expression evaluated against the queue |
copyModules |
Logical. If |
... |
Additional arguments passed to |
Invisibly returns a character vector of tmux pane IDs for the spawned workers.
Pass these to tmuxKillPanes() to tear down all workers at once.
| Function | Purpose |
tmuxPrepareQueueFromDF() |
Build a file-backed queue RDS from a data.frame of runs |
tmuxRunNextWorker() |
Claim and run one queued job in the current R session |
tmuxRunWorkerLoop() |
Loop of tmuxRunNextWorker() inside a worker pane |
tmuxRefreshQueueStatus() |
Re-evaluate job status from output files and heartbeats |
tmuxMirrorQueueToSheets() |
Mirror a local queue RDS to a Google Sheet |
tmuxListPanes() |
List every pane across every tmux server on this machine |
tmuxFindDuplicates() |
Surface panes running the same job (duplicate claims) |
tmuxSetPaneTitle() |
Rewrite a pane's title by matching its current title |
tmuxKillPanes() |
Kill a set of panes by ID (tear-down) |
tmuxSetMouse() |
Enable or disable tmux mouse mode |
tmuxActiveRunningPath() |
Default path for per-run "active" flag files |
localHostLabel() |
Short cluster alias for this machine (/etc/hosts lookup)
|
## Not run:
# --- Minimal: build a tiny global.R, then run a 2 x 2 experiment ---
tdir <- file.path(tempdir(), "experimentTmux-demo")
dir.create(tdir, showWarnings = FALSE, recursive = TRUE)
writeLines(
  'message("scenario=", .scenario, " rep=", .rep); Sys.sleep(2)',
  file.path(tdir, "global.R")
)
expt <- expand.grid(.scenario = c("A", "B"), .rep = 1:2,
                    stringsAsFactors = FALSE)
workers <- experimentTmux(
  df = expt,
  global_path = file.path(tdir, "global.R"),
  cores = rep("localhost", 2L),
  queue_path = file.path(tdir, "queue.rds")
)

# --- Live inspection while panes run ---
experimentMonitor()                                   # tmux pane scan (no args)
experimentMonitor(stats = TRUE)                       # adds CPU / RAM / state per pane
tmuxListPanes()                                       # alias of experimentMonitor()
queueRead(file.path(tdir, "queue.rds"))               # full queue snapshot
tmuxFindDuplicates(workers)                           # any double-claimed jobs?
tmuxRefreshQueueStatus(file.path(tdir, "queue.rds"))  # reset stuck rows

# --- Basic local usage with explicit pane sizing ---
workers <- experimentTmux(
  global_path = "/abs/path/to/global.R",
  queue_path = "/abs/path/to/queue.rds",
  n_workers = 4,
  pane_mode = "killAndNewPane",
  delay_before_source = 60,
  stagger_by = 60,
  set_mouse = TRUE
)

# --- Mixed local + remote ---
# Runs 2 workers on localhost and 2 on remote host "sbw".
# .setup_remote_machine("sbw", ...) is called automatically before workers start.
workers <- experimentTmux(
  global_path = "/abs/path/to/global.R",
  queue_path = "/abs/path/to/queue.rds",
  cores = c("localhost", "localhost", "sbw", "sbw"),
  pane_mode = "killAndNewPane",
  email = "[email protected]",
  cache_path = "/abs/path/to/.secret",
  ss_id = "your-google-sheet-id"
)

# --- Tear down all workers ---
tmuxKillPanes(workers)

# --- Restart a single broken pane ---
# In the broken pane, press Up then Enter to re-run the last command.
## End(Not run)
Extracts the "all meaningful combinations" factorial-design logic that used to
live inside experiment(). Given a simList plus lists of alternative
params / modules / inputs / objects, it returns one row per run.
Values are stored as indices into the supplied alternatives (because an
alternative may itself be a vector and so cannot live in a single data.frame
cell); column names are module.parameter, plus a modules index, an
expLevel, and (when relevant) input, object and replicate columns.
factorialDesign(sim, params, modules, objects = list(), inputs, replicates = 1)
sim |
A |
params |
Like for |
modules |
Like for |
objects |
Like for |
inputs |
Like for |
replicates |
The number of replicates to run of the same |
This is the engine behind experiment(). It is exported so the same design
can also seed the file-queue experiment_family (experimentFuture() etc.):
map each row's indices back to their values to build the df those functions take.
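Why indices rather than values can be seen with a toy design in base R: enumerate alternatives by index with expand.grid(), then look each row's indices back up. This is a simplified sketch with hypothetical alternatives (one module.parameter column plus a modules column); it is not factorialDesign()'s algorithm, which also handles inputs, objects, replicates and expLevel.

```r
# Hypothetical alternatives for one module.parameter; an alternative may be a
# vector, so the design stores indices instead of values.
spreadProbAlts <- list(0.2, 0.25, c(0.2, 0.3))
moduleAlts <- list("fireSpread", c("fireSpread", "caribou"))

# One row per combination of alternatives (first column varies fastest)
design <- expand.grid(
  fireSpread.spreadProb = seq_along(spreadProbAlts),
  modules = seq_along(moduleAlts),
  KEEP.OUT.ATTRS = FALSE
)

# Map one row's indices back to the values a run would use
rowValues <- function(i) {
  list(
    spreadProb = spreadProbAlts[[design$fireSpread.spreadProb[i]]],
    modules    = moduleAlts[[design$modules[i]]]
  )
}
rowValues(6)  # spreadProb = c(0.2, 0.3); modules = c("fireSpread", "caribou")
```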
A data.frame, one row per run.
experiment(), experiment_family
select() ceiling).
Hard-coded to 1024, the FD_SETSIZE value compiled into glibc (and the default on macOS and most BSDs) that bounds R's select()-based socket layer. Not user-configurable without rebuilding R.
fdSelectLimit()
integer scalar.
Searches from the current working directory for an RStudio project file or a git repository, falling back on the current working directory if neither is found.
findProjectPath() findProjectName()
findProjectPath returns an absolute path;
findProjectName returns the basename of the path.
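A minimal sketch of such a search, assuming it walks upward through parent directories and stops at the first one containing an *.Rproj file or a .git entry. findProjectPathSketch() is illustrative only; the package's exact search order and markers may differ.

```r
# Walk up from `start`; the first directory holding *.Rproj or .git wins.
findProjectPathSketch <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    hasRproj <- length(list.files(dir, pattern = "\\.Rproj$")) > 0L
    hasGit <- file.exists(file.path(dir, ".git"))
    if (hasRproj || hasGit) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) break   # reached the filesystem root
    dir <- parent
  }
  normalizePath(start, winslash = "/")  # fallback: the starting directory
}

basename(findProjectPathSketch())       # analogue of findProjectName()
```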
Scans an output directory for files matching the pattern
<file_prefix>_year<XXXX>.<ext> (e.g. cohortData_year2920.rds) and
returns the furthest simulation year reached, the wall-clock elapsed time
since the first checkpoint, and a percentage-complete estimate.
get_sim_year_heartbeat( output_path, start_year = NULL, end_year = NULL, file_prefix = "cohortData" )
output_path |
Character. Directory to scan for checkpoint files. |
start_year |
Integer or |
end_year |
Integer or |
file_prefix |
Character. Only files whose basename begins with this
prefix are used as checkpoint indicators. Defaults to |
A named list with elements:
ts: Character. Modification timestamp of the latest checkpoint file.
iter: Integer. Simulation year of the latest checkpoint.
started: Character. Modification timestamp of the first checkpoint file.
elapsed: difftime. Wall-clock time between first and latest checkpoint.
pct_complete: Numeric 0-100. Percentage of the simulation completed,
or NA if start_year == end_year.
All elements are NA / NA_character_ when no matching files are found.
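The filename convention lends itself to a small regex-based scan. A simplified sketch of the same idea, assuming start_year and end_year are supplied explicitly (the real function accepts NULL for both and handles that case itself); simYearHeartbeatSketch() is hypothetical and not the package's implementation.

```r
# Find <file_prefix>_year<XXXX>.<ext>, take the largest year, compute progress.
simYearHeartbeatSketch <- function(output_path, start_year, end_year,
                                   file_prefix = "cohortData") {
  pat <- paste0("^", file_prefix, "_year([0-9]+)\\.[^.]+$")
  f <- list.files(output_path, pattern = pat, full.names = TRUE)
  if (length(f) == 0L) {
    return(list(ts = NA_character_, iter = NA_integer_,
                started = NA_character_, elapsed = NA, pct_complete = NA_real_))
  }
  years <- as.integer(sub(pat, "\\1", basename(f)))
  mt <- file.mtime(f)
  last <- mt[which.max(years)]    # mtime of the latest-year checkpoint
  first <- mt[which.min(years)]   # mtime of the earliest-year checkpoint
  iter <- max(years)
  pct <- if (start_year == end_year) NA_real_ else
    100 * (iter - start_year) / (end_year - start_year)
  list(ts = format(last), iter = iter, started = format(first),
       elapsed = difftime(last, first), pct_complete = pct)
}
```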
## Not run:
hb <- get_sim_year_heartbeat(
  output_path = "outputs/6.5/1991-2020/NRV_ssp370/rep1",
  end_year = 3020L
)
message("Year: ", hb$iter, " (", hb$pct_complete, "%) -- last checkpoint: ", hb$ts)
## End(Not run)
This can be used within, e.g., the options or params arguments of
setupProject to get a ready-made file for a project.
getGithubFile( gitRepoFile, overwrite = FALSE, destDir = ".", verbose = getOption("Require.verbose") )
gitRepoFile |
Character string that follows the convention
GitAccount/GitRepo@Branch/File. If @Branch is omitted, then it will be
assumed to be |
overwrite |
A logical vector of same length (or length 1) |
destDir |
A directory to put the file that is to be downloaded. |
verbose |
Numeric or logical indicating how verbose should the function
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
filename <- getGithubFile("PredictiveEcology/LandWeb@development/01b-options.R", destDir = Require::tempdir2())
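To make the GitAccount/GitRepo@Branch/File convention concrete, here is a hypothetical parser that splits the string and builds a raw.githubusercontent.com URL. The fallback branch is passed in explicitly because the documented default is not reproduced here, and whether getGithubFile() downloads via this URL layout is an assumption. Branch names containing "/" are not handled.

```r
# Hypothetical parser (not getGithubFile() itself).
parseGitRepoFile <- function(gitRepoFile, defaultBranch) {
  parts <- strsplit(gitRepoFile, "/", fixed = TRUE)[[1]]
  account <- parts[1]
  repoBranch <- strsplit(parts[2], "@", fixed = TRUE)[[1]]
  repo <- repoBranch[1]
  branch <- if (length(repoBranch) > 1L) repoBranch[2] else defaultBranch
  file <- paste(parts[-(1:2)], collapse = "/")   # file may sit in subfolders
  list(account = account, repo = repo, branch = branch, file = file,
       url = paste("https://raw.githubusercontent.com", account, repo,
                   branch, file, sep = "/"))
}

parseGitRepoFile("PredictiveEcology/LandWeb@development/01b-options.R",
                 defaultBranch = "main")$url
```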
Simple function to download a SpaDES module that is stored as a GitHub repository
getModule( modules, modulePath, overwrite = FALSE, verbose = getOption("Require.verbose", 1L) )
modules |
Character vector of one or more GitHub repositories as character strings that contain
SpaDES modules. These should be presented in the standard R way, with
|
modulePath |
A local path in which to place the full module, within
a subfolder ... i.e., the source code will be downloaded to here:
|
overwrite |
A logical vector of same length (or length 1) |
verbose |
Numeric or logical indicating how verbose should the function
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
simLists object
Given the name or the definition of a class, plus optionally data to be
included in the object, new returns an object from that class.
## S4 method for signature 'simLists' initialize(.Object, ...)
.Object |
A |
... |
Optional values passed to any or all slot |
Two modes are available:
Graceful (force = FALSE, default): creates a per-worker
sentinel file. Each worker checks for this file between jobs and exits
cleanly once its current job finishes. Remaining PENDING jobs stay
in the queue and are picked up automatically when
experimentFuture is called again with the same
queue_path.
Immediate (force = TRUE): sends SIGTERM to each live
worker, causing the process to exit as soon as possible. Because callr
workers run non-interactively, the process typically exits before R's
interrupt handler has a chance to update the queue. Any jobs that were
RUNNING at the time of the kill will remain as RUNNING in the
queue until the next reclaim pass. Call
tmuxRefreshQueueStatus(ef$queue_path) afterwards to reset stale
RUNNING entries to INTERRUPTED, or use the GS backend which
reclaims dead workers automatically before each new claim.
killExperimentFuture(ef, force = FALSE)
ef |
An |
force |
If |
ef, invisibly.
experimentFuture, awaitExperimentFuture
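The graceful mode relies on a common sentinel-file pattern: the worker checks for its stop file only between jobs, so a running job is never cut short. An illustrative loop in base R; runWorkerSketch() and its arguments are hypothetical, not experimentFuture's worker code.

```r
# Check the stop file before each claim; finish the current job first.
runWorkerSketch <- function(jobs, stop_file, run_job) {
  done <- character(0)
  for (job in jobs) {
    if (file.exists(stop_file)) break   # graceful: exit before the next claim
    run_job(job)
    done <- c(done, job)
  }
  done
}

stop_file <- tempfile("worker1_stop_")
res <- runWorkerSketch(c("job1", "job2", "job3"), stop_file,
                       run_job = function(job) {
                         # simulate a graceful stop request arriving
                         # while job2 is running
                         if (job == "job2") file.create(stop_file)
                       })
res  # "job1" "job2": job2 finishes, job3 stays pending
```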
Graceful (force = FALSE, default): creates per-worker stop
files; each worker exits cleanly between jobs once it observes its file.
Slurm jobs end normally; remaining PENDING rows stay in the queue
and can be resumed with another experimentSBATCH call against
the same queue_path.
killExperimentSBATCH(es, force = FALSE, scancel_cmd = "scancel")
es |
An |
force |
|
scancel_cmd |
Path to |
Immediate (force = TRUE): runs scancel <ids> to kill
the Slurm jobs straight away. Any rows that were RUNNING at the
time of cancellation will remain RUNNING in the queue until the
next reclaim pass; clean them up with
tmuxRefreshQueueStatus(es$queue_path).
es, invisibly.
experimentSBATCH, awaitExperimentSBATCH
Worker panes started by experimentTmux() / tmuxRunNextWorker() evaluate
the user's global.R inside a fresh scenario environment (not .GlobalEnv).
When the source call errors, the package captures sys.calls() and stashes
it on that scenario env so a post-mortem traceback is still possible without
polluting the user's global state. Use this accessor to retrieve it.
lastTraceback()
A list of calls (as from sys.calls()) suitable for passing to
base::traceback(); NULL if no error has been captured in the current
session.
## Not run:
# After a worker pane errors:
traceback(SpaDES.project::lastTraceback())
## End(Not run)
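The capture mechanism described above can be reproduced with base R's withCallingHandlers(): a calling handler runs before the stack unwinds, so sys.calls() still contains the frames that raised the error. A sketch of that pattern only, with a hypothetical scenarioEnv and .traceback name; it is not the package's internal code.

```r
# Record the call stack at the point of error in a dedicated environment.
scenarioEnv <- new.env()

sourceCapturingTraceback <- function(file, env) {
  withCallingHandlers(
    sys.source(file, envir = env),
    error = function(e) {
      # runs before unwinding, so the erroring frames are still on the stack
      assign(".traceback", sys.calls(), envir = env)
    }
  )
}

script <- tempfile(fileext = ".R")
writeLines("f <- function() stop('boom'); f()", script)
try(sourceCapturingTraceback(script, scenarioEnv), silent = TRUE)
calls <- get(".traceback", envir = scenarioEnv)
# traceback(calls) prints the captured stack
```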
When exploring existing modules, these tools help identify and navigate modules and their interdependencies.
listModules( keywords, accounts, includeForks = FALSE, includeArchived = FALSE, excludeStale = TRUE, omit = c("fireSense_dataPrepFitRas"), purge = FALSE, returnList = FALSE, verbose = getOption("Require.verbose", 1L) ) moduleDependencies( modules, modulePath = getOption("reproducible.modulePath", ".") ) moduleDependenciesToGraph(md) PlotModuleGraph(graph)
keywords |
A vector of character strings that will be used as keywords for identifying modules |
accounts |
A vector of character strings identifying GitHub accounts e.g.,
|
includeForks |
Should the returned list include repositories that are forks
(i.e., not the original repository). Default is |
includeArchived |
Should the returned list include repositories that are archived
(i.e., developer has retired them). Default is |
excludeStale |
Logical or date. If |
omit |
A vector of character strings of repositories to ignore. |
purge |
There is some internal caching that occurs. Setting this to |
returnList |
Should the function return a named list where the name is the |
verbose |
Numeric or logical indicating how verbose should the function
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
modules |
Either a character vector of local module names, or a named list
of character strings of short module names (i.e., the folder paths in |
modulePath |
A character string indicating the path where the modules are located. |
md |
A data.table with columns |
graph |
An igraph object to plot. Likely returned by |
listModules returns a character vector of paste0(account, "/", Repository) for
all SpaDES modules found under the given accounts that match
the keywords provided.
metadataInModules() helps to see different metadata elements in a folder of modules.
listModules(accounts = "PredictiveEcology", "none")
Resolves the cluster-facing short name for this host, trying in order:
/etc/hosts lookup by a local IP (shortest alias wins);
~/.ssh/config Host entry whose Hostname is a local IP (CRLF-safe);
hostname -s.
localHostLabel()
Useful for deriving the pane-title host prefix when the cluster knows this
machine by a name different from hostname -s (e.g. mega, whose raw
hostname is the node id but whose cluster alias is mega via /etc/hosts).
Character(1) short name, or NULL if none could be determined.
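The first lookup step (shortest /etc/hosts alias for a local IP) can be sketched as a pure function over the file's lines, which keeps it testable. shortestHostsAlias() is hypothetical and covers only that step, not the ~/.ssh/config or hostname -s fallbacks; the host names below are made up apart from the "mega" and "sbw" examples on this page.

```r
# Given /etc/hosts lines and this machine's IPs, return the shortest alias.
shortestHostsAlias <- function(hostsLines, localIPs) {
  lines <- sub("#.*$", "", hostsLines)              # drop comments
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  aliases <- unlist(lapply(fields, function(f) {
    if (length(f) >= 2L && f[1] %in% localIPs) f[-1] else character(0)
  }))
  if (length(aliases) == 0L) return(NULL)
  aliases[which.min(nchar(aliases))]
}

hosts <- c("127.0.0.1 localhost",
           "10.0.0.5  node0042.cluster.example mega  # login node",
           "10.0.0.6  node0043.cluster.example sbw")
shortestHostsAlias(hosts, localIPs = "10.0.0.5")   # "mega"
```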
Make DESCRIPTION file(s) from SpaDES module metadata
makeDESCRIPTIONproject( modules, modulePath, projectPath = ".", singleDESCRIPTION = TRUE, package = "Project", title = "Project", description = "Project", version = "1.0.0", authors = Sys.info()["user"], write = TRUE, verbose = getOption("Require.verbose") ) makeDESCRIPTION( modules, modulePath, projectPath = ".", singleDESCRIPTION = FALSE, package, title, date, description, version, authors, write = TRUE, verbose, metadataList, ... )
modules |
A character vector of module names |
modulePath |
Character. The path with modules, usually |
projectPath |
Character. Only used if |
singleDESCRIPTION |
Logical. If |
package |
The name inserted into the "Package" entry in DESCRIPTION |
title |
The string inserted into the "Title" entry in DESCRIPTION |
description |
The string inserted into the "Description" entry in DESCRIPTION |
version |
The string inserted into the "Version" entry in DESCRIPTION |
authors |
The string inserted into the "Authors" entry in DESCRIPTION |
write |
Logical. If |
verbose |
Numeric or logical indicating how verbose should the function
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
date |
Date to enter into DESCRIPTION file. Defaults to |
metadataList |
The parsed source code from a module. Must include |
... |
Currently not used. |
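For a sense of what a single project-level DESCRIPTION involves, the sketch below merges hypothetical per-module reqdPkgs into one Imports field and writes it with base R's write.dcf(). The field values mirror makeDESCRIPTIONproject()'s defaults, but this is not that function's code. Real reqdPkgs entries can carry version specs or GitHub refs (e.g. "Account/Repo@Branch") that would need extra handling.

```r
# Hypothetical reqdPkgs, as if parsed from two modules' metadata
reqdPkgs <- list(
  moduleA = c("terra", "data.table"),
  moduleB = c("data.table", "sf")
)
desc <- data.frame(
  Package = "Project", Title = "Project", Description = "Project",
  Version = "1.0.0", Date = format(Sys.Date()),
  # sorted, de-duplicated union; continuation lines keep the field valid
  Imports = paste(sort(unique(unlist(reqdPkgs))), collapse = ",\n    "),
  stringsAsFactors = FALSE
)
descFile <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc, file = descFile, keep.white = "Imports")
```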
On Linux, reads /proc/self/fd and resolves each symlink to identify what
the fd points to. Returns a data.frame with one row per open fd; useful
for diagnosing the cryptic "file descriptor is too large for select()"
failure from parallelly::makeClusterPSOCK().
openFds()
Returns an empty data.frame on non-Linux systems or if /proc/self/fd is
unreadable.
data.frame with columns fd (integer), target (character, with
any Linux (deleted) suffix preserved), and bucket (character holder
category: "socket", "pipe", "terra scratch",
"terra scratch (deleted)", "tif (other)", "vrt", "sqlite",
"qs/qs2", "anon_inode", "other file", "unknown").
openFdsReport() for a printable summary; fdSelectLimit().
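The /proc/self/fd approach can be sketched with list.files() and Sys.readlink(). This Linux-only sketch uses only a few of the documented buckets and simplified matching rules; openFdsSketch() is hypothetical, not openFds() itself.

```r
# List /proc/self/fd, resolve each symlink, and bucket the targets coarsely.
openFdsSketch <- function() {
  fdDir <- "/proc/self/fd"
  empty <- data.frame(fd = integer(0), target = character(0),
                      bucket = character(0), stringsAsFactors = FALSE)
  if (!dir.exists(fdDir)) return(empty)   # non-Linux: nothing to report
  fds <- list.files(fdDir)
  if (length(fds) == 0L) return(empty)
  target <- Sys.readlink(file.path(fdDir, fds))  # "" if unresolvable
  bucket <- ifelse(startsWith(target, "socket:"), "socket",
            ifelse(startsWith(target, "pipe:"), "pipe",
            ifelse(startsWith(target, "anon_inode:"), "anon_inode",
            ifelse(grepl("\\.tif( \\(deleted\\))?$", target), "tif (other)",
            ifelse(nzchar(target), "other file", "unknown")))))
  out <- data.frame(fd = as.integer(fds), target = target, bucket = bucket,
                    stringsAsFactors = FALSE)
  out[order(out$fd), ]
}
```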
Wraps openFds() and returns a multi-line string suitable for warning or
error messages. By default summarizes only fds at or above
fdSelectLimit() (the select() failure threshold); pass threshold = 0L
to summarize everything.
openFdsReport(threshold = fdSelectLimit())
threshold |
integer. Only fds at or above this number are bucketed.
Defaults to |
character scalar; "" if /proc/self/fd is not available.
cat(openFdsReport())
cat(openFdsReport(threshold = 0L))
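The threshold logic amounts to filtering on fd and tabulating the bucket column. A hypothetical sketch over a hand-made data.frame shaped like openFds() output. Unlike openFdsReport(), it returns "" when no fd reaches the threshold, and its wording is invented.

```r
# Count buckets for fds at or above `threshold` (1024 mirrors fdSelectLimit()).
fdReportSketch <- function(fds, threshold = 1024L) {
  hi <- fds[fds$fd >= threshold, , drop = FALSE]
  if (nrow(hi) == 0L) return("")
  counts <- sort(table(hi$bucket), decreasing = TRUE)
  paste0(nrow(hi), " fds >= ", threshold, ":\n",
         paste0("  ", names(counts), ": ", as.integer(counts), collapse = "\n"))
}

fds <- data.frame(fd = c(3L, 1030L, 1031L, 1200L),
                  target = c("socket:[1]", "/tmp/a.tif", "/tmp/b.tif", "pipe:[9]"),
                  bucket = c("socket", "tif (other)", "tif (other)", "pipe"),
                  stringsAsFactors = FALSE)
cat(fdReportSketch(fds), "\n")
```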
List uploaded scenario output archives.
outList(folder, pattern = "\\.tar\\.gz$")
folder |
Folder URL or |
pattern |
Regex matched against the |
A dribble of the matching files.
outScenarios(), queueUploadMissing()
Saves a simList to an RDS file via SpaDES.core::saveSimList().
Heavy ancillary data (inputs, outputs, cache, files) are excluded so the
file contains only the simulation state; pair with outTar() to bundle the
output files separately.
outSave(sim, runName, simFilename = NULL, lazy = TRUE)
sim |
A |
runName |
Character scalar. Used as the base name for the saved sim file and tarball. |
simFilename |
Character scalar. Full path for the |
lazy |
Logical. Passed to |
Invisibly returns simFilename.
outTar(), outUpload(), outSaveTarUpload()
Convenience wrapper that calls outSave(), outTar(), and outUpload()
in sequence. The sim is saved to an RDS file, bundled with its output
files into a .tar.gz archive, and the archive is uploaded to a Google
Drive folder.
outSaveTarUpload( runName, sim, gFolder = NULL, simFilename = NULL, tarDir = NULL, tarball = NULL, overwrite = TRUE, cleanup = FALSE, verbose = TRUE, lazy = TRUE )
runName |
Character scalar. Used as the base name for the saved sim file and tarball. |
sim |
A |
gFolder |
A Google Drive folder identifier accepted by
|
simFilename |
Character scalar. Full path for the |
tarDir |
Character scalar. Directory in which to create the tarball.
Defaults to |
tarball |
Character scalar. Path to the local file to upload. |
overwrite |
Logical. Overwrite an existing file of the same name in
the Drive folder. Default |
cleanup |
Logical. Delete the local tarball after a successful upload.
Default |
verbose |
Logical. Pass |
lazy |
Logical. Passed to |
Invisibly returns the dribble from googledrive::drive_upload().
outSave(), outTar(), outUpload()
Uploaded outputs as scenario records.
outScenarios(folder, pattern = "\\.tar\\.gz$")
folder |
Folder URL or |
pattern |
Regex matched against the |
A list of scenario objects.
Creates a .tar.gz archive containing simFilename and any additional
outputFiles. Files that do not exist are silently skipped so a partially
completed simulation can still be archived.
outTar( simFilename, outputFiles = character(0), runName, tarDir = dirname(simFilename), verbose = TRUE )
simFilename |
Character scalar. Full path for the |
outputFiles |
Character vector of additional files to include (e.g.
|
runName |
Character scalar. Used as the base name for the saved sim file and tarball. |
tarDir |
Character scalar. Directory in which to create the tarball.
Defaults to |
verbose |
Logical. Pass |
Invisibly returns the path to the created tarball.
outSave(), outUpload(), outSaveTarUpload()
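Skipping missing files before archiving can be done with base R's utils::tar(). A sketch, assuming every file lives under one root directory so archive paths can be stored relative to it. tarExisting() is hypothetical and not outTar()'s implementation.

```r
# Archive whichever of `files` exist into a .tar.gz, skipping missing ones.
tarExisting <- function(files, tarball, root) {
  keep <- files[file.exists(files)]
  if (length(keep) == 0L) stop("none of the requested files exist")
  rootN <- normalizePath(root, winslash = "/")
  # store paths relative to `root` (assumes all files sit under it)
  rel <- substring(normalizePath(keep, winslash = "/"), nchar(rootN) + 2L)
  # make the tarball path absolute before changing directory
  tarball <- file.path(normalizePath(dirname(tarball), winslash = "/"),
                       basename(tarball))
  old <- setwd(rootN)
  on.exit(setwd(old), add = TRUE)
  utils::tar(tarball, files = rel, compression = "gzip", tar = "internal")
  invisible(tarball)
}
```

Forcing tar = "internal" avoids depending on whichever external tar binary the TAR environment variable points to.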
Uploads a local file (typically a tarball produced by outTar()) to a
Google Drive folder via googledrive::drive_upload().
outUpload(tarball, gFolder, overwrite = TRUE, cleanup = FALSE)
tarball |
Character scalar. Path to the local file to upload. |
gFolder |
A Google Drive folder identifier accepted by
|
overwrite |
Logical. Overwrite an existing file of the same name in
the Drive folder. Default |
cleanup |
Logical. Delete the local tarball after a successful upload.
Default |
Invisibly returns the dribble returned by
googledrive::drive_upload().
outSave(), outTar(), outSaveTarUpload()
Parses module code, looking for the metadataItem (default = "reqdPkgs")
element in the defineModule function.
packagesInModules(modules, modulePath = getOption("spades.modulePath")) metadataInModules( modules, metadataItem = "reqdPkgs", modulePath = getOption("spades.modulePath"), needUnlist, verbose = getOption("Require.verbose", 1L) )
modules |
character vector of module names |
modulePath |
path to directory containing the module(s) named in |
metadataItem |
character identifying the metadata field to extract |
needUnlist |
logical indicating whether to |
verbose |
Numeric or logical indicating how verbose should the function
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
A character vector of sorted, unique packages that are identified in all named
modules, or if modules is omitted, then all modules in modulePath.
Generic format: each non-empty field's value becomes one path segment.
Field order is taken from the order of the input (or, for positional
calls, from scenarioFields()). Integer-and-contiguous vectors are
encoded as start-end. Empty / NA fields are dropped entirely
(yielding one fewer segment); for round-tripping see pathParse().
pathBuild(..., pre = "outputs", withFieldLabel = .scenario_env$withFieldLabel)
... |
Either a single named-list / scenario, or named/positional field/value pairs. |
pre |
Path prefix (default |
withFieldLabel |
Character vector of field names whose segment
should carry a |
Fields whose name appears in withFieldLabel get their segment
prefixed by the field name itself (e.g. .rep with value 5
renders as .rep5 instead of bare 5). Useful when path readers
must distinguish two integer fields, or when round-tripping with
mid-list NAs (the label disambiguates which segments are present).
Accepts three calling styles, all equivalent:
pathBuild(scenarioObj) — a single named-list / scenario;
pathBuild(.fieldA = vA, .fieldB = vB, ...) — explicit named args;
pathBuild(vA, vB, ...) — positional, in cached-field order.
Character scalar.
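The generic path format can be illustrated with a small encoder: drop empty/NA fields, write contiguous integer vectors as start-end, and prefix labeled fields with their name. This is a sketch of the rules stated above, not pathBuild() itself. It takes a named list only, and it refuses non-contiguous vectors because their encoding is not described here.

```r
encodeSegment <- function(name, value, withFieldLabel = character(0)) {
  if (length(value) == 0L || all(is.na(value))) return(NULL)  # field dropped
  isContigInt <- is.numeric(value) && length(value) > 1L &&
    all(value == round(value)) && all(diff(value) == 1)
  if (length(value) > 1L && !isContigInt)
    stop("this sketch only handles scalars and contiguous integer ranges")
  seg <- if (isContigInt) paste0(min(value), "-", max(value)) else as.character(value)
  if (name %in% withFieldLabel) paste0(name, seg) else seg
}

pathBuildSketch <- function(fields, pre = "outputs",
                            withFieldLabel = character(0)) {
  segs <- Map(encodeSegment, names(fields), fields,
              MoreArgs = list(withFieldLabel = withFieldLabel))
  segs <- unname(Filter(Negate(is.null), segs))
  do.call(file.path, c(list(pre), segs))
}

pathBuildSketch(list(.climate = "ssp370", .years = 2011:2020, .rep = 5L),
                withFieldLabel = ".rep")
# "outputs/ssp370/2011-2020/.rep5"
```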
Inverse of the default pathBuild(): splits the path on / (or, for
tarname inputs, on _), strips archive extensions and the pre
prefix, then matches segments positionally to scenarioFields().
Integer ranges of the form start-end decode to integer vectors.
pathParse( path, fields = scenarioFields(), pre = "outputs", withFieldLabel = .scenario_env$withFieldLabel )
path |
A single character string (path or tarname). |
fields |
Field labels in scenario order; defaults to
|
pre |
Path prefix to strip (default |
withFieldLabel |
Character vector of field names that were
built with |
Without per-segment labels there is no way to recover which field a
missing segment corresponds to, so when the path has fewer segments
than there are fields, the trailing fields are treated as NA.
Round-trip is therefore only lossless when NA-bearing fields are
last in the field order, unless you label the ambiguous fields
through withFieldLabel: any field named there has its label
prefix stripped from the segment, and segments not starting with a
labeled field's name are assigned positionally to the next
unlabeled field. With every potentially-NA field labeled, mid-list
NAs round-trip cleanly.
Named list of field values (in fields order).
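The decoding rules above (strip the prefix, decode start-end ranges, let labeled segments claim their own field, assign the rest positionally) can be sketched as follows. pathParseSketch() handles "/" paths only, not tarnames or archive extensions, and is not pathParse() itself.

```r
pathParseSketch <- function(path, fields, pre = "outputs",
                            withFieldLabel = character(0)) {
  rest <- sub(paste0("^", pre, "/"), "", path)
  segs <- strsplit(rest, "/", fixed = TRUE)[[1]]
  out <- rep(list(NA), length(fields))
  names(out) <- fields
  unlabeled <- setdiff(fields, withFieldLabel)
  for (s in segs) {
    lab <- withFieldLabel[startsWith(s, withFieldLabel)]
    if (length(lab) > 0L) {
      f <- lab[1]
      s <- substring(s, nchar(f) + 1L)    # strip the label prefix
    } else {
      f <- unlabeled[1]                    # next unlabeled field, in order
      unlabeled <- unlabeled[-1]
    }
    out[[f]] <- if (grepl("^[0-9]+-[0-9]+$", s)) {
      r <- as.integer(strsplit(s, "-", fixed = TRUE)[[1]])
      r[1]:r[2]                            # "2011-2020" -> 2011:2020
    } else if (grepl("^[0-9]+$", s)) {
      as.integer(s)
    } else {
      s
    }
  }
  out
}

pathParseSketch("outputs/ssp370/2011-2020/.rep5",
                fields = c(".climate", ".years", ".rep"), withFieldLabel = ".rep")
```

Because .rep is labeled, a path such as "outputs/x/y" parsed with fields c(".a", ".rep", ".b") leaves .rep as NA instead of shifting "y" into it.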
pkgload::load_all with caching
pkgload::load_all does not automatically deal with dependency chains: the
user must manually load the dependency chain in order with separate calls to
pkgload::load_all. Also, it does not use caching. This function allows
nested caching for a sequence of packages that depend on one another. For
example, if a user has 3 packages that have dependency chain:
A is a dependency of B which is a dependency of C.
If a change happens in C, then pkgload::load_all will only be called on C.
If a change happens in A, then pkgload::load_all will be called on A, then B, then C.
pkgload2( depsPaths = file.path("~/GitHub", c("reproducible", "SpaDES.core", "LandR")), envir = parent.frame() )
depsPaths |
A character vector of paths to packages that need loading, or list of these. Each vector should be the load order sequence, based on the package dependencies, i.e., the first element in the vector should be a dependency of the second element in the vector etc. For packages that do not depend on each other, use separate list elements. |
envir |
An environment in which an object called .prevDigs will be placed and used for cache comparison. |
This is called for its two side effects: running pkgload::load_all on the
packages that need it, and assigning an object, .prevDigs, to envir.
Plot all studyArea** and rasterToMatch** objects within a list-like object.
plotSAs( ll, ..., include = TRUE, exclude, saCols = c("purple", "blue", "green", "red"), title, rasterToMatchLabel = "Stand Age", rasterToMatchPalette = c("Set1", "Set2", "Set3"), country = "CAN", latlong = FALSE, minArea = 7e+11 )
plotSAsLeaflet( ll, ..., include = TRUE, exclude, saCols = c("purple", "blue", "green", "red"), title = "Study Areas", rasterToMatchLabel = "Stand Age", rasterToMatchPalette = c("Set1", "Set2", "Set3") )
ll |
Any list-like object with named elements. Names must include at least
one that starts with |
... |
Any objects to plot. Currently, they must be named arguments, and they must
have prefixes |
include |
Either logical or a character vector. If logical, this indicates whether all maps in the |
exclude |
A character vector of spatial objects contained within |
saCols |
A vector of same length as number of |
title |
The main title for the ggplot2 object. Defaults to one or both of "studyArea" and "rasterToMatch" or their plurals. |
rasterToMatchLabel |
Label used in the rasterToMatch legend. |
rasterToMatchPalette |
A palette to be used for colour scheme in rasterToMatch plotting.
Can be any that work with |
country |
The country for jurisdiction boundaries; defaults to "CAN". Passed to
|
latlong |
Logical. Should all layers be converted to |
minArea |
In m^2. The minimum area for the entire plot. If this is too
small, the legislative boundaries may not appear. The area covered by the plot
will be the maximum of the studyArea** or rasterToMatch** and this |
Run primarily for side effects. plotSAs plots (and returns) a ggplot2 object.
plotSAsLeaflet creates a leaflet page in a viewer (if using RStudio).
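To show the minArea arithmetic, here is a sketch that grows a bounding box (in metres) about its centre until it covers at least `minArea` m^2, keeping the aspect ratio. `expandToMinArea()` is hypothetical. Whether plotSAs expands its extent this way is an assumption; only the "maximum of the data extent and minArea" rule comes from the text.

```r
# Hypothetical sketch (not plotSAs' code): enlarge a bbox (xmin, xmax, ymin, ymax,
# in metres) about its centre so its area is at least `minArea` m^2.
expandToMinArea <- function(bb, minArea = 7e11) {
  w <- bb[["xmax"]] - bb[["xmin"]]
  h <- bb[["ymax"]] - bb[["ymin"]]
  if (w * h >= minArea) return(bb)        # already large enough: unchanged
  s <- sqrt(minArea / (w * h))            # linear scale factor (area scales by s^2)
  cx <- (bb[["xmin"]] + bb[["xmax"]]) / 2
  cy <- (bb[["ymin"]] + bb[["ymax"]]) / 2
  c(xmin = cx - s * w / 2, xmax = cx + s * w / 2,
    ymin = cy - s * h / 2, ymax = cy + s * h / 2)
}

small <- c(xmin = 0, xmax = 1e5, ymin = 0, ymax = 1e5)   # 1e10 m^2
big <- expandToMinArea(small)
area <- (big[["xmax"]] - big[["xmin"]]) * (big[["ymax"]] - big[["ymin"]])  # about 7e11
```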
setupProject
preRunSetupProject parses an R script (default: "global.R") and
evaluates its contents up to the setupProject() call, either fully or
partially based on the upTo argument. This is useful for initializing
only certain parts of a project without executing the entire setup.
preRunSetupProject(file = "global.R", upTo = TRUE, envir = parent.frame())
file |
Character string. Path to the R script containing the setup code.
Defaults to |
upTo |
Character or logical. If |
envir |
The environment where the function should be finding objects. Defaults
to |
The function:
Parses the specified file using parse().
Identifies the line where setupProject() is called.
Evaluates all code before the setupProject() call.
Depending on upTo, evaluates either the full call or a subset
of its arguments.
This allows selective initialization of project components for debugging or partial setup in large projects.
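The parse / locate / evaluate-before steps listed above can be sketched in base R. `runUpToCall()` is hypothetical and skips the partial-argument case (`upTo = "paths"`). It stops at the first top-level expression that mentions the target function and returns that call unevaluated.

```r
# Hypothetical sketch (not preRunSetupProject's code): evaluate every top-level
# expression before the first one that calls `fname`, and return that call unevaluated.
runUpToCall <- function(code, fname = "setupProject", envir = new.env()) {
  exprs <- as.list(parse(text = code, keep.source = FALSE))
  hit <- which(vapply(exprs, function(e) fname %in% all.names(e), logical(1)))
  if (length(hit) == 0) stop(fname, "() call not found")
  for (e in exprs[seq_len(hit[1] - 1L)]) eval(e, envir)  # code before the call
  list(env = envir, call = exprs[[hit[1]]])              # the call itself, not yet run
}

res <- runUpToCall(c("a <- 1", "b <- a + 1",
                     "out <- setupProject(paths = list(projectPath = '.'))",
                     "z <- 99"))
res$env$b   # 2; `z` was never evaluated
```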
The evaluated result of the executed portion of setupProject(),
i.e., a list as returned by setupProject().
## Not run: 
# Run file up to and including the setupProject, but only to the 'paths' argument
result <- preRunSetupProject(file = "global.R", upTo = "paths")

# Run file up to and including full setupProject()
result <- preRunSetupProject(file = "global.R", upTo = TRUE)

## End(Not run)
Two call shapes:
queueRead(folder, name, sheet = NULL, col_types = "c")
folder |
Either a local path to an |
name |
Spreadsheet name (exact match) within |
sheet |
Optional worksheet/tab name (passed to |
col_types |
Column-types spec for |
queueRead("path/to/queue.rds")
When the first
argument is an existing local .rds file and name is not
supplied, the queue is loaded via readRDS(). Useful for the
file-backed queues written by experimentTmux() /
experimentFuture() / experimentSBATCH() when no ss_id was
supplied.
queueRead(folder, name)
Convenience wrapper
around googledrive::drive_ls() + googlesheets4::read_sheet().
folder is the Drive folder URL/id, name is the spreadsheet
name within it.
Either way, the result is passed through revertDotNames() so callers
see canonical .ELFind/.GCM/... column names rather than the
dotELFind/dotGCM/... names Google Sheets forces. As a side
effect, the non-meta column names are cached as the active scenario
field set (see scenarioFields()).
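The renaming step can be pictured as a single regular-expression substitution. `revertDotNamesSketch()` is a hypothetical stand-in, not the package's revertDotNames(). It assumes Google Sheets rewrote a leading `.` as `dot`, so a genuine column whose name starts with "dot" would also be renamed.

```r
# Hypothetical stand-in for revertDotNames() (not the package's code):
# map a leading "dot" back to "." in column names.
revertDotNamesSketch <- function(nms) sub("^dot", ".", nms)

revertDotNamesSketch(c("dotELFind", "dotsamplingRange", "dotrep", "status"))
# ".ELFind" ".samplingRange" ".rep" "status"
```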
A data.table. Pipe through as_scenario() for scenario records.
queueUploadMissing(), outList(), outScenarios(),
experimentFuture(), experimentTmux(), experimentSBATCH()
Anti-join of the driver queue against the upload folder's .tar.gz
listing, keyed on rendered tarname (see as_tarname()). Independent
of the queue's status column.
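A sketch of this anti-join in base R. `renderTar()` is a simplified, hypothetical stand-in for as_tarname(), and the queue and listing are toy data. Both rows are marked "done", yet one is still reported missing because only the rendered tarname is compared.

```r
# Hypothetical sketch (not queueUploadMissing's code): keep queue rows whose
# rendered tarname is absent from the upload listing; `status` is ignored.
renderTar <- function(row) paste0(paste(unlist(row), collapse = "_"), ".tar.gz")

queue <- data.frame(.GCM = c("CNRM-ESM2-1", "CanESM5"), .rep = c(1L, 2L),
                    status = c("done", "done"), stringsAsFactors = FALSE)
fields <- c(".GCM", ".rep")                       # scenario fields only
uploaded <- c("CNRM-ESM2-1_1.tar.gz", "unrelated_9.tar.gz")

keys <- vapply(seq_len(nrow(queue)),
               function(i) renderTar(queue[i, fields, drop = FALSE]), character(1))
missingRows <- queue[!keys %in% uploaded, , drop = FALSE]
missingRows$.GCM   # "CanESM5"
```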
queueUploadMissing(folder, name, uploadFolder, ...)
folder |
Folder URL of the queue (driver) Drive folder. |
name |
Queue spreadsheet name within |
uploadFolder |
Folder URL of the upload Drive folder. |
... |
Extra args forwarded to |
Subset of the queue data.table for rows whose expected tarball
is not present in uploadFolder.
Inverse of outUpload(). Downloads one or more tar.gz archives from a
Google Drive folder to a local directory, using
reproducible::preProcess() (so re-runs hit the local copy when present).
Vectorised: typically called with the multi-row dribble returned by
outList() / outScenarios().
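A base-R sketch of the "re-runs hit the local copy" behaviour. It is not the package's code, which uses reproducible::preProcess() and returns a data.table. `fetchAll()` is hypothetical, and `download` is any caller-supplied function(name, destfile).

```r
# Hypothetical sketch (not reGet's code): download each file unless a local copy
# exists (or overwrite = TRUE); return name / local_path per file.
fetchAll <- function(names, destDir, download, overwrite = FALSE) {
  dir.create(destDir, recursive = TRUE, showWarnings = FALSE)
  local <- file.path(destDir, names)
  for (i in seq_along(names)) {
    if (overwrite || !file.exists(local[i])) download(names[i], local[i])
  }
  data.frame(name = names, local_path = local, stringsAsFactors = FALSE)
}

nCalls <- 0
fakeDownload <- function(name, destfile) {   # stands in for a Drive download
  nCalls <<- nCalls + 1
  writeLines(name, destfile)
}
d <- tempfile("reGetSketch")
fetchAll(c("a.tar.gz", "b.tar.gz"), d, fakeDownload)  # downloads both
fetchAll(c("a.tar.gz", "b.tar.gz"), d, fakeDownload)  # reuses the local copies
nCalls  # 2
```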
reGet(gFiles, destDir, overwrite = FALSE, verbose = TRUE)
gFiles |
Either a Google Drive |
destDir |
Character scalar. Local directory to write tarballs into. Created if it does not exist. |
overwrite |
Logical. Force re-download even if the local file
exists. Default |
verbose |
Logical. Print elapsed time per download. Default |
A data.table with columns name and local_path, one row per
downloaded file.
reUntar(), reLoad(), reGetUntarLoad(), outUpload()
Convenience wrapper around reGet(), reUntar(), and reLoad() – the
inverse of outSaveTarUpload(). Operates on a batch: typically called
with the multi-row dribble returned by outList() / outScenarios().
reGetUntarLoad( gFiles, destDir, pathRemap = NULL, projectPath = getwd(), method = c("loadSimList", "readRDS"), overwrite = FALSE, verbose = TRUE )
gFiles |
Either a Google Drive |
destDir |
Character scalar. Local directory to write tarballs into. Created if it does not exist. |
pathRemap |
Optional named character vector of length 2,
|
projectPath |
Character scalar. Passed to
|
method |
One of |
overwrite |
Logical. Force re-download even if the local file
exists. Default |
verbose |
Logical. Print elapsed time per download. Default |
A named list of simList objects, one per row of gFiles,
named by the archive's name (sans .tar.gz).
reGet(), reUntar(), reLoad(), outSaveTarUpload()
Pass a function (or, for withFieldLabel, a character vector) to
register it; pass NULL explicitly to clear that slot; omit the
argument to leave it untouched. Call with no arguments to inspect.
register_scenario_format(build, parse, withFieldLabel)
build |
Function (custom path builder), or |
parse |
Function (custom path parser), or |
withFieldLabel |
Either:
|
Lookup precedence (highest first): registered slot ->
pathBuild/pathParse defined in the global environment -> the
package defaults.
Override signature contract:
build(..., pre = "outputs") — receives the scenario as named
... args (one per field) plus pre; returns a path string.
parse(path, pre = "outputs") — returns a named list of fields.
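A sketch of the lookup precedence and of the register semantics: a missing argument leaves the slot alone, NULL clears it, and a function registers it. Everything here is hypothetical and covers only the `build` slot. An environment `g` stands in for the global environment so the example does not touch the user's workspace.

```r
# Hypothetical sketch (not the package's internals) of the lookup order:
# registered slot -> `pathBuild` in the global environment -> package default.
.formatRegistry <- new.env()
defaultBuild <- function(..., pre = "outputs") paste(c(pre, unlist(list(...))), collapse = "/")

registerBuild <- function(build) {
  # missing argument: leave the slot untouched; NULL: clear it; function: register it
  if (!missing(build)) assign("build", build, envir = .formatRegistry)
  invisible(get0("build", envir = .formatRegistry, inherits = FALSE))
}
resolveBuild <- function(globalEnv = globalenv()) {
  reg <- get0("build", envir = .formatRegistry, inherits = FALSE)
  if (is.function(reg)) return(reg)
  if (exists("pathBuild", envir = globalEnv, mode = "function", inherits = FALSE))
    return(get("pathBuild", envir = globalEnv, mode = "function", inherits = FALSE))
  defaultBuild
}

g <- new.env()                                   # stands in for the global environment
resolveBuild(g)("6.3.1", "5")                    # default: "outputs/6.3.1/5"
g$pathBuild <- function(..., pre = "outputs") "from-global"
registerBuild(function(..., pre = "outputs") "registered")
resolveBuild(g)()                                # "registered"
registerBuild(NULL)                              # clear the slot
resolveBuild(g)()                                # "from-global"
```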
Invisibly, the current (build, parse, withFieldLabel) triple.
Inverse of outSave(). Loads one or more simLists from .rds files
produced by outSave(). Defaults to SpaDES.core::loadSimList();
set method = "readRDS" to bypass .unwrap entirely.
Note that SpaDES.core::saveSimList() uses .wrapResiliently to NULL
out file-backed objects with inaccessible backing files at save time.
Load-time failures (e.g. backing files missing on this machine even
though they were present at save time) are independent of that, and are
handled by loadSimList's pre-.unwrap resilient pass.
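A minimal sketch of the `method = "readRDS"` path, which skips any .unwrap step, returning a list named by basename(simFilenames) as documented. `reLoadRDSSketch()` is hypothetical, and the toy lists stand in for saved simLists.

```r
# Hypothetical sketch of method = "readRDS" (not reLoad's code): read each file
# as-is and name the result by the file's basename.
reLoadRDSSketch <- function(simFilenames) {
  out <- lapply(simFilenames, readRDS)
  names(out) <- basename(simFilenames)
  out
}

f1 <- tempfile(fileext = ".rds"); saveRDS(list(time = 1), f1)
f2 <- tempfile(fileext = ".rds"); saveRDS(list(time = 2), f2)
sims <- reLoadRDSSketch(c(f1, f2))
names(sims)   # the two basenames
```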
reLoad( simFilenames, projectPath = getwd(), method = c("loadSimList", "readRDS"), ... )
simFilenames |
Character vector of paths to |
projectPath |
Character scalar. Passed to
|
method |
One of |
... |
Additional args forwarded to |
A list of simList objects, named by basename(simFilenames).
reGet(), reUntar(), reGetUntarLoad(), outSave()
Inverse of outTar(). Extracts one or more .tar.gz archives produced
by outTar() / outSaveTarUpload(), which contain absolute paths. If
pathRemap is supplied, the leading path prefix is rewritten on
extraction (handy when the archive was created on another user's
machine, e.g. paths starting with /home/emcintir/...).
Path rewriting uses GNU tar's --transform. On systems without GNU tar,
supply pathRemap = NULL and the archive's absolute paths are restored
as-is.
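A sketch of how a prefix remap can become a GNU tar `--transform` argument (a sed-style `s|old|new|` expression). `tarTransformArg()` is hypothetical. The names of the pathRemap elements are truncated above, so this sketch simply treats the first element as the old prefix and the second as the new one. Only `.` is escaped, and prefixes containing `|` are not handled.

```r
# Hypothetical helper (not reUntar's code): build a GNU tar --transform argument
# that rewrites a leading path prefix. First element = old prefix, second = new.
tarTransformArg <- function(pathRemap) {
  from <- gsub("\\.", "\\\\.", pathRemap[[1]])   # escape "." for the sed regex
  sprintf("--transform=s|^%s|%s|", from, pathRemap[[2]])
}

tarTransformArg(c("/home/emcintir", "/home/me"))
# "--transform=s|^/home/emcintir|/home/me|"
```

The pattern has to match member names as stored in the archive. GNU tar normally strips a leading `/` from member names unless `-P` (`--absolute-names`) is used, so check how the archive was written.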
reUntar(tarballs, pathRemap = NULL, verbose = FALSE)
tarballs |
Character vector of paths to local tarballs. |
pathRemap |
Optional named character vector of length 2,
|
verbose |
Logical. Pass |
A character vector (same length as tarballs) of absolute paths
to the .rds simList file inside each archive (after any remap),
suitable for reLoad().
reGet(), reLoad(), reGetUntarLoad(), outTar()
A thin wrapper around tmuxRunWorkerLoop that optionally redirects
console output to a log file before entering the job loop. Used internally
by experimentFuture for remote (cluster) workers. Local
workers use callr::r_bg() and do not need this wrapper.
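The optional redirection step can be sketched with base R's sink(). `withLogFile()` is hypothetical, not runWorkerLoopFuture's code. It diverts printed output and messages to `log_file` while `expr` runs and restores the console afterwards, even on error.

```r
# Hypothetical sketch (not runWorkerLoopFuture's code): send output and messages
# to `log_file` while `expr` runs, then restore the console.
withLogFile <- function(log_file, expr) {
  if (is.null(log_file)) return(expr)   # no log file: just run
  con <- file(log_file, open = "at")    # append, text mode
  sink(con)                             # printed output
  sink(con, type = "message")           # messages and warnings
  on.exit({
    sink(type = "message")
    sink()
    close(con)
  }, add = TRUE)
  expr                                  # lazily evaluated here, after the sinks are set
}

lf <- tempfile(fileext = ".log")
withLogFile(lf, { cat("job 1 started\n"); message("job 1 done") })
readLines(lf)
```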
runWorkerLoopFuture( queue_path, global_path, on_interrupt = c("requeue", "fail"), ss_id = NULL, email = NULL, cache_path = NULL, runNameLabel = quote(colnames(q)[1:2]), activeRunningPath = NULL, dots_path = NULL, stop_file = NULL, log_file = NULL )
queue_path |
Path to the local RDS queue file. |
global_path |
Path to the R script sourced for each job. |
on_interrupt |
|
ss_id |
Google Sheets ID for the shared queue (or |
email |
Gargle OAuth e-mail. |
cache_path |
Gargle OAuth cache directory. |
runNameLabel |
Quoted expression for deriving a run name. |
activeRunningPath |
Directory for |
dots_path |
Path to an RDS file whose contents are loaded into
|
stop_file |
Path to a sentinel file. When this file is created (e.g.
by |
log_file |
Path to the log file for this worker. If |
Invisibly returns the worker identifier string.
Accepts any named arguments. Each name becomes a field label; each value the field's value. No specific field set is required by the package (fields are project-defined via the queue).
scenario(...)
... |
Named field/value pairs. |
Light coercion: a single character string of the form "a:b" is evaluated
as an R expression (so queue cells like "1991:2020" become integer
vectors).
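A minimal sketch of the "a:b" coercion. `coerceRange()` is hypothetical. Unlike a general eval(), it only accepts integer ranges, so arbitrary cell text is never executed; the package's own rule may be broader.

```r
# Hypothetical sketch of the "a:b" coercion (not the package's code): only a
# single string that looks like an integer range is evaluated.
coerceRange <- function(x) {
  if (is.character(x) && length(x) == 1L && grepl("^-?[0-9]+:-?[0-9]+$", x)) {
    return(eval(parse(text = x)))
  }
  x
}

coerceRange("1991:2020")   # integer vector 1991, 1992, ..., 2020
coerceRange("CanESM5")     # returned unchanged
```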
An S3 object of class "scenario".
A "scenario" identifies a single simulation run. Field names and values are discovered from the driver queue (Google Sheet); they are not hardcoded in this package. The same run can be referred to in three interchangeable ways:
Field values (one column per field in the queue), e.g.
(.ELFind = "6.3.1", .samplingRange = 2071:2100, ...).
An output directory path under outputs/.
An upload tar filename (the path without its outputs/ prefix, with / -> _ and a .tar.gz suffix).
This file defines:
a canonical record (S3 class "scenario");
the generic as_scenario() for coercing any representation into it;
formatters as_path() / as_tarname() for going back;
default builders pathBuild() / pathParse(): each non-empty
field's value (no label) is one path segment, joined by /,
in the order given by scenarioFields(). Integer-and-contiguous
vectors render as start-end. Empty / NA fields are skipped
entirely (one fewer segment); see pathParse() for the
trailing-NA round-trip caveat.
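A base-R sketch of the default layout rules above. `renderValue()`, `pathBuildSketch()` and `tarnameSketch()` are hypothetical re-implementations, not the package's pathBuild() / as_tarname(). They reproduce the outputs shown in the examples below for the same field values, and dropping `.SSP` removes one segment.

```r
# Hypothetical sketch (not the package's pathBuild()/as_tarname()): one segment per
# non-empty field, in field order; contiguous integer vectors become "start-end".
renderValue <- function(v) {
  if (is.numeric(v) && length(v) > 1L && all(diff(v) == 1)) {
    paste0(v[1L], "-", v[length(v)])
  } else {
    as.character(v)
  }
}
pathBuildSketch <- function(fields, pre = "outputs") {
  keep <- vapply(fields, function(v) length(v) > 0L && !all(is.na(v)) && !identical(v, ""),
                 logical(1))                       # NA / empty fields are skipped
  segs <- vapply(fields[keep], renderValue, character(1))
  paste(c(pre, segs), collapse = "/")
}
tarnameSketch <- function(path, pre = "outputs") {
  paste0(gsub("/", "_", sub(paste0("^", pre, "/"), "", path), fixed = TRUE), ".tar.gz")
}

s <- list(.ELFind = "6.3.1", .samplingRange = 2071:2100, .GCM = "CNRM-ESM2-1",
          .SSP = 370L, .rep = 5L)
p <- pathBuildSketch(s)   # "outputs/6.3.1/2071-2100/CNRM-ESM2-1/370/5"
tarnameSketch(p)          # "6.3.1_2071-2100_CNRM-ESM2-1_370_5.tar.gz"
s$.SSP <- NA
pathBuildSketch(s)        # "outputs/6.3.1/2071-2100/CNRM-ESM2-1/5" (one fewer segment)
```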
Per-project format overrides: define your own pathBuild (and
matching pathParse) in the global environment, or register them
explicitly with register_scenario_format(). Lookup order, highest
first: register_scenario_format slot -> a pathBuild/pathParse
in the global environment -> the package default.
Field discovery: queueRead() caches the queue's non-meta column
names as the active field set. Subsequent pathParse() calls use
those labels for positional decoding. If you parse paths without
first reading a queue, pass fields = c(...) explicitly (or call
scenarioFieldsSet()).
## Not run: 
## --- Default (generic) format -----------------------------------------
queue <- queueRead(folder = ss_id, name = "longRuns")
# -> data.table with columns .ELFind, .samplingRange, .GCM, .SSP, .rep
#    plus meta columns (status, started_at, ...). Non-meta columns are
#    auto-cached as scenarioFields().

scens <- as_scenario(queue)            # list of `scenario` objects
as_path(scens[[1]])
#> "outputs/6.3.1/2071-2100/CNRM-ESM2-1/370/5"
as_tarname(scens[[1]])
#> "6.3.1_2071-2100_CNRM-ESM2-1_370_5.tar.gz"

# Round-trip
s2 <- as_scenario("outputs/6.3.1/2071-2100/CNRM-ESM2-1/370/5")
identical(unclass(scens[[1]]), unclass(s2))   # TRUE

# Cross-reference queue against uploaded tarballs
uploads <- outScenarios(.uploadGSdir)        # list of scenarios
missing <- queueUploadMissing(folder = ss_id, name = "longRuns",
                              uploadFolder = .uploadGSdir)  # queue rows only

## --- Per-field labels in the path -------------------------------------
# `withFieldLabel` accepts two forms.
# 1) Unnamed character vector: prefix with the field name itself.
as_path(scens[[1]], withFieldLabel = c(".rep", ".SSP"))
#> "outputs/6.3.1/2071-2100/CNRM-ESM2-1/.SSP370/.rep5"

# 2) Named character vector: prefix with the *mapped* label
#    (e.g., emit `.rep` as `rep`, `.SSP` as `_ssp`).
as_path(scens[[1]], withFieldLabel = c(.rep = "rep", .SSP = "_ssp"))
#> "outputs/6.3.1/2071-2100/CNRM-ESM2-1/_ssp370/rep5"

# Set once for every subsequent as_path() / as_tarname():
register_scenario_format(withFieldLabel = c(.rep = "rep", .SSP = "_ssp"))
as_path(scens[[1]])
#> "outputs/6.3.1/2071-2100/CNRM-ESM2-1/_ssp370/rep5"
as_tarname(scens[[1]])
#> "6.3.1_2071-2100_CNRM-ESM2-1__ssp370_rep5.tar.gz"

# Round-trip parses back to canonical fields:
as_scenario("outputs/6.3.1/2071-2100/CNRM-ESM2-1/_ssp370/rep5")

## --- Project-specific format (FireSenseTesting layout) ----------------
# Layout: outputs/<.ELFind>/<range>/<GCM>_ssp<SSP>/rep<.rep>
# E.g.    outputs/6.3.1/2071-2100/CNRM-ESM2-1_ssp370/rep5
myBuild <- function(.ELFind, .samplingRange, .GCM, .SSP, .rep,
                    pre = "outputs") {
  sr <- if (is.numeric(.samplingRange)) .samplingRange else eval(parse(text = .samplingRange))
  file.path(pre, .ELFind, paste(range(sr), collapse = "-"),
            paste0(.GCM, ifelse(is.na(.SSP), "", paste0("_ssp", .SSP))),
            paste0("rep", .rep))
}
myParse <- function(path, fields = scenarioFields(), pre = "outputs") {
  clean <- sub("\\.tar\\.gz$", "", path)
  clean <- sub(paste0("^", pre, "[/_]"), "", clean)
  parts <- if (grepl("/", clean)) strsplit(clean, "/")[[1L]] else strsplit(clean, "_")[[1L]]
  repIdx   <- which(grepl("^rep[0-9]+$", parts))
  rangeIdx <- which(grepl("^[0-9]+-[0-9]+$", parts))
  gcmSsp   <- paste(parts[(rangeIdx + 1L):(repIdx - 1L)], collapse = "_")
  gs <- if (grepl("_ssp", gcmSsp)) strsplit(gcmSsp, "_ssp")[[1L]] else c(gcmSsp, NA_character_)
  rng <- as.integer(strsplit(parts[rangeIdx], "-")[[1L]])
  list(.ELFind        = paste(parts[seq_len(rangeIdx - 1L)], collapse = "_"),
       .samplingRange = rng[1L]:rng[2L],
       .GCM           = gs[1L],
       .SSP           = gs[2L],
       .rep           = as.integer(sub("^rep", "", parts[repIdx])))
}
register_scenario_format(build = myBuild, parse = myParse)
as_path(scens[[1]])
#> "outputs/6.3.1/2071-2100/CNRM-ESM2-1_ssp370/rep5"
as_tarname(scens[[1]])
#> "6.3.1_2071-2100_CNRM-ESM2-1_ssp370_rep5.tar.gz"

# Equivalent: define pathBuild / pathParse in your global environment
# (e.g. in a project global.R) -- they will be auto-detected.
pathBuild <- myBuild
pathParse <- myParse

## End(Not run)
Returns the field labels currently used to (a) parse paths/tarnames
back into scenario records and (b) determine which queue columns
constitute the scenario (vs. queue meta-columns). Set automatically by
queueRead(); can be set manually with scenarioFieldsSet().
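The getter/setter pair can be pictured as a small package-level cache. All names here are hypothetical. The meta-column list is only a placeholder taken from the examples ("status, started_at, ..."), since the full set is not given.

```r
# Hypothetical sketch (not the package's code) of a cached field set.
.fieldCache <- new.env()
scenarioFieldsSketch <- function() get0("fields", envir = .fieldCache, inherits = FALSE)
scenarioFieldsSetSketch <- function(fields) {
  assign("fields", fields, envir = .fieldCache)
  invisible(fields)
}
# What a queue reader might cache: every column that is not a meta column.
cacheFromQueue <- function(queue, metaCols = c("status", "started_at")) {
  scenarioFieldsSetSketch(setdiff(names(queue), metaCols))
}

before <- scenarioFieldsSketch()   # NULL: not yet known
cacheFromQueue(data.frame(.GCM = "CanESM5", .rep = 1L, status = "todo"))
scenarioFieldsSketch()             # ".GCM" ".rep"
```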
scenarioFields()
scenarioFieldsSet(fields)
fields |
Character vector of field labels. |
Character vector of field names, or NULL if not yet known.
This function will create a sub-folder of the lib.loc directory that
is based on the R version and the platform, as per the standard R package directory
naming convention.
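A sketch of a version- and platform-specific sub-folder, modelled on R's default user-library template `<platform>-library/<major>.<minor>`. `projPkgDirSketch()` is hypothetical, and the exact folder layout setProjPkgDir() uses may differ.

```r
# Hypothetical sketch (not setProjPkgDir's code): a library sub-folder keyed on
# platform and R major.minor version, after R's default R_LIBS_USER template.
projPkgDirSketch <- function(lib.loc = "packages") {
  minor <- sub("\\..*$", "", R.version$minor)   # e.g. "3.2" -> "3"
  file.path(lib.loc, paste0(R.version$platform, "-library"),
            paste0(R.version$major, ".", minor))
}

projPkgDirSketch()   # on 64-bit Linux with R 4.3.x: "packages/x86_64-pc-linux-gnu-library/4.3"
```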
setProjPkgDir(lib.loc = "packages", verbose = getOption("Require.verbose", 1L))
lib.loc |
The folder for installing packages inside of |
verbose |
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
Convenience helper, intended primarily for interactive use, that parses
each file (local path or github.com URL with @branch notation) into a
named list.
setupFiles( files, paths, envir = parent.frame(), verbose = getOption("Require.verbose", 1L) )
files |
A vector or list of files to parse. These can be remote github.com files. |
paths |
a list with named elements, specifically, |
envir |
The environment where |
verbose |
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
setupFiles is a convenience function intended for interactive use to verify the files being parsed.
This is similar to parse, but each element must be a named list or a named object, such as a function.
It uses the same specification for https://github.com
files as setupProject, i.e., using @ for branch.
setupFiles("PredictiveEcology/PredictiveEcology.org@main/tutos/castorExample/params.R")
setupFiles returns a named list with each element that was parsed.
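One plausible way to turn the `owner/repo@branch/path` spec into a raw-file URL is a single regular expression. `ghToRawURL()` is hypothetical, and mapping to raw.githubusercontent.com is an assumption about how such files could be fetched. Branch names that contain `/` are not handled.

```r
# Hypothetical helper (not setupFiles' code): "owner/repo@branch/path/to/file"
# -> "https://raw.githubusercontent.com/owner/repo/branch/path/to/file".
ghToRawURL <- function(spec) {
  m <- regmatches(spec, regexec("^([^/]+)/([^@/]+)@([^/]+)/(.+)$", spec))[[1]]
  if (length(m) == 0) stop("not of the form owner/repo@branch/path: ", spec)
  sprintf("https://raw.githubusercontent.com/%s/%s/%s/%s", m[2], m[3], m[4], m[5])
}

ghToRawURL("PredictiveEcology/PredictiveEcology.org@main/tutos/castorExample/params.R")
```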
setupProject() for the high-level wrapper, setup_family for an overview.
Source the functions supplied to setupProject() so they are available
to subsequent setup* steps and to the user's session.
setupFunctions( functions, name, sideEffects, paths, overwrite = FALSE, envir = parent.frame(), callingEnv = sys.frame(-2), verbose = getOption("Require.verbose", 1L), dots, defaultDots, ... )
functions |
A set of function definitions to be used within |
name |
Optional. If supplied, the name of the project. If not supplied, an
attempt will be made to extract the name from the |
sideEffects |
Optional. This can be an expression or one or more file names or
a code chunk surrounded by |
paths |
a list with named elements, specifically, |
overwrite |
Logical vector or character vector, however, only |
envir |
The environment where |
callingEnv |
The environment from which the function was called. Defaults to |
verbose |
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
dots |
Any other named objects passed as a list a user might want for other elements. |
defaultDots |
A named list of any arbitrary R objects.
These can be supplied to give default values to objects that
are otherwise passed in with the |
... |
further named arguments that act like |
setupFunctions will source the functions supplied, with a parent environment being
the internal temporary environment of the setupProject, i.e., they will have
access to all the objects in the call.
setupFunctions returns NULL. All functions will be placed in envir.
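The scoping described above can be reproduced with base R environments. `sourceFunctionsSketch()` is hypothetical. It sources a file into a fresh environment whose parent holds the "setup" objects, so the functions can see those objects. It then copies the functions into `envir` and returns NULL.

```r
# Hypothetical sketch (not setupFunctions' code): source `file` so its functions'
# enclosing environment has `parentEnv` as parent, then place them in `envir`.
sourceFunctionsSketch <- function(file, parentEnv, envir) {
  fnEnv <- new.env(parent = parentEnv)   # free variables resolve here, then in parentEnv
  sys.source(file, envir = fnEnv)
  fns <- Filter(is.function, as.list(fnEnv))
  list2env(fns, envir = envir)           # all functions end up in `envir`
  invisible(NULL)
}

tf <- tempfile(fileext = ".R")
writeLines("g <- function() drr * 10", tf)
setupEnv <- new.env(); assign("drr", 3, envir = setupEnv)   # stands in for setupProject's objects
target <- new.env()
sourceFunctionsSketch(tf, parentEnv = setupEnv, envir = target)
target$g()   # 30: g finds `drr` through its parent environment
```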
setupProject() for the high-level wrapper, setup_family for an overview.
## simplest case; just creates folders
out <- setupProject(
  paths = list(projectPath = ".") #
)

# specifying functions argument, with a local file and a definition here
tf <- tempfile(fileext = ".R")
fnDefs <- c("fn <- function(x) x\n",
            "fn2 <- function(x) x\n",
            "fn3 <- function(x) terra::rast(x)")
cat(text = fnDefs, file = tf)
funHere <- function(y) y
out <- setupProject(functions = list(a = function(x) return(x), tf,
                                     funHere = funHere), # have to name it
                    # now use the functions when creating objects
                    drr = 1,
                    b = a(drr),
                    q = funHere(22),
                    ddd = fn3(terra::ext(0, b, 0, b)))
packagePath and/or modulePath to the project's .gitignore
Helper that keeps the .gitignore of a project under git control in sync
with the project's resolved paths.
setupGitIgnore( paths, gitignore = getOption("SpaDES.project.gitignore", TRUE), verbose )
paths |
a list with named elements, specifically, |
gitignore |
Logical. Only has an effect if the |
verbose |
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
setupGitIgnore will add the relevant paths to .gitignore.
setupGitIgnore is run for its side effects, i.e., adding paths$packagePath
and/or paths$modulePath to the
.gitignore file. It checks whether packagePath is located inside
paths$projectPath and, if so, adds this folder to the .gitignore.
If the project is a git repository with git submodules, nothing else is added.
If the project is a git repository without git submodules, paths$modulePath
is also added to the .gitignore file; it is assumed that these modules are
used in a read-only manner.
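A sketch of these decision rules as a pure function that returns the entries to add. `gitignoreEntries()` is hypothetical. It assumes entries are written relative to the project root and that a folder is only added when it lies inside the project. Reading and writing the `.gitignore` file, and detecting submodules, are left out.

```r
# Hypothetical sketch (not setupGitIgnore's code): which project-relative paths
# would be added to .gitignore under the rules above.
gitignoreEntries <- function(projectPath, packagePath, modulePath, hasSubmodules) {
  norm <- function(p) normalizePath(p, winslash = "/", mustWork = FALSE)
  proj <- norm(projectPath)
  relIfInside <- function(p) {
    p <- norm(p)
    if (startsWith(p, paste0(proj, "/"))) substring(p, nchar(proj) + 2L) else NULL
  }
  entries <- relIfInside(packagePath)                   # package library inside the project?
  if (!hasSubmodules) entries <- c(entries, relIfInside(modulePath))
  unique(entries)
}

gitignoreEntries("/no/such/proj", "/no/such/proj/packages", "/no/such/proj/modules",
                 hasSubmodules = FALSE)   # "packages" "modules"
```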
setupProject() for the high-level wrapper, setup_family for an overview.
modulePath
Materialise the modules requested in setupProject() beneath
paths[["modulePath"]], optionally as git submodules.
setupModules( name, paths, modules, inProject, useGit = getOption("SpaDES.project.useGit", FALSE), overwrite = FALSE, envir = parent.frame(), callingEnv = sys.frame(-2), gitUserName, verbose = getOption("Require.verbose", 1L), dots, defaultDots, updateRprofile = getOption("SpaDES.project.updateRprofile", TRUE), ... )
name |
Optional. If supplied, the name of the project. If not supplied, an
attempt will be made to extract the name from the |
paths |
a list with named elements, specifically, |
modules |
a character vector of modules to pass to
|
inProject |
A logical. If |
useGit |
(Still experimental when not FALSE.) There are two levels at which a project
can use GitHub, either the |
overwrite |
Logical vector or character vector, however, only |
envir |
The environment where |
callingEnv |
The environment from which the function was called. Defaults to |
gitUserName |
The GitHub account name. Used with git clone git@github.com:gitUserName/name |
verbose |
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
dots |
Any other named objects passed as a list a user might want for other elements. |
defaultDots |
A named list of any arbitrary R objects.
These can be supplied to give default values to objects that
are otherwise passed in with the |
updateRprofile |
Logical. Should the |
... |
further named arguments that act like |
setupModules will download all modules that do not yet exist locally. The current
test for "exists locally" is simply whether the module's directory exists. To
update a module, either set overwrite = TRUE or remove its folder
manually.
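A sketch of the directory-existence test. `modulesToDownload()` is hypothetical. It assumes module specs look like `owner/repo@branch` or a bare module name, and it handles only a logical `overwrite`, not the character-vector form.

```r
# Hypothetical sketch (not setupModules' code): keep only modules whose folder
# is absent from modulePath, unless overwrite = TRUE.
modulesToDownload <- function(modules, modulePath, overwrite = FALSE) {
  if (isTRUE(overwrite)) return(modules)
  modName <- sub("@.*$", "", basename(modules))   # "owner/Mod@main" -> "Mod"
  modules[!dir.exists(file.path(modulePath, modName))]
}

mp <- tempfile("modules")
dir.create(file.path(mp, "Biomass_core"), recursive = TRUE)
modulesToDownload(c("PredictiveEcology/Biomass_core@main",
                    "PredictiveEcology/fireSense@main"), mp)
# "PredictiveEcology/fireSense@main"
```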
setupModules is run for its side effects, i.e., it downloads modules and puts them
into paths[["modulePath"]]. It returns a named list, where the names are the
full module names and the list elements are the R packages that each module
depends on (reqdPkgs).
setupProject() for the high-level wrapper, setup_family for an overview.
Set the options() supplied to setupProject() and record the prior values
so they can be restored.
setupOptions( name, options, paths, times, overwrite = FALSE, envir = parent.frame(), callingEnv = sys.frame(-2), verbose = getOption("Require.verbose", 1L), dots, defaultDots, useGit = getOption("SpaDES.project.useGit", FALSE), updateRprofile = getOption("SpaDES.project.updateRprofile", TRUE), ... )
name |
Optional. If supplied, the name of the project. If not supplied, an
attempt will be made to extract the name from the |
options |
Optional. Either a named list to be passed to |
paths |
a list with named elements, specifically, |
times |
Optional. This will be returned if supplied; if supplied, the values
can be used in e.g., |
overwrite |
Logical vector or character vector, however, only |
envir |
The environment where |
callingEnv |
The environment from which the function was called. Defaults to |
verbose |
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
dots |
Any other named objects passed as a list a user might want for other elements. |
defaultDots |
A named list of any arbitrary R objects.
These can be supplied to give default values to objects that
are otherwise passed in with the |
useGit |
(Still experimental when not FALSE.) There are two levels at which a project
can use GitHub, either the |
updateRprofile |
Logical. Should the |
... |
further named arguments that act like |
setupOptions can handle sequentially specified values, meaning a user can
first supply a list of default options, then a list of user-desired options that
may or may not replace individual values. Final values therefore depend on the
order in which they are provided: later values override earlier ones.
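A minimal sketch of sequential option lists using base R only, not setupOptions()'s code. utils::modifyList() lets later lists override earlier ones, and options() returns the prior values, so they can be restored later. The option names are made up for the example.

```r
# Minimal sketch (not setupOptions' code): later option lists override earlier
# ones, and the previous values are kept so they can be "unset" later.
defaults <- list(sketch.optA = 1, sketch.optB = "default")
userOpts <- list(sketch.optB = "user")
final <- Reduce(utils::modifyList, list(defaults, userOpts))
old <- options(final)      # set them; `old` holds the previous values (NULL here)
getOption("sketch.optB")   # "user"
options(old)               # restore the previous values
getOption("sketch.optB")   # NULL
```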
setupOptions is run for its side effects, namely, changes to the options(). The
list of modified options will be added as an attribute (attr(out, "projectOptions")),
e.g., so they can be "unset" by the user later.
setupProject() for the high-level wrapper, setup_family for an overview.
Combine the modules' reqdPkgs with the user-supplied packages and
install all of them into paths[["packagePath"]] via Require::Install().
setupPackages( packages, modulePackages = list(), require = list(), paths, libPaths, setLinuxBinaryRepo = TRUE, standAlone, envir = parent.frame(), callingEnv = sys.frame(-2), verbose = getOption("Require.verbose"), dots, defaultDots, ... )
packages |
Optional. A vector of packages that must exist in the |
modulePackages |
A named list, where names are the module names, and the elements
of the list are packages in a form that |
require |
Optional. A character vector of packages to install and attach
(with |
paths |
a list with named elements, specifically, |
libPaths |
Deprecated. Use |
setLinuxBinaryRepo |
Logical. Should the binary RStudio Package Manager be used on Linux (ignored on Windows) |
standAlone |
A logical. Passed to |
envir |
The environment where |
callingEnv |
The environment from which the function was called. Defaults to |
verbose |
Numeric or logical indicating how verbose the function should be.
At |
dots |
Any other named objects passed as a list a user might want for other elements. |
defaultDots |
A named list of any arbitrary R objects.
These can be supplied to give default values to objects that
are otherwise passed in with the |
... |
further named arguments that act like |
setupPackages will read the modules' metadata reqdPkgs element. It will combine
these with any packages passed manually by the user to packages, and pass all
these packages to Require::Install(...).
setupPackages is run for its side effects, i.e., installing packages to
paths[["packagePath"]].
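The combining step described above can be sketched in base R as follows. The module names and package specifications are hypothetical, and this sketch only removes exact duplicate strings; it does not reconcile different version specifications of the same package. The install call is left as a comment.

```r
# Hypothetical module names and package specs, for illustration only
modulePackages <- list(
  moduleA = c("terra", "data.table"),
  moduleB = c("data.table", "PredictiveEcology/LandR@development")
)
packages <- c("ggplot2", "terra")   # supplied manually by the user

# Combine user-supplied and module packages, dropping exact duplicates
allPkgs <- unique(c(packages, unlist(modulePackages, use.names = FALSE)))
allPkgs
# These would then be passed to the installer, e.g., Require::Install(allPkgs)
```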
setupProject() for the high-level wrapper, setup_family for an overview.
simInit()
Build the nested params list that SpaDES.core::simInit() consumes from
the user-supplied params argument to setupProject().
setupParams( name, params, paths, modules, times, options, overwrite = FALSE, envir = parent.frame(), callingEnv = sys.frame(-2), verbose = getOption("Require.verbose", 1L), dots, defaultDots, ... )
name |
Optional. If supplied, the name of the project. If not supplied, an
attempt will be made to extract the name from the |
params |
Optional. Similar to |
paths |
a list with named elements, specifically, |
modules |
a character vector of modules to pass to
|
times |
Optional. This will be returned if supplied; if supplied, the values
can be used in e.g., |
options |
Optional. Either a named list to be passed to |
overwrite |
Logical vector or character vector, however, only |
envir |
The environment where |
callingEnv |
The environment from which the function was called. Defaults to |
verbose |
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
dots |
Any other named objects passed as a list a user might want for other elements. |
defaultDots |
A named list of any arbitrary R objects.
These can be supplied to give default values to objects that
are otherwise passed in with the |
... |
further named arguments that act like |
setupParams prepares a named list of named lists, suitable to be passed to
the params argument of simInit.
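A sketch of the "named list of named lists" shape described above, built by hand in base R. The module and parameter names are hypothetical; the example also shows a value taken from a previously defined times list, as the sequential evaluation in setupProject allows.

```r
# Hypothetical module and parameter names, for illustration only
times <- list(start = 2011, end = 2031)

params <- list(
  moduleA = list(startYear = times$start, growthRate = 0.1),
  moduleB = list(reportInterval = 10)
)

# Each top-level name is a module; each element is a named list of parameters
str(params, max.level = 1)
params[["moduleA"]][["startYear"]]   # 2011
```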
setupProject() for the high-level wrapper, setup_family for an overview.
Resolve, default-fill, and apply the path list used by setupProject().
setupPaths( name, paths, inProject, standAlone = TRUE, libPaths = NULL, updateRprofile = getOption("SpaDES.project.updateRprofile", TRUE), Restart = getOption("SpaDES.project.Restart", FALSE), overwrite = FALSE, envir = parent.frame(), callingEnv = sys.frame(-2), useGit = getOption("SpaDES.project.useGit", FALSE), verbose = getOption("Require.verbose", 1L), dots, defaultDots, ... )
name |
Optional. If supplied, the name of the project. If not supplied, an
attempt will be made to extract the name from the |
paths |
a list with named elements, specifically, |
inProject |
A logical. If |
standAlone |
A logical. Passed to |
libPaths |
Deprecated. Use |
updateRprofile |
Logical. Should the |
Restart |
Logical or character. If either |
overwrite |
Logical vector or character vector, however, only |
envir |
An environment within which to look for objects. If called alone,
the function should use its own internal environment. If called from another
function, e.g., |
callingEnv |
The environment from which the function was called. Defaults to |
useGit |
(Experimental if not FALSE.) There are two levels at which a project
can use GitHub, either the |
verbose |
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
dots |
Any other named objects passed as a list a user might want for other elements. |
defaultDots |
A named list of any arbitrary R objects.
These can be supplied to give default values to objects that
are otherwise passed in with the |
... |
further named arguments that act like |
setupPaths will fill in any paths that are not explicitly supplied by the
user as a named list. The paths that can be set are:
projectPath, packagePath, cachePath, inputPath,
modulePath, outputPath, rasterPath, scratchPath, terraPath.
These are grouped thematically into three groups of paths:
projectPath and packagePath affect the project, regardless
of whether a user uses SpaDES modules. cachePath, inputPath, outputPath and
modulePath are all used by SpaDES within module contexts. scratchPath,
rasterPath and terraPath are all "temporary" or "scratch" directories.
setupPaths returns a list of paths that are created. projectPath will be
assumed to be the base of other non-temporary and non-R-library paths. This means
that all paths that are directly used by simInit are assumed to be relative
to the projectPath. If a user chooses to specify absolute paths, then they will
be returned as is. It is also called for its
side effect which is to call setPaths, with each of these paths as an argument.
See table for details. If a user supplies extra paths not usable by SpaDES.core::simInit,
these will be added as an attribute ("extraPaths") to the paths element
in the returned object. These will still exist directly in the returned list
if a user uses setupPaths directly, but these will not be returned with
setupProject because setupProject is intended to be used with SpaDES.core::simInit.
In addition, three paths will be added to this same attribute automatically:
projectPath, packagePath, and .prevLibPaths which is the previous value for
.libPaths() before changing to packagePath.
| Path | Default if not supplied by user | Effects |
| ------ | ----------- | ----- |
| Project Level Paths | | |
| projectPath | if getwd() is name, then just getwd(); if not, file.path(getwd(), name) | If the current project is not this project and RStudio is being used, then the current project will close and a new project will open in the same RStudio session, unless Restart = FALSE |
| packagePath | file.path(tools::R_user_dir("data"), name, "packages", version$platform, substr(getRversion(), 1, 3)) | adds this path to the front of .libPaths() (i.e., .libPaths()[1]), unless standAlone = TRUE, in which case it will set .libPaths(packagePath, include.site = FALSE) |
| Module Level Paths | | |
| cachePath | file.path(projectPath, "cache") | options(reproducible.cachePath = cachePath) |
| inputPath | file.path(projectPath, "inputs") | options(spades.inputPath = inputPath) |
| modulePath | file.path(projectPath, "modules") | options(spades.modulePath = modulePath) |
| outputPath | file.path(projectPath, "outputs") | options(spades.outputPath = outputPath) |
| Temporary Paths | | |
| scratchPath | file.path(tempdir(), name) | |
| rasterPath | file.path(scratchPath, "raster") | sets rasterOptions(tmpdir = rasterPath) |
| terraPath | file.path(scratchPath, "terra") | sets terraOptions(tempdir = terraPath) |
| Other Paths | | |
| logPath | file.path(outputPath(sim), "log") | sets options("spades.logPath"), accessible by logPath(sim) |
| tilePath | Not implemented yet | Not implemented yet |
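The defaults in the table above can be approximated in base R. This is a sketch only: fillPaths() is a hypothetical helper, the absolute-path test is a simple regular expression (an assumption), and none of the side effects (setPaths(), options(), .libPaths()) are applied.

```r
# Sketch of default-filling; fillPaths() is hypothetical, not part of SpaDES.project
fillPaths <- function(name, paths = list()) {
  # Treat "/", "~" or a drive letter as the start of an absolute path (assumption)
  isAbsolute <- function(p) grepl("^(/|~|[A-Za-z]:)", p)
  pick <- function(key, default) if (is.null(paths[[key]])) default else paths[[key]]

  out <- list()
  out$projectPath <- pick("projectPath",
    if (basename(getwd()) == name) getwd() else file.path(getwd(), name))
  out$packagePath <- pick("packagePath",
    file.path(tools::R_user_dir("data"), name, "packages",
              version$platform, substr(getRversion(), 1, 3)))

  # Module-level paths: relative values are resolved against projectPath,
  # absolute values are returned as is
  moduleDefaults <- c(cachePath = "cache", inputPath = "inputs",
                      modulePath = "modules", outputPath = "outputs")
  for (key in names(moduleDefaults)) {
    p <- pick(key, moduleDefaults[[key]])
    out[[key]] <- if (isAbsolute(p)) p else file.path(out$projectPath, p)
  }

  # Temporary paths live under scratchPath
  out$scratchPath <- pick("scratchPath", file.path(tempdir(), name))
  out$rasterPath  <- pick("rasterPath", file.path(out$scratchPath, "raster"))
  out$terraPath   <- pick("terraPath", file.path(out$scratchPath, "terra"))
  out
}

p <- fillPaths("myProject",
               list(projectPath = "/tmp/myProject", outputPath = "/data/out"))
p$cachePath    # "/tmp/myProject/cache"
p$outputPath   # "/data/out" (absolute, so kept as is)
```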
setupProject() for the high-level wrapper, setup_family for an overview.
setupProject calls a sequence of functions in this order:
setupOptions (first time), setupPaths, setupRestart,
setupFunctions, setupModules, setupPackages, setupSideEffects,
setupOptions (second time), setupParams, and setupGitIgnore.
This sequence will create folder structures, install any missing packages
listed in the packages or require arguments or in the modules' reqdPkgs fields,
load packages (only those in the require argument), set options, and download or
confirm the existence of modules. It will also return elements that can be passed
directly to simInit or simInitAndSpades, specifically, modules, params,
paths, times, and any named elements passed to .... This function will also,
if desired, change the .Rprofile file for this project so that every time
the project is opened, it has a specific .libPaths().
There are a number of convenience elements described in the section below. See Details.
Because of this sequence, users can take advantage of settings (i.e., objects)
that happen (are created) before others. For example, users can set paths
then use the paths list to set options that can update/change paths,
or set times and use the times list for certain entries in params.
setupProject( name, paths, modules, packages, times, options, params, sideEffects, functions, config, require = NULL, studyArea = NULL, Restart = getOption("SpaDES.project.Restart"), useGit = getOption("SpaDES.project.useGit"), setLinuxBinaryRepo = getOption("SpaDES.project.setLinuxBinaryRepo"), standAlone = getOption("SpaDES.project.standAlone"), libPaths = NULL, updateRprofile = getOption("SpaDES.project.updateRprofile"), overwrite = getOption("SpaDES.project.overwrite"), verbose = getOption("Require.verbose", 1L), defaultDots, envir = parent.frame(), dots, ... )
name |
Optional. If supplied, the name of the project. If not supplied, an
attempt will be made to extract the name from the |
paths |
a list with named elements, specifically, |
modules |
a character vector of modules to pass to
|
packages |
Optional. A vector of packages that must exist in the |
times |
Optional. This will be returned if supplied; if supplied, the values
can be used in e.g., |
options |
Optional. Either a named list to be passed to |
params |
Optional. Similar to |
sideEffects |
Optional. This can be an expression or one or more file names or
a code chunk surrounded by |
functions |
A set of function definitions to be used within |
config |
Reserved for future use. Currently unimplemented; supplying a value triggers an error. |
require |
Optional. A character vector of packages to install and attach
(with |
studyArea |
Optional. If a list, it will be passed to
|
Restart |
Logical or character. If either |
useGit |
(Experimental if not FALSE.) There are two levels at which a project
can use GitHub, either the |
setLinuxBinaryRepo |
Logical. Should the binary RStudio Package Manager be used on Linux (ignored on Windows) |
standAlone |
A logical. Passed to |
libPaths |
Deprecated. Use |
updateRprofile |
Logical. Should the |
overwrite |
Logical vector or character vector, however, only |
verbose |
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
defaultDots |
A named list of any arbitrary R objects.
These can be supplied to give default values to objects that
are otherwise passed in with the |
envir |
The environment where |
dots |
Any other named objects passed as a list a user might want for other elements. |
... |
further named arguments that act like |
setupProject will return a named list with elements modules, paths, params, and times.
The goal of this list is to contain list elements that can be passed directly
to simInit.
It will also append all elements passed by the user in the ....
This list can be passed directly to SpaDES.core::simInit() or
SpaDES.core::simInitAndSpades() using a do.call(). See example.
NOTE: both projectPath and packagePath will be omitted in the paths list
as they are used to set the current directory (found with getwd()) and .libPaths()[1],
but are not accepted by simInit. setupPaths will still return these two paths as its
outputs are not expected to be passed directly to simInit (unlike setupProject outputs).
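The do.call() pattern can be sketched without SpaDES.core by using a stand-in function. Here out mimics a combined setupPaths()/setupProject() result with hypothetical values, and fakeSimInit() is a hypothetical stand-in with the argument names used above; with SpaDES.core installed, the same call shape would be do.call(SpaDES.core::simInit, out).

```r
# `out` mimics the list returned by the setup functions (values are hypothetical)
out <- list(
  modules = c("moduleA", "moduleB"),
  paths = list(projectPath = "/tmp/proj", packagePath = "/tmp/pkgs",
               modulePath = "/tmp/proj/modules", outputPath = "/tmp/proj/outputs"),
  params = list(moduleA = list(growthRate = 0.1)),
  times = list(start = 0, end = 10)
)

# Drop the two paths that simInit does not accept (setupProject omits them)
out$paths[c("projectPath", "packagePath")] <- NULL

# Hypothetical stand-in for SpaDES.core::simInit()
fakeSimInit <- function(times, params, modules, paths, ...) {
  list(nModules = length(modules), pathNames = names(paths))
}
res <- do.call(fakeSimInit, out)
res$pathNames   # "modulePath" "outputPath"
```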
There are a number of checks that occur during setupProject. These take time, particularly
after an R restart (there is some caching in RAM that occurs, but this will only speed
things up if there is no restart of R). For the fastest setup, the following options and settings
will speed things up, at the expense of not being completely re-runnable.
You can add one or more of these to the arguments. They will only be useful after a project
is set up, i.e., after setupProject and SpaDES.core::simInit have been run at least once
to completion (so packages are installed).
options = list(
  reproducible.useMemoise = TRUE,                # For caching, use memory objects
  Require.cloneFrom = Sys.getenv("R_LIBS_USER"), # Use personal library as possible source of packages
  spades.useRequire = FALSE,                     # Won't install packages/update versions
  spades.moduleCodeChecks = FALSE,               # moduleCodeChecks checks for metadata mismatches
  reproducible.inputPaths = "~/allData"),        # For sharing data files across projects
packages = NULL,                                 # Prevents any package installs with setupProject
useGit = FALSE                                   # Prevents checks using git
These will be set early in setupProject, so they will affect the running of setupProject.
If the user also sets one of these options explicitly, the user's value will
override the value set here.
The remaining cause of setupProject being "slow" will be loading the required packages.
These options/arguments can now be set all at once
(with caution, as these changes will affect how your
script will be run) with options(SpaDES.project.fast = TRUE) or in the options argument.
The overarching objectives for these functions are:
To prepare what is needed for simInit.
To help a user eliminate virtually all assignments to the .GlobalEnv,
as these create and encourage spaghetti code that becomes unreproducible
as the project increases in complexity.
To be very simple for beginners, but powerful enough to expand to almost any need of arbitrarily complex projects, using the same structure.
To deal with the complexities of R package installation and loading when working with modules that may have been created by many users.
To create a common SpaDES project structure, allowing easy transition from one project to another, regardless of complexity.
Throughout these functions, efforts have been made to implement sequential evaluation,
within files and within lists. This means that a user can use the values from an
upstream element in the list. For example, the following is valid: projectPath is
part of the list that will be assigned to the paths argument, and it is then
used in the subsequent list element:
setupPaths(paths = list(projectPath = "here",
modulePath = file.path(paths[["projectPath"]], "modules")))
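One way such sequential evaluation can work is to evaluate the list elements one at a time, exposing the partially built list under the argument's name before each step. The helper evalSequentially() below is hypothetical and only illustrates the idea; it is not the package's code.

```r
# Sketch of sequential list evaluation; evalSequentially() is hypothetical
evalSequentially <- function(listCall, argName, envir = parent.frame()) {
  exprs <- as.list(listCall)[-1]            # drop the `list` function itself
  env <- new.env(parent = envir)
  result <- list()
  for (i in seq_along(exprs)) {
    assign(argName, result, envir = env)    # expose the elements built so far
    result[[names(exprs)[i]]] <- eval(exprs[[i]], env)
  }
  result
}

paths <- evalSequentially(
  quote(list(projectPath = "here",
             modulePath = file.path(paths[["projectPath"]], "modules"))),
  argName = "paths"
)
paths$modulePath   # "here/modules"
```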
Because of such sequential evaluation, paths, options, and params files
can be sequential lists that impose a hierarchy specified
by their order. For example, a user can first create a list of default options,
then several lists of user-desired options behind an if (user("emcintir"))
block that add new or override existing elements, followed by machine-specific
values, such as paths.
# Contents of an options file, e.g., passed as options = "inst/options.R";
# each top-level value is evaluated in order
maxMemory <- 5e+9 # if (grepl("LandWeb", runName)) 5e+12 else 5e+9
# Example -- Use any arbitrary object that can be passed in the `...` of `setupOptions`
# or `setupProject`
if (.mode == "development") {
  list(test = 2)
}
if (machine("A127")) {
  list(test = 3)
}
Arguments that are not the named arguments (i.e., the ones passed in ...)
are evaluated in the order they are written. Subsequent arguments can use the
previous arguments. If "dot" arguments are declared before the first
standard arguments (the "formals") of the function, then they will be evaluated
prior to the formals. If they are after a single standard argument (i.e., not
necessarily after all the named arguments), then they will be evaluated after
all standard arguments. The exception to this is params, which will be evaluated
like the ... arguments, i.e., in order.
The paths, options, and params arguments can all
accept lists of named values, character vectors, or a mixture, using a list where
named elements are values and unnamed elements are character strings/vectors. Any unnamed
character string/vector will be treated as a file path. If that file path has an @ symbol,
it will be assumed to be a file that exists on a GitHub repository in https://github.com.
So a user can pass values, or pointers to remote and/or local paths that themselves have values.
The following will set an option as declared, plus read the local file (with relative path), plus download and read the cloud-hosted file.
setupProject(
  options = list(reproducible.useTerra = TRUE,
                 "inst/options.R",
                 "PredictiveEcology/SpaDES.project@development/inst/options.R")
)
This approach allows for an organic growth of complexity, e.g., a user begins with only named lists of values, but then as the number of values increases, it may be helpful to put some in an external file.
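The rule above (named elements are values, unnamed strings are files, and an @ marks a GitHub file) can be sketched as a small classifier. Both helpers are hypothetical. githubRawUrl() assumes the owner/repo@branch/path layout shown in the example and the standard raw.githubusercontent.com URL pattern; it is not necessarily how SpaDES.project downloads the file, and it would not handle branch names that contain a slash.

```r
# Hypothetical helpers, for illustration only
classifyElements <- function(x) {
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  vapply(seq_along(x), function(i) {
    if (nzchar(nms[i])) {
      "value"                                  # named element: used as a value
    } else if (any(grepl("@", x[[i]], fixed = TRUE))) {
      "github"                                 # e.g., "owner/repo@branch/path/file.R"
    } else {
      "localFile"                              # any other unnamed string: a file path
    }
  }, character(1))
}

githubRawUrl <- function(ref) {
  repo <- sub("@.*$", "", ref)       # "owner/repo"
  rest <- sub("^[^@]*@", "", ref)    # "branch/path/to/file"
  paste0("https://raw.githubusercontent.com/", repo, "/", rest)
}

opts <- list(reproducible.useTerra = TRUE,
             "inst/options.R",
             "PredictiveEcology/SpaDES.project@development/inst/options.R")
classifyElements(opts)   # "value" "localFile" "github"
githubRawUrl(opts[[3]])
```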
NOTE: if the GitHub repository is private, the user must configure their GitHub
token by setting the GITHUB_PAT environment variable; unfortunately, the usethis
approach to setting the token will not work at this moment.
paths, options, params
If paths, options, and/or params are a character string
or character vector (or part of an unnamed list element) the string(s)
will be interpreted as files to parse. These files should contain R code that
specifies named lists, where the names are one or more paths, options,
or are module names, each with a named list of parameters for that named module.
This last named list for params follows the convention used for the params argument in
simInit(..., params = ).
These files can use paths, times, plus any previous list in the sequence of
params or options specified. Any functions that are used must be available,
e.g., prefixed with their package, as in Require::normPath, if the package has not been loaded (as recommended).
If passing a file to options, it should not set options() explicitly;
only create named lists. This enables options checking/validating
to occur within setupOptions and setupParams. The simplest case would be a file containing:
opts <- list(reproducible.destinationPath = "~/destPath").
All named lists will be parsed into their own environment, and then will be
sequentially evaluated (i.e., subsequent lists will have access to previous lists),
with each named element setting or replacing the previous element of the same name,
creating a single list. This final list will be assigned to, e.g., options() inside setupOptions.
Because each list is parsed separately, they do not need to be assigned to objects;
if they are, the object name can be any name, even if similar to another object's name
used to build the same argument's (i.e., paths, params, options) final list.
Hence, in a file passed to options, instead of incrementing the list as:
a <- list(optA = 1)
b <- append(a, list(optB = 2))
c <- append(b, list(optC = 2.5))
d <- append(c, list(optD = 3))
one can do:
a <- list(optA = 1)
a <- list(optB = 2)
c <- list(optC = 2.5)
list(optD = 3)
NOTE: only atomics (i.e., character, numeric, etc.), named lists, or either of these that are protected by 1 level of "if" are parsed. This will not work, therefore, for other side-effect elements, like authenticating with a cloud service.
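The parse-then-merge behaviour described in this section can be sketched in base R: parse the file's text, evaluate each top-level expression in a private environment, and fold every named list into one result with utils::modifyList(). parseNamedLists() is hypothetical and, unlike the package, evaluates every expression rather than restricting what is parsed.

```r
# Text of a hypothetical options file
fileText <- '
a <- list(optA = 1)
a <- list(optB = 2)
c <- list(optC = 2.5)
list(optD = 3)
devMode <- TRUE
if (devMode) {
  list(optA = 10)
}
'

# Sketch only; parseNamedLists() is not part of SpaDES.project
parseNamedLists <- function(text) {
  exprs <- parse(text = text)
  env <- new.env()               # objects created by the file stay here
  final <- list()
  for (i in seq_along(exprs)) {
    val <- eval(exprs[[i]], env)
    # Keep only named lists; later names replace earlier ones
    if (is.list(val) && !is.null(names(val))) final <- utils::modifyList(final, val)
  }
  final
}

opts <- parseNamedLists(fileText)
str(opts)   # optA = 10, optB = 2, optC = 2.5, optD = 3
```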
Several helper functions exist within SpaDES.project that may be useful, such
as user(...) and machine(...).
To allow for batch submission, a user can specify argument = value even if value
does not exist. This type of specification will not work in normal parsing of arguments,
but it is designed to work here. In the next example, .mode = .mode can be specified,
but if R cannot find .mode for the right-hand side, it will just skip with no error.
Thus a user can source a script with the following line from a batch script where .mode
is specified. When running this line without that batch script specification, then this
will assign no value to .mode. We include .nodes, which shows an example of
passing a value that does exist. The non-existent .mode will be returned in out,
but as an unevaluated, captured list element.
.nodes <- 2
out <- setupProject(.mode = .mode,
.nodes = .nodes,
options = "inst/options.R"
)
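A base-R sketch of how a missing right-hand side can be tolerated: capture the ... expressions unevaluated, try to evaluate each in the caller's environment, and keep the expression itself when evaluation fails. captureDots() is hypothetical and only mimics the behaviour described above.

```r
# Sketch only; captureDots() is hypothetical
captureDots <- function(..., envir = parent.frame()) {
  force(envir)
  exprs <- as.list(substitute(list(...)))[-1]
  lapply(exprs, function(e) {
    # Evaluate in the caller's environment; on failure, keep the expression
    tryCatch(eval(e, envir), error = function(err) e)
  })
}

.nodes <- 2
out <- captureDots(.mode = .mode, .nodes = .nodes)
is.name(out$.mode)   # TRUE: .mode did not exist, so the symbol is returned
out$.nodes           # 2
```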
verbose is passed through to the inner setup* helpers. Notably, verbose >= 2
prints the modules' reqdPkgs grouped by module, and verbose >= 3 additionally
prints the dput() of the exact package vector passed to Require::Require (see
setupPackages()).
Inner setup* helpers (each has its own help page; see setup_family
for a one-page overview):
setupPaths(), setupFunctions(), setupSideEffects(),
setupOptions(), setupModules(), setupPackages(),
setupParams(), setupGitIgnore(), setupStudyArea(), setupFiles().
teardownProject() reverses setupProject() and restores the prior
.libPaths() (kept on the output as out$paths$.previousLibPaths).
Also, helpful functions such as user(), machine(), node().
vignette("i-getting-started", package = "SpaDES.project")
## For more examples: vignette("i-getting-started", package = "SpaDES.project")
library(SpaDES.project)
## simplest case; just creates folders
out <- setupProject(
  paths = list(projectPath = ".")
)
Source the side-effect scripts or expressions supplied to setupProject();
nothing is returned to the user.
setupSideEffects( name, sideEffects, paths, times, overwrite = FALSE, envir = parent.frame(), callingEnv = sys.frame(-2), verbose = getOption("Require.verbose", 1L), dots, defaultDots, ... )
name |
Optional. If supplied, the name of the project. If not supplied, an
attempt will be made to extract the name from the |
sideEffects |
Optional. This can be an expression or one or more file names or
a code chunk surrounded by |
paths |
a list with named elements, specifically, |
times |
Optional. This will be returned if supplied; if supplied, the values
can be used in e.g., |
overwrite |
Logical vector or character vector, however, only |
envir |
The environment where |
callingEnv |
The environment from which the function was called. Defaults to |
verbose |
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
dots |
Any other named objects passed as a list a user might want for other elements. |
defaultDots |
A named list of any arbitrary R objects.
These can be supplied to give default values to objects that
are otherwise passed in with the |
... |
further named arguments that act like |
Most arguments in the family of setup* functions are evaluated sequentially, even within
an argument. Since most arguments take lists, the user can set a value in the first
element of a list, then use it to calculate the second element, and so on. See
examples. This "sequential" evaluation occurs in the ... and in setupSideEffects, setupOptions,
and setupParams (this does not work for setupPaths). These functions can handle sequentially
specified values, meaning a user can
first create a list of default options, then a list of user-desired options that
may or may not replace individual values. This can create hierarchies based on
order.
setupSideEffects is run for its side effects (e.g., web authentication, custom package
options that cannot use base::options), with deliberately nothing returned to user.
This, like other parts of this function, attempts to prevent unwanted outcomes
that occur when a user uses e.g., source without being very careful about
what and where the objects are sourced to.
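The concern about source() above can be illustrated in base R: running a side-effect script with sys.source() into a dedicated environment keeps its helper objects out of .GlobalEnv while its intended side effect (here, setting an option) still happens. The script contents and the option name are hypothetical, and the package's own mechanism may differ.

```r
# Write a small hypothetical side-effect script to a temporary file
sideEffectFile <- tempfile(fileext = ".R")
writeLines(c(
  'tmpToken <- "not-a-real-token"',        # helper object created by the script
  'options(myproj.sideEffectRan = TRUE)'   # the intended side effect
), sideEffectFile)

# Evaluate the script in its own environment rather than .GlobalEnv
localEnv <- new.env(parent = globalenv())
sys.source(sideEffectFile, envir = localEnv)

exists("tmpToken", envir = globalenv(), inherits = FALSE)   # FALSE
getOption("myproj.sideEffectRan")                            # TRUE
```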
setupProject() for the high-level wrapper, setup_family for an overview.
studyArea spec via geodata::gadm()
Convenience wrapper that returns an sf polygon for the requested
country / subregion using geodata::gadm().
setupStudyArea( studyArea, paths, envir = parent.frame(), callingEnv = sys.frame(-2), verbose = getOption("Require.verbose", 1L) )
studyArea |
Optional. If a list, it will be passed to
|
paths |
a list with named elements, specifically, |
envir |
The environment where |
callingEnv |
The environment from which the function was called. Defaults to |
verbose |
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if |
setupStudyArea calls geodata::gadm() to get an sf polygon or set of polygons
of a country or a subdivision of a country. The user can pass a named list of character elements
that match entries in the columns "NAME_1" and "NAME_2" of the sf object. If passing
NAME_2, the user must pass level = 2. If passing NAME_1 and level = 2, all subdivision polygons under
NAME_1 will be returned, which can be useful to explore subdivision names.
setupStudyArea only uses inputPath within its paths argument, which will
be passed to the path argument of gadm.
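The name-matching rule can be illustrated offline with a plain data.frame standing in for the attribute table of the sf object. The rows are toy values, not real GADM output, and subsetStudyArea() is hypothetical; the real function delegates the download and level handling to geodata::gadm().

```r
# Toy attribute table standing in for geodata::gadm() output (illustrative rows only)
gadmTable <- data.frame(
  NAME_1 = c("Alberta", "Alberta", "Saskatchewan"),
  NAME_2 = c("Division No. 16", "Division No. 17", "Division No. 1")
)

# Keep rows matching every NAME_* element of a studyArea-style list
subsetStudyArea <- function(tab, studyArea) {
  keys <- intersect(names(studyArea), c("NAME_1", "NAME_2"))
  keep <- rep(TRUE, nrow(tab))
  for (k in keys) keep <- keep & tab[[k]] %in% studyArea[[k]]
  tab[keep, , drop = FALSE]
}

# NAME_1 only: all subdivisions under Alberta
subsetStudyArea(gadmTable, list(NAME_1 = "Alberta", level = 2))
# NAME_1 and NAME_2: a single subdivision
subsetStudyArea(gadmTable, list(NAME_1 = "Alberta", NAME_2 = "Division No. 17", level = 2))
```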
setupStudyArea(list(NAME_1 = "Alberta", "NAME_2" = "Division No. 17", level = 2))
setupStudyArea will return an sf class object coming from geodata::gadm,
with subregion specification as described in the studyArea argument.
setupProject() for the high-level wrapper, setup_family for an overview.
simLists
Show method for simLists
## S4 method for signature 'simLists'
show(object)
object |
|
Eliot McIntire
Run simInit and experiment in one step
simInitAndExperiment( times, params, modules, objects, paths, inputs, outputs, loadOrder, notOlderThan, replicates, dirPrefix, substrLength, saveExperiment, experimentFile, clearSimEnv, cl, ... )
times, paths, outputs, loadOrder
|
Passed to |
params |
Like for |
modules |
Like for |
objects |
Like for |
inputs |
Like for |
notOlderThan |
Currently unused (kept for back-compatibility). |
replicates |
The number of replicates to run of the same |
dirPrefix |
String vector. This will be concatenated as a prefix on the directory names. |
substrLength |
Numeric. While making |
saveExperiment |
Logical. Should the resulting experimental design be saved to a file? Default TRUE. |
experimentFile |
String. Filename if |
clearSimEnv |
Logical. If TRUE, then the |
cl |
Deprecated and ignored; control parallelism with |
... |
Passed to |
simInitAndExperiment cannot pass modules or params to experiment because
these are also in simInit. If the experiment is being used
to vary these arguments, it must be done separately (i.e., simInit then
experiment).
Moved here from the now-unmaintained SpaDES.experiment package.
simLists class
This is a grouping of simList objects. Normally this class will be
made using experiment2(), but it can be made manually if there are
existing simList objects.
This class (and the experiment() / experiment2() functions that
produce it) was moved here from the now-unmaintained SpaDES.experiment
package.
paths: Named list of modulePath, inputPath,
and outputPath paths. Partial matching is performed. These
will be prepended to the relative paths of each simList.
.xData: Environment holding the simLists.
Eliot McIntire
SpaDES.project options
These demonstrate default values for some options that can be set in
SpaDES.project.
To see defaults, run spadesProjectOptions().
See Details below.
spadesProjectOptions()
Below are options that can be set with options("spades.xxx" = newValue),
where xxx is one of the values below, and newValue is a new value to
give the option. Sometimes these options can be placed in the user's .Rprofile
file so they persist between sessions.
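For example, two of the options listed in the table that follows can be set for the current session as below; placing the same options() call in the user's .Rprofile would make them persist. The values shown are the documented defaults, written out explicitly.

```r
# Session-level settings; the same call could go in ~/.Rprofile to persist
options(
  SpaDES.project.Restart = FALSE,
  SpaDES.project.useGit = FALSE
)
getOption("SpaDES.project.useGit")   # FALSE
```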
The following options are used, and can mostly be specified in the various setup*
functions also.
| OPTION | DEFAULT VALUE | DESCRIPTION |
| ------ | ----------- | ----- |
| reproducible.cachePath | Within projectPath, with subfolder "cache" | NOTE: uses reproducible |
| spades.inputPath | Within projectPath, with subfolder "inputs" | |
| spades.modulePath | Within projectPath, with subfolder "modules" | |
| spades.outputPath | Within projectPath, with subfolder "outputs" | |
| spades.packagePath | .libPathDefault(<projectPath>) | |
| spades.projectPath | "." | |
| spades.scratchPath | Within tempdir(), with subfolder | |
| SpaDES.project.Restart | FALSE | Passed to Restart argument in setupProject |
| SpaDES.project.useGit | FALSE | Passed to useGit argument in setupProject |
SpaDES.project.ask is currently only used when offering to clone a remote
GitHub repository. Setting this to FALSE will prevent asking and just "do it".
named list of the default options currently available.
A family of ready-made quoted expressions for the statusCalculate
argument of experimentTmux() and experimentFuture(). Pass one of
these objects directly instead of writing a custom quote({...}) block:
experimentTmux(..., statusCalculate = statusCalculate_LandR)
Each expression is evaluated once per queue row inside
tmuxRefreshQueueStatus(). Before evaluation the row's non-meta
columns are unpacked into the local environment by name, as are any
objects forwarded through .... The expression may assign to any subset
of the recognised meta-column names (started_at, finished_at,
heartbeat_at, heartbeat_iter, iterationsTotal, …) and should set
done <- TRUE to signal that the job has completed.
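A sketch of a custom statusCalculate-style expression and of one plausible way a queue row could be evaluated against it. The column names (currentYear, endYear) are hypothetical, and evalStatusForRow() is a hypothetical stand-in that only mimics the "unpack the row's columns by name, then evaluate" step described above; it is not tmuxRefreshQueueStatus().

```r
# Custom expression; currentYear and endYear are hypothetical queue columns
myStatusCalculate <- quote({
  heartbeat_iter <- as.character(currentYear)
  done <- currentYear >= endYear
})

# Hypothetical stand-in: unpack a row into an environment, evaluate, collect results
evalStatusForRow <- function(expr, row) {
  env <- list2env(as.list(row), envir = new.env())
  eval(expr, env)
  mget(intersect(c("heartbeat_iter", "done"), ls(env)), envir = env)
}

row <- data.frame(currentYear = 2031, endYear = 2031)
res <- evalStatusForRow(myStatusCalculate, row)
res$done             # TRUE
res$heartbeat_iter   # "2031"
```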
statusCalculate_FireSenseFit
statusCalculate_LandR
A base::quote()d block expression (is.call(statusCalculate_FireSenseFit) is TRUE).
A base::quote()d block expression (is.call(statusCalculate_LandR) is TRUE).
statusCalculate_FireSenseFit: Heartbeat calculator for fireSense fire-spread simulations.
Scans the job's output directory for burnMap_year<XXXX>.tif files and
"Annual Fire Maps" output files to populate the queue meta-columns:
heartbeat_iter: Most recent fire-map year found after the worker claimed the job, or times$start if none yet.
heartbeat_at: Modification timestamp of that file (NA until the first fire map appears).
started_at: Modification timestamp of the running-flag file (i.e. when the worker claimed the job).
done: TRUE when a burnMap file containing year<times$end> is found.
finished_at: Timestamp of the final-year burnMap (set only when done).
iterationsTotal: The end year extracted from the burnMap filename (set only when done).
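The filename scan can be sketched with base R regular expressions. The helper latestBurnMapYear() and the example filenames are hypothetical, and the sketch does not filter out files written before the worker claimed the job, which the calculator above is described as doing.

```r
# Hypothetical helper: latest year among burnMap_year<XXXX>.tif filenames
latestBurnMapYear <- function(files) {
  m <- regmatches(files, regexpr("burnMap_year[0-9]{4}\\.tif$", files))
  if (length(m) == 0) return(NA_integer_)   # no fire map written yet
  max(as.integer(sub("burnMap_year([0-9]{4})\\.tif$", "\\1", m)))
}

files <- c("burnMap_year2011.tif", "burnMap_year2012.tif", "cohortData_year2012.rds")
times <- list(start = 2011L, end = 2012L)   # hypothetical run

heartbeat_iter <- latestBurnMapYear(files)
if (is.na(heartbeat_iter)) heartbeat_iter <- times$start  # fallback described above
# done: a burnMap file for the end year exists
done <- any(grepl(paste0("burnMap_year", times$end, "\\.tif$"), files))
```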
statusCalculate_LandR: Heartbeat calculator for LandR vegetation simulations.
Scans the job's output directory for cohortData_year<XXXX>.rds
checkpoint files (written at each SpaDES save event) and maps them to the
standard queue meta-columns:
heartbeat_iter: Current simulation year reached (character).
heartbeat_at: Timestamp of the latest checkpoint file.
started_at: Timestamp of the earliest checkpoint file (may be refined later by the running-flag-file logic in tmuxRefreshQueueStatus()).
done: Set to TRUE when heartbeat_iter >= outs$times$end, triggering a status transition to DONE.
finished_at: Timestamp of the final checkpoint (set only when done).
iterationsTotal: The end year as a character string (set only when done).
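A comparable sketch of the LandR checkpoint logic, using real files in tempdir() so that file.mtime() returns actual timestamps. The directory name and checkpoint years are hypothetical, and outs is a stand-in for the list of that name described in the requirements; this is not the package's implementation.

```r
# Create three hypothetical checkpoint files
outDir <- file.path(tempdir(), "landr_status_demo")
dir.create(outDir, showWarnings = FALSE)
for (yr in c(2011L, 2021L, 2031L)) {
  saveRDS(data.frame(), file.path(outDir, sprintf("cohortData_year%d.rds", yr)))
}
outs <- list(times = list(end = 2031L))  # stand-in for the worker's outs

# Map checkpoints to the queue meta-columns
ckpt <- list.files(outDir, pattern = "^cohortData_year[0-9]{4}\\.rds$", full.names = TRUE)
years <- as.integer(sub("^cohortData_year([0-9]{4})\\.rds$", "\\1", basename(ckpt)))
latest <- which.max(years)
heartbeat_iter <- as.character(years[latest])
heartbeat_at <- format(file.mtime(ckpt[latest]), "%Y-%m-%d %H:%M:%S")
started_at <- format(min(file.mtime(ckpt)), "%Y-%m-%d %H:%M:%S")
done <- years[latest] >= outs$times$end
if (done) {
  finished_at <- heartbeat_at
  iterationsTotal <- as.character(outs$times$end)
}
```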
These expressions expect the following to be available, either as
queue data.frame columns or as named objects in the ... passed to
tmuxRefreshQueueStatus():
pathBuild: A function whose arguments match the queue columns used to construct the output-directory path. For statusCalculate_LandR the call is pathBuild(.ELFind, .samplingRange, .GCM, .SSP, .rep).
outs: A list (typically stored in dots_path and loaded into the worker's environment before global.R is sourced) whose element outs$times$end gives the simulation end year.
times (statusCalculate_FireSenseFit only): A list with elements $start and $end giving the simulation start and end years (integers). Typically a queue column.
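A minimal pathBuild might look like the following. The directory layout (one level per argument, with GCM and SSP joined) and the example values are assumptions for illustration only; the real layout is whatever the project's own pathBuild defines. do.call() shows how a queue row's columns can be matched to its arguments by name.

```r
# Hypothetical pathBuild: argument names match the queue columns
pathBuild <- function(.ELFind, .samplingRange, .GCM, .SSP, .rep) {
  file.path("outputs", .ELFind, .samplingRange,
            paste0(.GCM, "_", .SSP), paste0("rep", .rep))
}

# One hypothetical queue row; columns are matched to arguments by name
row <- data.frame(.ELFind = "ELF_1", .samplingRange = "2011_2100",
                  .GCM = "CanESM5", .SSP = "370", .rep = 1L)
outDir <- do.call(pathBuild, as.list(row))
outDir  # "outputs/ELF_1/2011_2100/CanESM5_370/rep1"
```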
experimentTmux(), experimentFuture(),
tmuxRefreshQueueStatus(), get_sim_year_heartbeat()
setupProject()
Reverse the side-effects of setupProject():
teardownProject(x, origLibPaths)
x |
Either the list returned by |
origLibPaths |
Optional. The |
remove the project library directory created by setupProject(),
unlink the project paths returned by setupProject(),
restore the .libPaths() value that was in effect before
setupProject() was called.
The previous .libPaths() is stored on the setupProject() output as
out$paths$.previousLibPaths (and on attr(out$paths, "extraPaths")),
so teardownProject(out) is sufficient; there is no need to keep
origLibPaths separately.
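The .libPaths() part of the teardown can be illustrated with base R alone. This sketch uses a hypothetical project library under tempdir() and only mirrors the save, prepend, and restore pattern described above; it does not call setupProject() or teardownProject().

```r
prevLibPaths <- .libPaths()                          # value to restore later
projLib <- file.path(tempdir(), "demoProjectLib")    # hypothetical project library
dir.create(projLib, showWarnings = FALSE)

.libPaths(c(projLib, prevLibPaths))                  # project library searched first
inProject <- .libPaths()[1]
# ... install and load project packages here ...

.libPaths(prevLibPaths)                              # restore previous search path
unlink(projLib, recursive = TRUE)                    # remove the project library
```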
NULL, invisibly. Called for its side effects.
setupProject() for what is being torn down.
Returns the default directory for the "running" flag files.
tmuxActiveRunningPath(
  activeRunningPath = NULL,
  queue_path,
  prefix = "logs",
  suffix = queue_path
)
activeRunningPath |
Optional character path. If |
queue_path |
Character. Path to the queue |
prefix |
Character. Directory prefix for the path. Default |
suffix |
Character. Suffix used in the path. Defaults to |
The default path.
Strips the leading "<host?>-<node>-<pid>-" prefix from each pane title
and groups panes whose remainders are identical. Intended to surface
cases where the same queue row has been claimed by two workers (e.g. a
stale RUNNING reclaim that was actually live).
tmuxFindDuplicates(panes = NULL, runPattern = "outputs-")
panes |
Optional data.frame as returned by |
runPattern |
Optional regex; only panes whose stripped title matches
it are considered. Default |
The prefix strip matches 1 or 2 non-dash chunks followed by a PID of
6 or more digits and a dash. This covers both <host>-<node>-<pid>-<runName>
and <node>-<pid>-<runName> title formats. Old-style titles lacking
this prefix are kept verbatim. A title is considered a duplicate only if
its stripped form appears on 2 or more panes, so two differently formatted
titles with the same tail still collapse into one group.
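The stripping rule above can be expressed as a single regular expression. The pattern ^([^-]+-){1,2}[0-9]{6,}- is one reading of the described rule rather than the package's literal code, and the pane titles are invented examples.

```r
titles <- c("hostA-node01-123456-outputs-run1",  # <host>-<node>-<pid>-<runName>
            "node02-654321-outputs-run1",        # <node>-<pid>-<runName>
            "outputs-run2",                      # old style: kept verbatim
            "hostB-node03-777777-outputs-run3")

prefixPattern <- "^([^-]+-){1,2}[0-9]{6,}-"      # 1-2 chunks, 6+ digit PID, dash
runId <- sub(prefixPattern, "", titles)

runPattern <- "outputs-"
keep <- grepl(runPattern, runId)                 # only consider matching runs
isDup <- keep & runId %in% runId[keep][duplicated(runId[keep])]

dups <- data.frame(title = titles[isDup], run_id = runId[isDup])
dups$group <- as.integer(factor(dups$run_id))    # one integer per duplicate set
```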
data.frame with the same columns as tmuxListPanes() plus
run_id (the stripped runName used for grouping) and group (integer
identifying each duplicate set). Rows are ordered by group then
pane_ref. Empty data.frame (with these columns) when no duplicates.
Development utility: kills all panes identified by the supplied tmux pane IDs.
Uses kill-pane -t <pane-id>; panes that no longer exist are ignored. See the tmux manual.
tmuxKillPanes(panes)
panes |
Character vector of tmux pane IDs (e.g., |
Invisibly returns the subset of panes successfully targeted.
Thin alias for experimentMonitor() in tmux-scan mode (no ef /
queue_paths). Preserved for backwards compatibility; new code
should call experimentMonitor() directly so the same call works for
experimentFuture() / experimentSBATCH() runs by passing ef.
tmuxListPanes(stats = FALSE)
stats |
Logical. When |
Same as experimentMonitor(stats = stats) in tmux mode; see
that function's documentation.
Mirror local queue to Google Sheets
tmuxMirrorQueueToSheets(queue_path, ss_id, sheet_name = "Status")
queue_path |
Path to the local tmux_queue.rds |
ss_id |
The Google Sheet ID (from the URL) |
sheet_name |
The name of the tab to write to |
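If only the full sheet URL is at hand, the ID can be extracted with a regular expression, assuming the usual https://docs.google.com/spreadsheets/d/<ID>/... URL shape. sheetIdFromUrl() is a hypothetical helper; whether tmuxMirrorQueueToSheets() also accepts a full URL is not stated here.

```r
# Hypothetical helper: extract the sheet ID from a Google Sheets URL.
# Input without the /spreadsheets/d/ segment (e.g. a bare ID) is returned unchanged.
sheetIdFromUrl <- function(url) {
  sub("^.*/spreadsheets/d/([^/?#]+).*$", "\\1", url)
}

ss_id <- sheetIdFromUrl("https://docs.google.com/spreadsheets/d/1AbCdEfGhIjK/edit#gid=0")
ss_id  # "1AbCdEfGhIjK"
```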
Writes df to a queue RDS file at queue_path, adding the metadata columns used by workers:
status: PENDING | RUNNING | DONE | FAILED
claimed_by: tmux pane id that claimed the row
started_at: "YYYY-MM-DD HH:MM:SS"
finished_at: "YYYY-MM-DD HH:MM:SS"
DEoptimElapsedTime: numeric seconds (sum(diff(allIterations[allIterations < 20 minutes])))
machine_name: Sys.info()[["nodename"]]
process_id: Sys.getpid()
heartbeat_at: latest timestamp (as character) detected by heartbeat
heartbeat_iter: latest iteration number (integer) detected by heartbeat
tmuxPrepareQueueFromDF(df, queue_path)
df |
data.frame; experiment rows (columns become objects in workers) |
queue_path |
character; path to the queue |
Invisibly returns queue_path.
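The queue layout can be sketched in base R: the experiment rows plus a status column initialised to PENDING and empty claim and timestamp columns, saved with saveRDS(). This is a hypothetical approximation that adds only a subset of the metadata columns listed above; tmuxPrepareQueueFromDF() builds the real queue.

```r
# Hypothetical experiment design: each column becomes an object in the worker
df <- expand.grid(.rep = 1:2, .GCM = c("CanESM5", "CNRM-ESM2-1"),
                  stringsAsFactors = FALSE)

q <- df
q$status <- "PENDING"                        # every row starts unclaimed
for (col in c("claimed_by", "started_at", "finished_at")) {
  q[[col]] <- NA_character_                  # filled in by workers later
}

queue_path <- file.path(tempdir(), "tmux_queue.rds")
saveRDS(q, queue_path)
```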
Scans the simulation output directories (defined by runNameLabel) to assess
current status based on file timestamps and visual content of PNGs.
If a PNG has not been updated for longer than timeout_min minutes, the task is marked
as "FINISHED" (if red pixels are detected) or "INTERRUPTED" (if no red
is detected).
tmuxRefreshQueueStatus(
  queue_path,
  timeout_min = 20,
  runNameLabel = quote(colnames(q)[1:2]),
  statusCalculate = getOption("spades.statusCalculate"),
  folderWithIterInFilename = getOption("spades.folderWithIterInFilename"),
  recheckDone = FALSE,
  activeRunningPath = getOption("spades.activeRunningPath"),
  ...
)
queue_path |
Character. Absolute path to the |
timeout_min |
Numeric. Minutes of inactivity before a task is considered stale. Defaults to 20. |
runNameLabel |
A quoted expression to derive a run label from the queue. Default uses first two columns. |
statusCalculate |
A quoted expression to compute job status from output files.
Defaults to |
folderWithIterInFilename |
A quoted expression for a folder with iteration info in filenames.
Defaults to |
recheckDone |
Logical. If |
activeRunningPath |
Directory for "running" flag files. See |
... |
Additional named objects made available to the statusCalculate expression when it is evaluated. |
A data.frame (the updated queue), invisibly. As a side effect, updates the RDS file on disk.
## Not run: 
# Assessment of all simulations in the current project
tmuxRefreshQueueStatus("experiment_queue.rds", timeout_min = 30)
## End(Not run)
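The staleness test at the core of the refresh can be sketched with file modification times. The file names and the isStale() helper are hypothetical, Sys.setFileTime() is used only to fake an old file, and the PNG red-pixel check is not reproduced.

```r
# Hypothetical helper: has this file gone untouched for more than timeout_min?
isStale <- function(path, timeout_min = 20) {
  ageMin <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "mins"))
  ageMin > timeout_min
}

oldPng <- file.path(tempdir(), "stale_demo.png")
freshPng <- file.path(tempdir(), "fresh_demo.png")
file.create(oldPng, freshPng)
Sys.setFileTime(oldPng, Sys.time() - 45 * 60)  # pretend it was last written 45 min ago

isStale(oldPng)    # TRUE
isStale(freshPng)  # FALSE
```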
Run one queued job (claim-next semantics) in the current R session.
tmuxRunNextWorker(
  queue_path,
  global_path,
  on_interrupt = c("requeue", "fail"),
  heartbeat_interval_s = 60,
  runNameLabel = quote(colnames(q)[1:2]),
  statusCalculate = getOption("spades.statusCalculate"),
  folderWithIterInFilename = getOption("spades.folderWithIterInFilename"),
  activeRunningPath = getOption("spades.activeRunningPath"),
  ss_id = NULL
)
queue_path |
character; path to the queue |
global_path |
character; script to source for the job |
on_interrupt |
"requeue" or "fail". If the sourced script is interrupted, either requeue or mark as FAILED. |
heartbeat_interval_s |
numeric; seconds between heartbeats while the job runs |
runNameLabel |
A quoted expression (possibly of |
statusCalculate |
A quoted expression to compute job status from output files.
Defaults to |
folderWithIterInFilename |
A quoted expression for a folder containing iteration
info in filenames. Defaults to |
activeRunningPath |
Directory for "running" flag files. See |
ss_id |
Optional Google Sheets/Drive ID for the shared queue. When supplied workers use the GS backend instead of the local RDS file. |
One of "ok", "interrupt", or "empty" (when no pending work is found); used by tmuxRunWorkerLoop().
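Claim-next semantics can be sketched as: read the queue, take the first PENDING row, mark it RUNNING with the claiming worker's ID and a start time, then write the queue back. claimNext() and the tiny demo queue are hypothetical, and the sketch has no file locking, so unlike a real multi-worker queue it does not guard against two workers claiming the same row.

```r
# Tiny stand-in queue so the sketch is self-contained
queue_path <- file.path(tempdir(), "claim_demo_queue.rds")
saveRDS(data.frame(.rep = 1:3, status = "PENDING",
                   claimed_by = NA_character_, started_at = NA_character_,
                   stringsAsFactors = FALSE), queue_path)

# Hypothetical claim step (no locking!)
claimNext <- function(queue_path, worker_id) {
  q <- readRDS(queue_path)
  i <- which(q$status == "PENDING")[1]
  if (is.na(i)) return(NULL)                 # nothing left: the "empty" case
  q$status[i] <- "RUNNING"
  q$claimed_by[i] <- worker_id
  q$started_at[i] <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  saveRDS(q, queue_path)
  q[i, , drop = FALSE]                       # the claimed row
}

row <- claimNext(queue_path, worker_id = "%12")
```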
Run queued jobs repeatedly (pane-local loop).
tmuxRunWorkerLoop(
  queue_path,
  global_path,
  on_interrupt = c("requeue", "fail"),
  heartbeat_interval_s = 60,
  stop_file = NULL,
  activeRunningPath = getOption("spades.activeRunningPath"),
  runNameLabel = quote(colnames(q)[1:2]),
  ss_id = NULL,
  pane_mode = c("reuse", "killAndNewPane"),
  email = getOption("gargle_oauth_email"),
  cache_path = getOption("gargle_oauth_cache"),
  dots_path = NULL
)
queue_path |
character; path to the queue |
global_path |
character; script to source for the job |
on_interrupt |
"requeue" or "fail". If the sourced script is interrupted, either requeue or mark as FAILED. |
heartbeat_interval_s |
numeric; seconds between heartbeats while the job runs |
stop_file |
optional path; if present, stop after current iteration |
activeRunningPath |
Directory for "running" flag files. See |
runNameLabel |
A quoted expression (possibly of |
ss_id |
Optional Google Sheets/Drive ID for the shared queue. When supplied workers use the GS backend instead of the local RDS file. |
pane_mode |
Character. |
email |
gargle OAuth email; forwarded to replacement panes in |
cache_path |
gargle OAuth cache path; forwarded to replacement panes. |
dots_path |
Path to |
TRUE, invisibly.
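The loop's control flow can be sketched with a stand-in for tmuxRunNextWorker(). runLoop() and makeFakeWorker() are hypothetical, and stopping on "interrupt" as well as "empty" is an assumption; the stop_file check at the top of each pass matches the "stop after current iteration" description of stop_file.

```r
# Hypothetical stand-in worker: returns "ok" nJobs times, then "empty"
makeFakeWorker <- function(nJobs) {
  remaining <- nJobs
  function() {
    if (remaining == 0) return("empty")
    remaining <<- remaining - 1
    "ok"
  }
}

# Hypothetical loop: check stop_file before each job, stop on "empty"/"interrupt"
runLoop <- function(runNext, stop_file = NULL) {
  results <- character(0)
  repeat {
    if (!is.null(stop_file) && file.exists(stop_file)) break
    res <- runNext()
    results <- c(results, res)
    if (res %in% c("empty", "interrupt")) break
  }
  invisible(results)
}

out <- runLoop(makeFakeWorker(3))
out  # "ok" "ok" "ok" "empty"
```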
Sets tmux mouse mode via set-option -g mouse on/off, enabling pane
selection, resizing, and scrolling with the mouse. See the tmux manual for details.
tmuxSetMouse(on = TRUE)
on |
Logical; |
Invisibly returns on.
Scans every tmux server on this machine (sockets under
$TMUX_TMPDIR/tmux-<uid>/) for panes whose current title exactly matches
oldTitle, then rewrites each to newTitle. Useful for upgrading
old-style worker-pane titles (without <node>-<pid> prefix) to the new
convention so that .gs_reclaim_dead_jobs() can recognise them.
tmuxSetPaneTitle(oldTitle, newTitle)
oldTitle |
Character(1). Exact current title to match. |
newTitle |
Character(1). Replacement title. |
Invisibly, a character vector of the pane IDs that were updated
(e.g. c("%12", "%33")). Prints a message per update and a warning
when no match is found.
A set of lightweight helpers that are not strictly necessary but make code easier to read.
user(username = NULL)
machine(machinename = NULL)
node(machinename = NULL)
username |
A character string of a username. |
machinename |
A character string, which will be used as a partial match via
|
node is an alias for machine.
If username is non-NULL, user returns a logical indicating whether
the current user matches the supplied username.
Otherwise it returns a character string with the name of the current user.
machine returns a logical indicating whether the current machine name
Sys.info()[["nodename"]] is matched by machinename.
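The documented return values can be approximated in base R. userSketch() and machineSketch() are hypothetical stand-ins: reading the user name from Sys.info()[["user"]], treating machinename as a plain substring, and returning the node name when machinename is NULL are all assumptions.

```r
# Hypothetical approximation of user(): name, or a match test
userSketch <- function(username = NULL) {
  current <- Sys.info()[["user"]]
  if (is.null(username)) current else identical(current, username)
}

# Hypothetical approximation of machine(): partial (substring) match on nodename
machineSketch <- function(machinename = NULL) {
  current <- Sys.info()[["nodename"]]
  if (is.null(machinename)) return(current)  # assumption, mirrors userSketch()
  grepl(machinename, current, fixed = TRUE)
}

userSketch()                       # e.g. the current login name
userSketch(userSketch())           # TRUE
machineSketch(machineSketch())     # TRUE
```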