While components encapsulate functionality using life cycle events, it is up to the execution to invoke the components. The event paradigm of the components allows for the composition of arbitrary component execution schedules. To make this more concrete, consider the following simple example:

    from machinable import Experiment, execute

    execute(Experiment().component("optimization").repeat(3))

The execution definition can be read as "import component 'optimization' and repeat its execution in three independent trials". Note that the experiment object does not trigger the execution itself; it merely describes the execution plan, which is then triggered using the execute() method.
Crucially, machinable can take care of the intricacies of the execution based on this high-level description, i.e. it imports and constructs the components and triggers their event life cycle. It can also keep track of the configuration used, generate seeds for controlled randomness, and prepare a unique storage path for the results. Since the execution details are abstracted away, it does not matter whether you run on a local computer or a distributed remote cluster.
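To illustrate the split between describing an execution and triggering it, here is a minimal, library-free sketch. The names `Plan`, `run`, `Optimization` and the three event methods are hypothetical stand-ins chosen for this example; they are not machinable's classes or its actual event names.

```python
class Optimization:
    """Hypothetical component that records the life cycle events it receives."""

    def __init__(self, trial):
        self.trial = trial
        self.events = []

    def on_create(self):
        self.events.append("create")

    def on_execute(self):
        self.events.append("execute")

    def on_destroy(self):
        self.events.append("destroy")


class Plan:
    """Describes what to run; building a plan has no side effects."""

    def __init__(self, component, repeats=1):
        self.component = component
        self.repeats = repeats

    def repeat(self, count):
        # Return a new description instead of running anything.
        return Plan(self.component, count)


def run(plan):
    """Trigger the plan: construct each trial and fire its events in order."""
    finished = []
    for trial in range(plan.repeats):
        instance = plan.component(trial)
        instance.on_create()
        instance.on_execute()
        instance.on_destroy()
        finished.append(instance)
    return finished


plan = Plan(Optimization).repeat(3)  # nothing has been executed yet
trials = run(plan)                   # three independent trials
```

Because the plan is plain data, the same description could in principle be handed to a different runner (for example one that dispatches trials to other processes) without changing the component.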
# Defining executions
All aspects of the execution can be controlled through the arguments of the execute() method.

    from machinable import execute

    execute(
        experiment,  # which components with what configuration
        storage,     # where to store results etc.
        engine,      # execution target, e.g. remote execution, multiprocessing etc.
        index,       # database that can be used to search for executions later
        project,     # the machinable project to use
        seed         # random seed
    )

For even finer-grained control, you can instantiate the Execution object directly using the same arguments. Notably, execute() is an alias for
For every execution, machinable will generate a unique 6-character experiment ID (e.g. OY1p1o) that will be printed at the beginning of the execution output. The ID encodes the global random seed and is used as a relative directory to write any data generated by the experiment.
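One way to picture an ID that "encodes" a seed is a fixed-width base-62 representation of an integer. The sketch below is only an illustration of that idea; the alphabet, the `encode_seed` and `decode_id` helpers, and the mapping itself are assumptions and not machinable's actual encoding.

```python
import string

# Assumed alphabet: digits, lowercase, uppercase (62 symbols).
ALPHABET = string.digits + string.ascii_letters
ID_LENGTH = 6


def encode_seed(seed):
    """Turn an integer seed into a fixed-width, 6-character ID."""
    if not 0 <= seed < len(ALPHABET) ** ID_LENGTH:
        raise ValueError("seed does not fit into a 6-character ID")
    chars = []
    for _ in range(ID_LENGTH):
        seed, remainder = divmod(seed, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_id(experiment_id):
    """Recover the integer seed from an ID produced by encode_seed."""
    value = 0
    for char in experiment_id:
        value = value * len(ALPHABET) + ALPHABET.index(char)
    return value


seed = decode_id("OY1p1o")
round_trip = encode_seed(seed)
```

With a reversible mapping like this, passing an ID back in is enough to recover the seed that produced a given run.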
You can specify a system-wide default for index.
The experiment is the only required argument of the execution and specifies what components are executed. In the simplest case, it can be the name of a single component that will be executed using its default configuration. We will discuss the experiment specification in detail in the following section.
By default, the storage is the non-permanent system memory, which is useful during development. To keep your results, make sure to pass a filesystem URL to the storage parameter:

    execute(..., storage='~/results')    # local file system
    execute(..., storage='s3://bucket')  # s3 store

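The two storage values above differ only in whether they carry a URL scheme. As a rough sketch of how such strings can be told apart with the standard library (the `describe_storage` helper is hypothetical and says nothing about how machinable parses the argument):

```python
import os
from urllib.parse import urlsplit


def describe_storage(url):
    """Split a storage string into (protocol, location)."""
    parts = urlsplit(url)
    if parts.scheme in ("", "file"):
        # No scheme: treat the value as a local path and expand "~".
        return "file", os.path.expanduser(parts.path)
    # Remote store: the scheme names the protocol, the rest is the location.
    return parts.scheme, parts.netloc + parts.path


local = describe_storage("~/results")
remote = describe_storage("s3://bucket")
```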
While experiments are executed locally and sequentially by default, machinable provides different Engines for parallel and remote execution. For example, to execute components in parallel processes you may specify the number of processes:
To learn more about available engines and their options, refer to the Engine section.
# Seeding and reproducibility
machinable chooses and sets a global random seed automatically at execution time. You can also set the seed yourself via the seed parameter by passing in a number or an experiment ID:
To re-use the seed of a given experiment and reproduce the execution results, you can pass the experiment ID as the seed:
If you need more control over randomness and how packages are being seeded, you can override the on_seeding event in your component class.
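The reason a re-used seed reproduces results can be shown with the standard library alone. The sketch below assumes a scheme in which per-trial seeds are derived from one global seed; `derive_seeds` and `noisy_measurement` are hypothetical helpers, and the signature of on_seeding is not shown here because this section does not specify it.

```python
import random


def derive_seeds(global_seed, count):
    """Derive `count` trial seeds deterministically from one global seed."""
    rng = random.Random(global_seed)
    return [rng.randrange(2**32) for _ in range(count)]


def noisy_measurement(seed):
    """Stand-in for a component whose output depends on randomness."""
    rng = random.Random(seed)  # local generator, global state stays untouched
    return [rng.random() for _ in range(3)]


# Running twice with the same global seed yields identical trial results.
first = [noisy_measurement(s) for s in derive_seeds(42, 3)]
second = [noisy_measurement(s) for s in derive_seeds(42, 3)]
```

Third-party packages with their own generators need to be seeded separately, which is the kind of customization a seeding hook is meant for.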
# Code backups
machinable automatically backs up the code base at execution time in a zip file that can be used to reproduce the results. Note that the project directory needs to be under Git version control so that machinable can determine which files are included and ignored during the backup.
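As a rough picture of why Git is needed, the following sketch asks Git which files are tracked or untracked-but-not-ignored and writes them to a zip archive. It is an assumption about how such a backup could work, not machinable's implementation; `list_project_files` and `write_backup` are hypothetical names, and the first helper requires the `git` executable and a repository.

```python
import subprocess
import zipfile
from pathlib import Path


def list_project_files(project_dir):
    """Ask Git for tracked files plus untracked files that are not ignored."""
    result = subprocess.run(
        ["git", "ls-files", "--cached", "--others", "--exclude-standard"],
        cwd=project_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def write_backup(project_dir, relative_paths, archive_path):
    """Zip the given files, keeping their paths relative to the project."""
    project_dir = Path(project_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for relative in relative_paths:
            source = project_dir / relative
            if source.is_file():  # skip entries deleted from the working tree
                archive.write(source, arcname=relative)
    return archive_path
```

A typical call would be `write_backup(".", list_project_files("."), "code.zip")` from inside a Git repository.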
# Import arguments and CLI
It is often helpful to move frequently used execution arguments into modules, for example:

    # ./experiments/baseline.py
    from machinable import Experiment

    baseline_experiment = Experiment().component('example')

    # ./engines/remote.py
    from machinable.engine import Remote

    my_remote_execution_engine = Remote(host="ssh://remote", directory="~/project")

    # ./main.py
    from machinable import execute
    from experiments.baseline import baseline_experiment
    from engines.remote import my_remote_execution_engine

    execute(baseline_experiment, engine=my_remote_execution_engine)

You can simplify such imports by passing the module path prefixed with @/ as an execution argument, for instance:

    from machinable import execute

    execute("@/experiments/baseline", engine="@/engines/remote")

Note that you do not need to specify the actual variable name (e.g. baseline_experiment) since machinable will search the module for instances automatically.

    # ./experiments/baseline.py
    from machinable import Experiment

    # no need to assign the experiment to a variable
    Experiment().component('example')

If the module contains more than one instance, only the last one will be returned.
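One plausible way to find instances that were never assigned to a variable is for the class to record every instance it creates. The sketch below assumes such a registry; the `Experiment` class here is a small stand-in written for this example, and `load_last_instance` is a hypothetical helper, not part of machinable.

```python
class Experiment:
    """Stand-in class that registers every instance in creation order."""

    registry = []

    def __init__(self):
        self.name = None
        Experiment.registry.append(self)

    def component(self, name):
        self.name = name
        return self


def load_last_instance(source):
    """Run module source and return the last Experiment it created."""
    start = len(Experiment.registry)
    exec(source, {"Experiment": Experiment})
    created = Experiment.registry[start:]
    if not created:
        raise ValueError("module does not define an Experiment instance")
    return created[-1]


module_source = """
Experiment().component('first')
Experiment().component('example')
"""
found = load_last_instance(module_source)
```

Under this assumption, a module with several instances resolves to the one created last, which matches the behaviour described above.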
As a further simplification, using a simple @ will instruct machinable to search in the following default modules:

    from machinable import execute

    # @baseline -> @/experiments/baseline
    # @remote -> @/engines/remote
    execute("@baseline", engine="@remote")

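The two notations can be summarized as a small resolution rule. The sketch below mirrors the mapping given in the comments above (`experiments` for the experiment argument, `engines` for the engine argument); the `resolve` function and the `DEFAULT_MODULES` table are hypothetical and only cover these two argument kinds.

```python
# Default module per argument kind, taken from the example comments.
DEFAULT_MODULES = {"experiment": "experiments", "engine": "engines"}


def resolve(argument, kind):
    """Map an @-reference to a module path relative to the project root."""
    if argument.startswith("@/"):
        # Explicit path: strip the prefix and use the rest as given.
        return argument[2:]
    if argument.startswith("@"):
        # Shorthand: look in the default module for this argument kind.
        return f"{DEFAULT_MODULES[kind]}/{argument[1:]}"
    return None  # not an @-reference


experiment_module = resolve("@baseline", "experiment")
engine_module = resolve("@remote", "engine")
```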
@-notation is particularly useful when used in combination with the command line interface, as it allows you to specify complex arguments in a concise way.
$ machinable execute @baseline --engine @remote
With the basic execution concepts out of the way, the following sections will focus on the fundamental engine arguments in more detail.