# Experiments

While components encapsulate functionality using life cycle events, experiments specify their execution. The event paradigm of the components enables a powerful abstraction since we can compose arbitrary schedules for triggering the components' life cycles. To make this more concrete, consider the following simple example of a machinable.yaml and the corresponding execution script:

components:
  - optimization:
      learning_rate: 0.001
      data:
        name: cifar10
        augmentation: False
      ~mnist:
        data:
          name: mnist
      ~alexnet:
        learning_rate: 0.1
        data:
          name: imagenet

import machinable as ml

experiment = ml.Experiment().component("optimization").repeat(3)
ml.execute(experiment, "~/results")

The experiment definition can be read as: import the component 'optimization' and repeat its execution in three independent trials. Note that the experiment object does not trigger the execution itself; it merely describes the execution plan, which is then carried out by the execute method.

Crucially, machinable can take care of the intricacies of the execution based on this high-level description, i.e. importing and constructing the components and triggering their event life cycles. The engine can also keep track of the used configuration, generate seeds for controlled randomness and prepare a unique storage path to keep results. Since the execution details are abstracted away, it does not matter whether you run on a local computer or a distributed remote cluster. machinable comes with native support for distributed remote execution based on Ray as well as support for remote file systems like S3.

The experiment interface allows for dynamic configuration adjustments in a clear and systematic way. It eliminates global for-loops and manual re-runs of Python scripts with different command-line parameters.

# Defining experiments

An experiment entails an arbitrary number of components that can be added using the component() method, which takes the component name as defined in the machinable.yaml.

Experiment().component('A').component('B').component('C')

TIP

Note that all experiment methods can be chained, e.g. Experiment().component('B').repeat(5).

# Combining, repeating and splitting

Components can be repeated using the repeat() method, for example:

Experiment().component('A').repeat(3)
# -> [A], [A], [A]

Note that a repeat includes every component of the experiment and that it can be applied recursively:

Experiment().component('A').component('B').repeat(2)
# -> [A, B], [A, B]
Experiment().component('A').component('B').repeat(2).repeat(2)
# -> [[A, B], [A, B]], [[A, B], [A, B]]

machinable will inject the flags REPEAT_NUMBER and REPEAT_TOTAL into each of the components accordingly. By default, the repeats are independent, meaning machinable will inject a different SEED flag into each of the repeated components.
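
For illustration, here is how a component might read these flags; this is a minimal sketch that assumes the flags are exposed on the component instance as self.flags and uses on_execute as the life cycle event:

from machinable import Component


class Optimization(Component):

    def on_execute(self):
        # REPEAT_NUMBER and REPEAT_TOTAL are injected by repeat()
        print(f"Repeat {self.flags.REPEAT_NUMBER} of {self.flags.REPEAT_TOTAL}")
        # every repeat receives its own SEED for independent randomness
        print(f"Running with seed {self.flags.SEED}")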

Another form of repetition is induced by the split() method that injects the SPLIT_SEED, SPLIT_NUMBER and SPLIT_TOTAL flags into the components. Using this flag information, you can implement customized splitting operations. For example, to implement a cross-validation algorithm, the components can split the dataset using the SPLIT_SEED and use the split specified by the SPLIT_NUMBER for training. As a result, the split components will conduct a k-fold cross-validation.
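
As a rough sketch of a k-fold cross-validation component built on these flags (the dataset handling is purely illustrative and again assumes the flags are available as self.flags):

import numpy as np
from machinable import Component


class CrossValidation(Component):

    def on_create(self):
        data = np.arange(1000)  # placeholder dataset
        # the shared SPLIT_SEED ensures that every split component produces the same folds
        folds = np.array_split(
            np.random.RandomState(self.flags.SPLIT_SEED).permutation(data),
            self.flags.SPLIT_TOTAL,
        )
        # hold out the fold assigned to this component and train on the remainder
        self.validation_data = folds[self.flags.SPLIT_NUMBER]
        self.train_data = np.concatenate(
            [fold for i, fold in enumerate(folds) if i != self.flags.SPLIT_NUMBER]
        )

Scheduling the component with split(5) would then amount to a 5-fold cross-validation.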

# Adjusting configuration

A key feature of experiments is the programmatic adjustment of configuration; you can use experiments to capture a specific execution of components with a particular configuration -- an experiment, as the name suggests.

In the simplest case, you can use a dictionary to override the default component configuration as defined in the machinable.yaml.

Experiment().component('optimization', {'dataset': 'mnist', 'learning_rate': 0.5})
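
On the component side, the adjusted values are then available just like the defaults. A minimal sketch, assuming the configuration is exposed on the component as self.config and using the on_create life cycle event:

from machinable import Component


class Optimization(Component):

    def on_create(self):
        # picks up 'mnist' and 0.5 from the adjustment above; any key that is
        # not overridden falls back to its default in the machinable.yaml
        print(self.config.dataset, self.config.learning_rate)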

# Versions

Since dictionaries can be cumbersome, it is possible to pass configuration patches directly as YAML:

Experiment().component("optimization", """
learning_rate: 0.1
data:
  name: mnist
""")

# is equivalent to:

Experiment().component("optimization", {
    "learning_rate": 0.1, 
    "data": {"name": "mnist"}
})

However, rather than retrieving YAML from variables, it is often more suitable to define versions directly in the machinable.yaml. To define a version, specify the configuration difference under a key that starts with ~, for instance:

~alexnet:
  learning_rate: 0.1
  data:
    name: imagenet

The version can then be accessed using its key name ~<version-name>, for example:

Experiment().component('optimization', '~alexnet')
# is equivalent to 
Experiment().component('optimization', {
  'learning_rate': 0.1, 
  'data': {'name': 'imagenet'}
})

It is also possible to reference mixin configuration using _<mixin-name>_:

Experiment().component('optimization', '_imagenet_')

You can merge and iterate over configuration adjustments using tuples and lists.

# Merging

Tuples are interpreted as merge operators that merge their elements together. Consider the following example:

Experiment().component('optimization', ({'a': 1}, {'a': 2, 'b': 3}))
# is equivalent to                     ^ - merge operation ------ ^
Experiment().component('optimization', {'a': 2, 'b': 3})

# Iterating

To compare two different learning rates, you could declare the following experiment:

Experiment().component('optimization', {'learning_rate': 0.1})\
            .component('optimization', {'learning_rate': 0.5})

Since the experiment will execute every component with its adjusted configuration, the optimization will proceed once with a learning rate of 0.1 and once with 0.5. To express these types of iterations more concisely, you can use lists to induce the same repetition as above:

Experiment().component('optimization', [{'learning_rate': lr} for lr in (0.1, 0.5)])
#                                      ^ -- list of patches induces a repeat ---- ^

# Combinations

Taking these concepts together, experiments allow you to manage complex configuration adjustments in a flexible way. Consider the following example:

Experiment().component('optimization', ('~alexnet', '~mnist', {'learning_rate': 0.5}))

This would result in the following component configuration:

learning_rate: 0.5
data:
  name: mnist

Can you work out what the following experiment entails?

Experiment().component('optimization', [
   (
    '~mnist', 
    {'network': 'resnet', 'learning_rate': lr * 0.01 + 0.1}
   ) 
   for lr in range(10)
]).repeat(2)

# Other component options

In summary, the Experiment.component() method has the following signature:

(
 name,       # component name, see above
 version,    # configuration adjustment, see above
 checkpoint, # see below
 flags       # see below
)

# Checkpoints

If the checkpoint option is specified, machinable will trigger the component's on_restore event with the given filepath. This allows for restoring components from previously saved checkpoints.
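
For example, a call along the following lines would restore the component from a stored file (the checkpoint path is hypothetical); inside the component, the on_restore event then receives this filepath and can load the saved state:

Experiment().component('optimization', checkpoint='~/results/checkpoints/state_5000.p')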

# Flags

In addition to the default execution flags, you can use the flags parameter to extend the flags dictionary of the components.
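
For instance, a custom flag could be passed as follows (the flag name is made up for illustration) and would then be readable inside the component alongside the default flags, e.g. via self.flags:

Experiment().component('optimization', {'learning_rate': 0.1}, flags={'DRY_RUN': True})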

# Sub-components

You can organise components in a hierarchical way using the components() method, which allows you to add one sub-component or a list of many. You can specify sub-components with the same arguments as the component() method, for example:

Experiment().components(('alexnet', {'lr': 0.1}), [('imagenet_component', {'augmentation': True})])

For a comprehensive description of the Experiment API, consult the reference.

# Executing experiments

To schedule an experiment for execution, pass it to the execute() method.

from machinable import Experiment, execute

experiment = Experiment().component('example')
execute(experiment)

machinable will generate a unique 6-character experiment ID (e.g. OY1p1o) that is printed at the beginning of the execution output. The ID encodes the global random seed and is used as a relative directory to write any data generated by the experiment.

# Storage

By default, the storage is the non-permanent system memory, which is useful during development. To keep your results, make sure to pass a filesystem URL to the storage parameter.

execute(..., storage='~/results')         # local file system
execute(..., storage='s3://bucket')       # s3 store

# Engines

While experiments are executed locally and sequentially by default, machinable provides different engines for parallel and remote execution. For example, to execute components in parallel processes, you may specify the number of processes:

execute(..., engine='native:5')

To learn more about available engines and their options, refer to the Engine section.

# Randomness and reproducibility

machinable chooses and sets a global random seed automatically at execution time. You can also determine the seed with the seed parameter by passing in a number or an experiment ID:

execute(Experiment().component('controlled_randomness'), seed=42)

To re-use the seed of a given experiment and reproduce the execution results, you can pass the experiment ID as the seed:

execute(Experiment().component('controlled_randomness'), seed='OY1p1o')

If you need more control over randomness and how packages are being seeded, you can override the on_seeding event in your component class.
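
A minimal sketch of such an override; it assumes the chosen global seed is available as self.flags.SEED and simply re-seeds the packages the component relies on:

import random

import numpy as np
from machinable import Component


class ControlledRandomness(Component):

    def on_seeding(self):
        # seed the random number generators used by this component
        random.seed(self.flags.SEED)
        np.random.seed(self.flags.SEED)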

# Code backups

machinable automatically backs up the code base at execution time in a zip file that can be used to reproduce the results. Note that the project directory needs to be under Git version control to determine which files are included in and excluded from the backup (via the .gitignore file).