
Storage and retrieval

When you execute experiments, information such as the configuration and the random seed that were used is automatically captured and stored.

One of the fundamental ideas in the design of machinable's storage is that you retrieve results through the same abstraction that was used to create them. What does this look like for an experiment? Consider an example experiment that estimates the gravity of an exoplanet.

python
from machinable import get

gravity = get('estimate_gravity', {'time_dilation': 2.0})

Here, get() will automatically search the storage for an experiment of type estimate_gravity with a time_dilation of 2.0. If estimate_gravity has not been executed with this exact configuration, a new instance of the experiment with time_dilation=2.0 is returned instead. This means that we can easily retrieve experiments with the same command we initially used to execute them. Consider the following example:

python
from machinable import get

gravity = get('estimate_gravity', {'time_dilation': 0.5})

if not gravity.is_finished():
  print("An experiment with this configuration was not found")
else:
  print(f"The gravity for a time dilation of 0.5 is {gravity.result}")

By default, the experiment data is stored in a local folder at ./storage/{experiment_id} (e.g. ./storage/MHCYZq).
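The lookup behaviour of get() described above can be pictured as a "find or create" keyed on the module name and the configuration. The following is a minimal sketch of that idea only: ToyStorage and its methods are hypothetical names made up for illustration and are not part of machinable, whose actual implementation may differ.

```python
import json


class ToyStorage:
    """Hypothetical in-memory stand-in; not machinable's storage class."""

    def __init__(self):
        self._saved = {}

    @staticmethod
    def _key(module, config):
        # Canonical JSON so that key order in the config does not matter
        return json.dumps([module, config], sort_keys=True)

    def get(self, module, config=None):
        config = dict(config or {})
        key = self._key(module, config)
        if key in self._saved:
            return self._saved[key]  # an earlier run with this exact config
        # Nothing stored yet: hand back a fresh, unfinished entry
        return {"module": module, "config": config, "finished": False}

    def save(self, entry):
        self._saved[self._key(entry["module"], entry["config"])] = entry


storage = ToyStorage()

run = storage.get("estimate_gravity", {"time_dilation": 2.0})
run["finished"] = True  # pretend the experiment was executed
storage.save(run)

same = storage.get("estimate_gravity", {"time_dilation": 2.0})
other = storage.get("estimate_gravity", {"time_dilation": 0.5})
print(same["finished"], other["finished"])  # True False
```

The same call therefore serves both purposes: with a known configuration it returns the stored entry, and with an unseen one it returns a new, unfinished entry.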

Configuring the storage

Just like with experiments and execution, you can choose the storage implementation and configuration using the module convention:

python
from machinable import get

storage = get('machinable.storage.filesystem', {
  'directory': './my-storage'
})

This will instantiate the Storage implementation that is located in the machinable.storage.filesystem module, namely the default storage that writes all data to a local directory.

To use the storage, wrap the relevant code in a with-context:

python
with storage:
  experiment.launch()

# or alternatively
storage.__enter__()

experiment.launch()

Experiments within the context will be written to the specified directory ./my-storage. You are free to use or implement alternative storage implementations that, for example, upload files to the cloud or write to a database.
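To make the difference between the with-block and the bare `__enter__()` call more concrete, here is a small sketch of the general context-manager pattern in plain Python. ToyStorage, its `_active` stack and `current()` are hypothetical and only illustrate the idea; they are not how machinable necessarily tracks the active storage.

```python
class ToyStorage:
    """Hypothetical stand-in used only to illustrate the with-context pattern."""

    _active = []  # stack of storages that are currently in effect

    def __init__(self, directory):
        self.directory = directory

    def __enter__(self):
        ToyStorage._active.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        ToyStorage._active.pop()
        return False  # do not swallow exceptions

    @classmethod
    def current(cls):
        return cls._active[-1] if cls._active else None


with ToyStorage("./my-storage") as storage:
    inside = ToyStorage.current().directory  # active inside the block
after_block = ToyStorage.current()           # deactivated again on exit

# Calling __enter__() directly activates the storage without a matching exit
ToyStorage("./other-storage").__enter__()
still_active = ToyStorage.current().directory
```

In this sketch the with-block limits the storage to the indented code, while the direct `__enter__()` call leaves it active for everything that follows.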

Saving and loading data

While machinable automatically stores crucial information about the experiment, you can use Experiment.save_data() and Experiment.load_data() to easily store and retrieve additional custom data in different file formats:

python
import numpy as np

gravity.save_data('prediction.txt', 'a string')           # text
gravity.save_data('settings.json', {'neurons': [1, 2]})   # jsonable
gravity.save_data('inputs.npy', np.array([1.0, 2.0]))     # numpy
gravity.save_data('results.p', results)                   # pickled (any picklable object)

>>> gravity.load_data('prediction.txt')
'a string'
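One way to picture how a single save/load pair can handle several formats is to dispatch on the file extension. The sketch below uses only the standard library and covers text, JSON and pickle. The free functions `save_data` and `load_data` here are hypothetical helpers written for illustration, not machinable's implementation, and the stored values are placeholders.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_data(directory, filename, data):
    """Hypothetical helper: pick a serializer from the file extension."""
    path = Path(directory) / filename
    if path.suffix == ".json":
        path.write_text(json.dumps(data))
    elif path.suffix == ".p":
        path.write_bytes(pickle.dumps(data))
    else:  # treat everything else as plain text
        path.write_text(str(data))
    return path


def load_data(directory, filename):
    """Hypothetical helper: mirror of save_data for reading."""
    path = Path(directory) / filename
    if path.suffix == ".json":
        return json.loads(path.read_text())
    if path.suffix == ".p":
        return pickle.loads(path.read_bytes())
    return path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    save_data(tmp, "prediction.txt", "a string")
    save_data(tmp, "settings.json", {"neurons": [1, 2]})
    save_data(tmp, "results.p", {"value": 1.5})  # placeholder payload
    text = load_data(tmp, "prediction.txt")
    settings = load_data(tmp, "settings.json")
    results = load_data(tmp, "results.p")
```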

Records

Experiments also provide an interface for tabular logging, that is, storing recurring data points for different iterations.

python
record = gravity.record()

for iteration in range(3):
    record['iteration'] = iteration
    record['accuracy'] = 0.1
    # save at the end of the iteration to start a new row
    record.save()

The results become available as a table where each row represents an iteration.

>>> gravity.records().as_table()
2022-10-07T23:05:46.942295-05:00    0.1    0
2022-10-07T23:05:46.944064-05:00    0.1    1
2022-10-07T23:05:46.946012-05:00    0.1    2
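The row-by-row behaviour of the record can be sketched in a few lines of plain Python: fields are collected for the current row, and saving stamps the row and starts a new one. ToyRecord is a hypothetical class for illustration only; the real record object and its timestamp field name may differ.

```python
from datetime import datetime, timezone


class ToyRecord:
    """Hypothetical row writer; machinable's record object may differ."""

    def __init__(self):
        self.rows = []      # rows that have been saved
        self._current = {}  # row that is being filled in

    def __setitem__(self, field, value):
        self._current[field] = value

    def save(self):
        # Stamp the row, store it, and start a new empty row
        stamp = datetime.now(timezone.utc).isoformat()
        self.rows.append({"timestamp": stamp, **self._current})
        self._current = {}


record = ToyRecord()
for iteration in range(3):
    record["iteration"] = iteration
    record["accuracy"] = 0.1
    record.save()

iterations = [row["iteration"] for row in record.rows]
print(iterations)  # [0, 1, 2]
```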

Organize using groups

To keep things organized, you can group experiments that belong together, for example:

python
from machinable import get

experiment = get('estimate_gravity')

experiment.group_as('lab-reports/%Y')

>>> experiment.group
'Group [lab-reports/2023]'

TIP

When specifying groups, you can use common time format codes such as %Y for the year, as accepted by datetime.date.strftime.

You may also specify a global default group that will be used if no group is set.

python
from machinable import get

get(
  'machinable.storage.filesystem',
  {'directory': './my-storage'},
  default_group='lab-reports/%Y'
).__enter__()

Note

The pre-configured default group is %Y_%U_%a (year, week number, abbreviated weekday), e.g. 2022_40_Sun.
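To see how such group patterns expand, the format codes can be tried directly with the standard library. This sketch only demonstrates strftime itself, using a fixed date chosen for reproducibility; how machinable applies the pattern internally is not shown here.

```python
from datetime import date

day = date(2022, 10, 2)  # fixed date so the output is reproducible

yearly = day.strftime("lab-reports/%Y")
default_style = day.strftime("%Y_%U_%a")

print(yearly)         # lab-reports/2022
print(default_style)  # 2022_40_Sun (the weekday name depends on the locale)
```

Text that is not a format code, such as the lab-reports/ prefix, is passed through unchanged.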

machinable does not prescribe an interface for querying and searching experiments. You can implement your own storage search routines or use third-party UIs and libraries that suit your needs.

To illustrate this, let's leverage the library PyFunctional to search our storage.

python
>>> from machinable import Storage
>>> storage = Storage.get()
>>> storage
'FilesystemStorage <./storage>'

The filesystem storage provides an SQLite database which we can use as a data source:

python
>>> storage.sqlite_file
'./storage/storage.sqlite'
>>> from functional import seq
>>> from functools import partial
>>> query = partial(seq.sqlite3, storage.sqlite_file)

This allows us to run arbitrary SQL queries like retrieving the 3 most recent experiments:

python
>>> recent = query("SELECT * FROM experiments ORDER BY timestamp DESC LIMIT 3").map(lambda x: x[1])
>>> recent
['./storage/oPqe2v', './storage/HGtHQu', './storage/HwVO9l']
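The same kind of query can also be tried without any third-party library by using Python's built-in sqlite3 module. The sketch below runs against an in-memory database with a made-up schema and made-up rows: only the table name experiments and the timestamp column are taken from the query above, and the real database layout may differ. Note that selecting the most recent entries requires sorting in descending order.

```python
import sqlite3

# In-memory database with a hypothetical schema; only the table name and
# the timestamp column are taken from the query in the text.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE experiments (id TEXT, path TEXT, timestamp INTEGER)"
)
connection.executemany(
    "INSERT INTO experiments VALUES (?, ?, ?)",
    [
        ("aaa111", "./storage/aaa111", 100),
        ("bbb222", "./storage/bbb222", 200),
        ("ccc333", "./storage/ccc333", 300),
        ("ddd444", "./storage/ddd444", 400),
    ],
)

# Newest first, then keep the top three
rows = connection.execute(
    "SELECT * FROM experiments ORDER BY timestamp DESC LIMIT 3"
).fetchall()
recent_paths = [row[1] for row in rows]  # second column, like x[1] above
connection.close()

print(recent_paths)
# ['./storage/ddd444', './storage/ccc333', './storage/bbb222']
```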

Once we find what we are looking for, we can always convert back to the regular machinable abstractions using from_storage:

python
>>> from machinable import Experiment
>>> experiments = recent.map(Experiment.from_storage).to_list()
>>> experiments
[Experiment [oPqe2v], Experiment [HGtHQu], Experiment [HwVO9l]]
>>> experiments[0].finished_at().humanize()
'a month ago'

Overall, this should allow for seamless integration with your preferred experiment management tools.

🧑‍🎓

This concludes the overview of the most essential features. You can refer back to individual chapters at any time or continue with the advanced tutorial sections, the API reference and the examples.
