Basic Use

The DataSource class is the simplest way to interact with a live swmr file. The DataSource is an iterator that provides a map of data for each frame.

The DataSource class requires 2 arguments:

  • A list of key datasets.

  • A list of datasets containing the data you wish to process.

The DataSource also has an optional timeout argument, which defaults to 10 second unless otherwise specified, and finished_dataset argument, which is a finished dataset.

The DataSource works out the dimensions of the frame (whether scalar, vector or image) by looking at the difference between the rank of the key and data datasets. It assumes that the data is written row-major and the data frames are in the fastest dimensions.

Reading Data

As an example we will create two small datasets (of the same size but containing different values) and corresponding unique key dataset to use in our example. This example shows a 2 x 2 grid scan of a detector with shape [5,10]. The keys will all be non-zero so we should expect to receive every frame of the dataset

from swmr_tools import DataSource
import h5py
import numpy as np

#Create a small dataset to extract frames from
data_1 = np.random.randint(low = -10000, high = 10000, size = (2,2,5,10))
data_2 = np.random.randint(low = -10000, high = 10000, size = (2,2,5,10))
keys_1 = np.arange(1,5).reshape(2,2,1,1)

#Save data to an hdf5 File
with h5py.File("example.h5", "w", libver = "latest") as f:
    f.create_group("keys")
    f.create_group("data")
    f["keys"].create_dataset("keys_1", data = keys_1)
    f["data"].create_dataset("data_1", data = data_1)
    f["data"].create_dataset("data_2", data = data_2)

Then we simply setup a DataSource pointing at the keys and datasets and let it run:

 with h5py.File("example.h5", "r") as f:
     keys = [f["/keys/keys_1"]]
     datasets = {"/data/data_1" : f["/data/data_1"],
                 "/data/data_2" : f["/data/data_2"]}
     ds = DataSource(keys,datasets)

     for data_map in ds:
         frame = data_map["/data/data_1"]
         print(data_map.slice_metadata)
         print(str(frame))

(slice(0, 1, None), slice(0, 1, None))
[[[[ 3980 -3645 -5966  8665   360  1863  7697  -769 -5559 -2142]
   [ 4588 -9254  8550 -1948  1172  -886  5600 -4307 -3488  2684]
   [ 6961 -6236 -4299 -7908  4577  4358 -6297 -8586 -4147 -3344]
   [ 7149 -2261  1190 -6692  -828  4310  5177 -1239  8868 -4319]
   [ 2442  5367 -1959  6815  5524 -2185 -2171 -8405 -2000 -6897]]]]
(slice(0, 1, None), slice(1, 2, None))
[[[[-4746  9432  4913 -7990 -7969   508 -4400 -4904   749 -1777]
   [-5639 -6433   214 -9282   951 -9444  3568   147 -3306  3393]
   [-9036 -9871 -9149  3938 -4487  9919  -170  5348  3916   289]
   [-3024   237  6456  8663  3531  8984 -3129  9678  3566  1306]
   [ 1891 -6206  9541 -4270 -7572 -6388 -1389  7990 -9341  8785]]]]
(slice(1, 2, None), slice(0, 1, None))
[[[[ 5964  6778 -1285 -4820  1111  5613 -3506 -2496 -6278  2581]
   [ 5037 -1065 -5667  1903  -311 -3747  1912  8773  1429   459]
   [ 4058  6380 -8450 -6520  7715  2446  8190 -6177 -9543  5414]
   [-6701  -870 -7936 -1994  9943  7053  9467 -5751 -7643  1843]
   [ 5033  4083  4520 -3509  9507  1576  9728 -1245  3678 -9098]]]]
    ...

The data (as numpy arrays) can be accessed from the data_map for each point using the dataset path as a key in the map. The slice_metadata attribute on the data_map shows the slice the data was taken from.

The slice_metadata can be used to write processed data into a new hdf5 dataset, and the DataSource class has some convenience methods to help with this.

Writing Data

The DataSource class has two methods to assist with writing processed data back into a hdf5 file:

ds.create_dataset(result_data,file_handle,hdf5_path)

which creates a new hdf5 dataset, with the correct type and shape for the result_data numpy array, and:

ds.append_data(result_data,slice_metadata,output_dataset)

which adds new result datasets into this hdf5 dataset.