lgdo.lh5 package

Routines for reading and writing LEGEND Data Objects in HDF5 files. Currently the primary on-disk format for LGDO objects is LEGEND HDF5 (LH5) files. IO is done via the class store.LH5Store. LH5 files can also be browsed easily in Python like any HDF5 file using h5py.

Submodules

lgdo.lh5.concat module

lgdo.lh5.concat._get_lgdos(file, obj_list)

Get name of LGDO objects.

lgdo.lh5.concat._get_obj_list(lh5_files, include_list=None, exclude_list=None)

Extract a list of lh5 objects to concatenate.

Parameters:
  • lh5_files (list) – list of input files to concatenate.

  • include_list (list | None) – patterns for tables to include.

  • exclude_list (list | None) – patterns for tables to exclude.

Return type:

list[str]

lgdo.lh5.concat._inplace_table_filter(name, table, obj_list)

Filter objects nested in this LGDO.

lgdo.lh5.concat._remove_nested_fields(lgdos, obj_list)

Remove (nested) table fields based on obj_list.

lgdo.lh5.concat.lh5concat(lh5_files, output, overwrite=False, *, include_list=None, exclude_list=None)

Concatenate LGDO Arrays, VectorOfVectors and Tables in LH5 files.

Parameters:
  • lh5_files (list) – list of input files to concatenate.

  • output (str) – path to the output file

  • include_list (list | None) – patterns for tables to include.

  • exclude_list (list | None) – patterns for tables to exclude.
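The include/exclude filtering can be pictured with a minimal sketch. It assumes the patterns are shell-style wildcards matched with Python's fnmatch and that exclusion is applied after inclusion; select_objects is a hypothetical helper, not part of lgdo.

```python
from fnmatch import fnmatch


def select_objects(names, include_list=None, exclude_list=None):
    """Hypothetical helper: filter LH5 object names with wildcard patterns."""
    selected = []
    for name in names:
        # With an include list, keep only names matching at least one pattern
        if include_list is not None and not any(fnmatch(name, p) for p in include_list):
            continue
        # Drop any name matching an exclude pattern
        if exclude_list is not None and any(fnmatch(name, p) for p in exclude_list):
            continue
        selected.append(name)
    return selected


names = ["ch1/raw", "ch1/dsp", "ch2/raw"]
only_ch1 = select_objects(names, include_list=["ch1/*"])
no_dsp = select_objects(names, exclude_list=["*/dsp"])
```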

lgdo.lh5.core module

lgdo.lh5.core.read(name, lh5_file, start_row=0, n_rows=9223372036854775807, idx=None, use_h5idx=False, field_mask=None, obj_buf=None, obj_buf_start=0, decompress=True, locking=False)

Read LH5 object data from a file.

Note

Use the idx parameter to read out particular rows of the data. The use_h5idx flag controls whether only those rows are read from disk or if the rows are indexed after reading the entire object. Reading individual rows can be orders of magnitude slower than reading the whole object and then indexing the desired rows. The default behavior (use_h5idx=False) is to use slightly more memory for a much faster read. See legend-pydataobj/issues/#29 for additional information.

Parameters:
  • name (str) – Name of the LH5 object to be read (including its group path).

  • lh5_file (str | Path | File | Sequence[str | Path | File]) – The file(s) containing the object to be read out. If a list of files, array-like object data will be concatenated into the output object.

  • start_row (int) – Starting entry for the object read (for array-like objects). For a list of files, only applies to the first file.

  • n_rows (int) – The maximum number of rows to read (for array-like objects). The actual number of rows read will be returned as one of the return values (see below).

  • idx (ArrayLike) – For NumPy-style “fancy indexing” for the read, to select only some rows, e.g. after applying some cuts to particular columns. Only selection along the first axis is supported, so tuple arguments must be one-tuples. If n_rows is not false, idx will be truncated to n_rows before reading. To use with a list of files, pass in a list of idx arrays (one for each file) or use a long contiguous list (e.g. built from a previous identical read). If used in conjunction with start_row and n_rows, idx will be sliced to obey those constraints, where n_rows is interpreted as the (max) number of selected values (in idx) to be read out. Note that the use_h5idx parameter controls some behavior of the read and that the default behavior (use_h5idx=False) prioritizes speed over a small memory penalty.

  • use_h5idx (bool) – True will directly pass the idx parameter to the underlying h5py call such that only the selected rows are read directly into memory, which conserves memory at the cost of speed. There can be a significant speed penalty for larger files (1-2 orders of magnitude slower). False (default) will read the entire object into memory before performing the indexing. The default is much faster but requires additional memory, though a relatively small amount in the typical use case. It is recommended to leave this parameter at its default.

  • field_mask (Mapping[str, bool] | Sequence[str] | None) – For tables and structs, determines which fields get read out. Nested struct elements can be accessed by using / as a separator (e.g. refer to field a inside the table table which is stored inside the struct struct as struct/table/a). If a dict is used, a default dict will be made with the default set to the opposite of the first element in the dict. This way if one specifies a few fields at False, all but those fields will be read out, while if one specifies just a few fields as True, only those fields will be read out. If a list is provided, the listed fields will be set to True, while the rest will default to False.

  • obj_buf (LGDO) – Read directly into memory provided in obj_buf. Note: the buffer will be resized to accommodate the data retrieved.

  • obj_buf_start (int) – Start location in obj_buf for read. For concatenating data to array-like objects.

  • decompress (bool) – Decompress data encoded with LGDO’s compression routines right after reading. The option has no effect on data encoded with HDF5 built-in filters, which are always decompressed upstream by HDF5.

  • locking (bool) – Lock HDF5 file while reading

Returns:

object – the read-out object

Return type:

LGDO | tuple[LGDO, int]
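The field_mask rules described above can be sketched with a collections.defaultdict. normalize_field_mask is a hypothetical helper that only illustrates the documented behavior; treating None or an empty dict as "read every field" is an assumption.

```python
from collections import defaultdict


def normalize_field_mask(field_mask):
    """Hypothetical: expand a field_mask into a defaultdict of field -> bool."""
    if not field_mask:
        return defaultdict(lambda: True)  # assumption: no mask reads every field
    if isinstance(field_mask, dict):
        # The default is the opposite of the first value in the dict
        default = not next(iter(field_mask.values()))
        mask = defaultdict(lambda: default)
        mask.update(field_mask)
        return mask
    # A list: listed fields are read, everything else is skipped
    mask = defaultdict(lambda: False)
    mask.update((name, True) for name in field_mask)
    return mask


skip_wf = normalize_field_mask({"waveform": False})  # all fields except waveform
only_some = normalize_field_mask(["energy", "struct/table/a"])  # just these two
```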

lgdo.lh5.core.read_as(name, lh5_file, library, **kwargs)

Read LH5 data from disk straight into a third-party data format view.

This function is nothing more than a shortcut chained call to read() and to LGDO.view_as().

Parameters:
  • name (str) – LH5 object name on disk.

  • lh5_file (str | Path | File | Sequence[str | Path | File]) – LH5 file name.

  • library (str) – string ID of the third-party data format library (np, pd, ak, etc).

Return type:

Any

See also

read, LGDO.view_as

lgdo.lh5.core.write(obj, name, lh5_file, group='/', start_row=0, n_rows=None, wo_mode='append', write_start=0, page_buffer=0, **h5py_kwargs)

Write an LGDO into an LH5 file.

If the obj LGDO has a compression attribute, its value is interpreted as the algorithm to be used to compress obj before writing to disk. The type of compression can be:

  • string, kwargs dictionary, hdf5plugin filter – interpreted as the name of a built-in or custom HDF5 compression filter ("gzip", "lzf", hdf5plugin filter object etc.) and passed directly to h5py.Group.create_dataset().

  • WaveformCodec object – if obj is a WaveformTable and obj.values holds the attribute, compress values using this algorithm. More documentation about the supported waveform compression algorithms is available at lgdo.compression.

If the obj LGDO has a hdf5_settings attribute holding a dictionary, it is interpreted as a set of keyword arguments to be forwarded directly to h5py.Group.create_dataset() (exactly like the first format of compression above). This is the preferred way to specify HDF5 dataset options such as chunking. If compression options are specified, they take precedence over those set with the compression attribute.

Note

The compression LGDO attribute takes precedence over the default HDF5 compression settings. The hdf5_settings attribute takes precedence over compression. These attributes are not written to disk.
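A sketch of the precedence order described in this note, assuming the settings are merged key by key: the package defaults first, then the compression attribute, then hdf5_settings. resolve_dataset_kwargs is hypothetical; the real write() may combine these sources differently.

```python
from collections.abc import Mapping

# Stand-in for lh5.DEFAULT_HDF5_SETTINGS (values copied from the settings module)
DEFAULT_SETTINGS = {"compression": "gzip", "shuffle": True}


def resolve_dataset_kwargs(compression=None, hdf5_settings=None, defaults=DEFAULT_SETTINGS):
    """Hypothetical: merge dataset options from lowest to highest precedence."""
    kwargs = dict(defaults)  # 1. package defaults (copied, never mutated)
    if compression is not None:  # 2. the `compression` attribute
        if isinstance(compression, Mapping):
            kwargs.update(compression)  # kwargs dictionary or mapping-like filter object
        else:
            kwargs["compression"] = compression  # filter name such as "lzf"
    if hdf5_settings is not None:  # 3. the `hdf5_settings` attribute wins
        kwargs.update(hdf5_settings)
    return kwargs


lzf = resolve_dataset_kwargs("lzf")
overridden = resolve_dataset_kwargs("lzf", {"compression": "gzip", "chunks": True})
```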

Note

HDF5 compression is skipped for the encoded_data.flattened_data dataset of VectorOfEncodedVectors and ArrayOfEncodedEqualSizedArrays.

Parameters:
  • obj (LGDO) – LH5 object. If the object is array-like, writes n_rows starting from start_row in obj.

  • name (str) – name of the object in the output HDF5 file.

  • lh5_file (str | Path | File) – HDF5 file name or h5py.File object.

  • group (str | Group) – HDF5 group name or h5py.Group object in which obj should be written.

  • start_row (int) – first row in obj to be written.

  • n_rows (int | None) – number of rows in obj to be written.

  • wo_mode (str) –

    • write_safe or w: only proceed with writing if the object does not already exist in the file.

    • append or a: append along axis 0 (the first dimension) of array-like objects and array-like subfields of structs. Scalar objects get overwritten.

    • overwrite or o: replace data in the file if present, starting from write_start. Note: overwriting with write_start = end of array is the same as append.

    • overwrite_file or of: delete file if present prior to writing to it. write_start should be 0 (it is ignored).

    • append_column or ac: append fields/columns from a Struct obj (and derived types such as Table) only if there is an existing Struct in the lh5_file with the same name. If there are matching fields, it errors out. If appending to a Table and the size of the new column is different from the size of the existing table, it errors out.

  • write_start (int) – row in the output file (if already existing) to start overwriting from.

  • page_buffer (int) – enable paged aggregation with a buffer of this size in bytes. Only used when creating a new file. Useful when writing a file with a large number of small datasets. This is a short-hand for (fs_strategy="page", fs_pagesize=[page_buffer])

  • **h5py_kwargs – additional keyword arguments forwarded to h5py.Group.create_dataset() to specify, for example, an HDF5 compression filter to be applied before writing non-scalar datasets. Note: the compression keyword is ignored if compression is specified as an obj attribute.
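The effect of the main wo_mode values on a one-dimensional array-like can be modeled with plain Python lists. apply_write is a hypothetical helper; it assumes that overwrite keeps any existing rows beyond the overwritten region, and it does not model append_column.

```python
# Long names for the documented short aliases
WO_MODE_ALIASES = {
    "w": "write_safe",
    "a": "append",
    "o": "overwrite",
    "of": "overwrite_file",
    "ac": "append_column",
}


def apply_write(existing, new, wo_mode="append", write_start=0):
    """Hypothetical model of wo_mode on a 1D array; `existing` is None if absent."""
    mode = WO_MODE_ALIASES.get(wo_mode, wo_mode)
    if mode not in ("write_safe", "append", "overwrite", "overwrite_file"):
        raise ValueError(f"wo_mode {wo_mode!r} is not modeled in this sketch")
    new = list(new)
    if mode == "overwrite_file" or existing is None:
        return new  # fresh file or fresh object: write_start plays no role
    if mode == "write_safe":
        raise FileExistsError("object already exists")
    if mode == "append":
        return existing + new  # extend along axis 0
    # overwrite: replace rows starting at write_start
    if write_start > len(existing):
        raise ValueError("write_start lies beyond the existing data")
    # Assumption: rows past the overwritten region are preserved
    return existing[:write_start] + new + existing[write_start + len(new):]
```

Overwriting with write_start equal to the current length reduces to an append, as the wo_mode description states.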

lgdo.lh5.datatype module

lgdo.lh5.datatype._lgdo_datatype_map: dict[str, LGDO] = {<class 'lgdo.types.array.Array'>: '^array<\\d+>\\{.+\\}$', <class 'lgdo.types.arrayofequalsizedarrays.ArrayOfEqualSizedArrays'>: '^array_of_equalsized_arrays<1,1>\\{.+\\}$', <class 'lgdo.types.encoded.ArrayOfEncodedEqualSizedArrays'>: '^array_of_encoded_equalsized_arrays<1,1>\\{.+\\}$', <class 'lgdo.types.encoded.VectorOfEncodedVectors'>: '^array<1>\\{encoded_array<1>\\{.+\\}\\}$', <class 'lgdo.types.fixedsizearray.FixedSizeArray'>: '^fixedsize_array<\\d+>\\{.+\\}$', <class 'lgdo.types.histogram.Histogram'>: '^struct\\{(?:binning,weights,isdensity|binning,isdensity,weights|weights,binning,isdensity|weights,isdensity,binning|isdensity,binning,weights|isdensity,weights,binning)\\}$', <class 'lgdo.types.scalar.Scalar'>: '^real$|^bool$|^complex$|^bool$|^string$', <class 'lgdo.types.struct.Struct'>: '^struct\\{.*\\}$', <class 'lgdo.types.table.Table'>: '^table\\{.*\\}$', <class 'lgdo.types.vectorofvectors.VectorOfVectors'>: '^array<1>\\{array<1>\\{.+\\}\\}$'}

Mapping between LGDO types and the regular expressions defining the corresponding datatype strings

lgdo.lh5.datatype.datatype(expr)

Return the LGDO type corresponding to a datatype string.

Return type:

type

lgdo.lh5.datatype.get_nested_datatype_string(expr)

Matches the content of the outermost curly brackets.

Return type:

str

lgdo.lh5.datatype.get_struct_fields(expr)

Returns a list of Struct fields, given its datatype string.

Return type:

list[str]
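The datatype lookup and the two string helpers can be sketched with the re module. The patterns are taken from _lgdo_datatype_map above, but several of them overlap (every VectorOfVectors string also matches the generic Array pattern, and every Histogram string matches Struct), so this sketch assumes the more specific patterns must be tried first. The helper names are hypothetical.

```python
import re

# Ordered from most to least specific (assumption about the lookup order)
DATATYPE_PATTERNS = [
    ("VectorOfEncodedVectors", r"^array<1>\{encoded_array<1>\{.+\}\}$"),
    ("VectorOfVectors", r"^array<1>\{array<1>\{.+\}\}$"),
    ("ArrayOfEncodedEqualSizedArrays", r"^array_of_encoded_equalsized_arrays<1,1>\{.+\}$"),
    ("ArrayOfEqualSizedArrays", r"^array_of_equalsized_arrays<1,1>\{.+\}$"),
    ("FixedSizeArray", r"^fixedsize_array<\d+>\{.+\}$"),
    ("Array", r"^array<\d+>\{.+\}$"),
    ("Histogram", r"^struct\{(?:binning,weights,isdensity|binning,isdensity,weights|weights,binning,isdensity|weights,isdensity,binning|isdensity,binning,weights|isdensity,weights,binning)\}$"),
    ("Struct", r"^struct\{.*\}$"),
    ("Table", r"^table\{.*\}$"),
    ("Scalar", r"^real$|^bool$|^complex$|^string$"),
]


def datatype_name(expr):
    """Return the name of the first LGDO type whose pattern matches."""
    for name, pattern in DATATYPE_PATTERNS:
        if re.match(pattern, expr):
            return name
    raise ValueError(f"unknown datatype string: {expr!r}")


def nested_datatype(expr):
    # Greedy .* spans from the first "{" to the last "}": the outermost brackets
    return re.search(r"\{(.*)\}", expr).group(1)


def struct_fields(expr):
    inner = nested_datatype(expr)
    return inner.split(",") if inner else []
```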

lgdo.lh5.exceptions module

exception lgdo.lh5.exceptions.LH5DecodeError(message, file, oname=None)

Bases: Exception

exception lgdo.lh5.exceptions.LH5EncodeError(message, file, group=None, name=None)

Bases: Exception

lgdo.lh5.iterator module

class lgdo.lh5.iterator.LH5Iterator(lh5_files, groups, *, base_path='', entry_list=None, entry_mask=None, i_start=0, n_entries=None, field_mask=None, group_data=None, buffer_len='100*MB', file_cache=10, ds_map=None, friend=None, friend_prefix='', friend_suffix='', h5py_open_mode='r')

Bases: Iterator

Iterate over chunks of entries from LH5 files.

The iterator reads buffer_len entries at a time from one or more files. The LGDO instance returned at each iteration is reused to avoid reallocations, so copy the data if it should be preserved.
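Because the yielded object is reused, collecting chunks without copying them stores several references to one buffer. The generator below is a hypothetical stand-in for LH5Iterator that shows the pitfall with plain lists.

```python
import copy


def iter_chunks(data, buffer_len):
    """Yield successive chunks of `data`, refilling one shared buffer."""
    buffer = []
    for start in range(0, len(data), buffer_len):
        buffer[:] = data[start:start + buffer_len]  # refill in place, no reallocation
        yield buffer


data = list(range(5))

# Wrong: every stored element is the same buffer, holding only the last chunk
aliased = [chunk for chunk in iter_chunks(data, 2)]

# Right: copy each chunk before keeping it
copied = [copy.copy(chunk) for chunk in iter_chunks(data, 2)]
```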

Examples

Iterate through a table one chunk at a time and call process on each chunk:

from lgdo.lh5 import LH5Iterator
for table in LH5Iterator("data.lh5", "geds/raw/energy", buffer_len=100):
    process(table)

LH5Iterator can also be used for random access:

it = LH5Iterator(files, groups)
table = it.read(i_entry)

In case of multiple files or an entry selection, i_entry refers to the global event index across all files.

When instantiating an iterator you must provide a list of files and the HDF5 groups to read. Optional parameters allow field masking, event selection and pairing the iterator with a “friend” iterator that is read in parallel. Several properties are available to obtain the provenance of the data currently loaded:

  • current_i_entry – index within the entry list of the first entry in the buffer

  • current_local_entries – entry numbers relative to the file the data came from

  • current_global_entries – entry number relative to the full dataset

  • current_files – file name corresponding to each entry in the buffer

  • current_groups – group name corresponding to each entry in the buffer

Constructor for LH5Iterator. Must provide a file or collection of files, and an lh5 group or collection of groups to read data from.

Collections of files and groups can be nested. At the top level, we expect the same number of entries (one set of files to one set of groups). For each corresponding pair of sets, we will loop over each pairing of a file and group, with an inner loop over the groups to minimize the opening of files. Wildcards used for files will be expanded and applied in the inner loop (i.e. each file in a wildcard will read the same groups). If groups is an un-nested collection of strings, use all groups for all files.
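The pairing rules can be modeled in a few lines. expand_pairs is a hypothetical helper; it assumes shell-style wildcards and matches them against an explicit list of available paths instead of the file system, so it runs anywhere.

```python
from fnmatch import fnmatch


def _as_list(x):
    """Treat a single string as a one-element collection."""
    return [x] if isinstance(x, str) else list(x)


def expand_pairs(files, groups, available):
    """Hypothetical model of how LH5Iterator pairs files with groups."""

    def expand(pattern):
        # Wildcards are matched against `available` (a stand-in for the disk)
        if any(ch in pattern for ch in "*?["):
            return sorted(f for f in available if fnmatch(f, pattern))
        return [pattern]

    files, groups = _as_list(files), _as_list(groups)
    if all(isinstance(g, str) for g in groups):
        # Un-nested groups: every file reads every group
        flat_files = [f for fset in files for f in _as_list(fset)]
        sets = [(flat_files, groups)]
    else:
        if len(files) != len(groups):
            raise ValueError("files and groups need the same number of top-level entries")
        sets = [(_as_list(f), _as_list(g)) for f, g in zip(files, groups)]

    pairs = []
    for file_set, group_set in sets:
        for pattern in file_set:
            for path in expand(pattern):
                # Inner loop over groups, so each file is visited only once
                for group in group_set:
                    pairs.append((path, group))
    return pairs


available = ["/path1/a.lh5", "/path1/b.lh5", "/path2/file.lh5",
             "/path3/file1.lh5", "/path3/file2.lh5"]
pairs = expand_pairs(
    ["/path1/*.lh5", "/path2/file.lh5", ["/path3/file1.lh5", "/path3/file2.lh5"]],
    ["ch1/table", ["ch1/table", "ch2/table"], ["ch1/table", "ch2/table", "ch3/table"]],
    available,
)
```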

Examples

Read “ch1/table” and “ch2/table” from “file.lh5”:

LH5Iterator("/path/to/file.lh5", ["ch1/table", "ch2/table"])

Read “ch1” from all lh5 files in “/path1”, then read “ch1” and “ch2” from “/path2/file.lh5”, and then read “ch1/2/3” from both “file1.lh5” and “file2.lh5”:

LH5Iterator(
    ["/path1/*.lh5", "/path2/file.lh5", ["/path3/file1.lh5", "/path3/file2.lh5"]],
    ["ch1/table", ["ch1/table", "ch2/table"], ["ch1/table", "ch2/table", "ch3/table"]]
)

Parameters:
  • lh5_files (str | Collection[str | Collection[str]]) – file(s) to read from (see above). May include wildcards and environment variables.

  • groups (str | Collection[str | Collection[str]]) – HDF5 group(s) to read (see above).

  • entry_list (Collection[int] | Collection[Collection[int]]) – list of entry numbers to read. If a nested list is provided, expect one top-level list for each file, containing a list of local entries. If a list of ints is provided, use global entries.

  • entry_mask (Collection[bool] | Collection[Collection[bool]]) – mask of entries to read. If a list of arrays is provided, expect one for each file. Ignored if entry_list is provided.

  • i_start (int) – index of first entry to start at when iterating

  • n_entries (int) – number of entries to read before terminating iteration

  • field_mask (Mapping[str, bool] | Collection[str]) – mask of which fields to read. See LH5Store.read() for more details.

  • group_data (Mapping[Collection] | ak.Array) – mapping of values corresponding to each provided lh5 group. Values will be duplicated for each entry in each dataset, corresponding to the correct group, and added to the output table. This should have the same structure as groups.

  • buffer_len (int) – number of entries in tables yielded by the iterator. Can also be provided as a value with a unit of memory; in this case, the number of rows is estimated so that each yielded table requires roughly that much memory. Defaults to 100*MB.

  • file_cache (int) – maximum number of files to keep open at a time

  • ds_map (NDArray[int]) – cumulative entries in datasets corresponding to file/group pairs. This can be provided on construction to speed up random or sparse access; otherwise, we sequentially read the size of each group. WARNING: no checks for accuracy are performed so only use this if you know what you are doing!

  • friend (Collection[LH5Iterator]) – a “friend” LH5Iterator that will be joined to this one, and read in parallel. The friend should have the same length and entry list. Each iteration will return a single LH5 Table containing columns from both iterators. The buffer_len will be set to the minimum of the two.

  • friend_prefix (str) – prefix for fields in friend iterator for resolving naming conflicts

  • friend_suffix (str) – suffix for fields in friend iterator for resolving naming conflicts

  • h5py_open_mode (str) – file open mode used when acquiring file handles. r (default) opens files read-only, while a also allows opening files for write-appending.

_generate_workers(n_workers)

Create n_workers copies of this iterator, dividing the datasets (file/group pairs) between them. These are intended for parallel use.

_get_ds_cumentries(i_ds)

Helper to get cumulative iterator entries in file/groups

Return type:

int

_get_ds_cumlen(i_ds)

Helper to get cumulative dataset length of file/groups

Return type:

int
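Entry selections and random access both come down to mapping a global entry number to a dataset (file/group pair) and a local entry. The sketch below assumes the cumulative lengths are stored as running end offsets; the helper names are hypothetical and the actual layout of ds_map may differ.

```python
from bisect import bisect_right
from itertools import accumulate


def mask_to_entries(mask):
    """Turn a boolean entry mask into the list of selected entry numbers."""
    return [i for i, keep in enumerate(mask) if keep]


def cumulative_lengths(lengths):
    """Running end offset of each dataset, e.g. [100, 150, 50] -> [100, 250, 300]."""
    return list(accumulate(lengths))


def locate(global_entry, cumlen):
    """Map a global entry number to (dataset index, local entry)."""
    if not 0 <= global_entry < cumlen[-1]:
        raise IndexError(f"entry {global_entry} out of range")
    i_ds = bisect_right(cumlen, global_entry)  # first dataset ending after the entry
    start = cumlen[i_ds - 1] if i_ds > 0 else 0
    return i_ds, global_entry - start


cumlen = cumulative_lengths([100, 150, 50])
selected = [locate(i, cumlen) for i in mask_to_entries([True] + [False] * 99 + [True])]
```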

_select_groups(i_beg, i_end)

Reduce list of files and groups; used by _generate_workers

add_friend(friend, prefix='', suffix='')

Add a friend which will be iterated alongside this iterator, returning a Table joining the contents of each.

Parameters:
  • friend (LH5Iterator) – LH5Iterator to be friended to this one

  • prefix (str) – string prepended to field names; useful for disambiguating conflicts

  • suffix (str) – string appended to field names; useful for disambiguating conflicts

property buffer_len

property current_files: ndarray[tuple[Any, ...], dtype[str]]

Return list of file names for entries in buffer

property current_global_entries: ndarray[tuple[Any, ...], dtype[int]]

Return list of global entries (relative to the full dataset) in buffer

property current_groups: ndarray[tuple[Any, ...], dtype[str]]

Return list of group names for entries in buffer

property current_local_entries: ndarray[tuple[Any, ...], dtype[int]]

Return list of local dataset entries in buffer

get_ds_entrylist(i_ds)

Helper to get entry list for dataset

Return type:

ndarray

get_global_entrylist()

Get global entry list, constructing it if needed

Return type:

ndarray

hist(ax, where=None, keys=None, processes=None, executor=None, **hist_kwargs)

Fill a histogram from data produced by a query selecting on where. If where is None, fill with all data fetched by iterator.

Examples

Build a 1D histogram of values in col3 with a string selection:

import hist

h = lh5_it.hist(
    hist.axis.Regular(100, 0, 500, label="col3"),
    where="(col1 == 0) & (col2 > 100)",
    keys="col3",
)

Build a 2D histogram with a value axis and string-category axis after applying some processing:

def get_val(lh5_tab, lh5_it):
    # ... process lh5_tab to compute value and category
    return value, category

h = lh5_it.hist(
    [hist.axis.Regular(100, 0, 500, label="Value"),
     hist.axis.StrCategory([], growth=True, label="Category")],
    where=get_val,
)

Parameters:
  • ax (Hist | axis | Collection[axis]) – hist.axis object(s) used to construct the histogram. Can provide a hist.Hist which will be filled as well.

  • where (Callable | str) –

    A filter function for selecting data entries to put into the histogram. Can be:

    • A function that returns reduced data, with signature fun(lh5_obj: Table, it: LH5Iterator). Can return:

      • numpy.ndarray: if 1D list of values; if 2D list of lists of values in same order as axes

      • Collection[ArrayLike]: return list of values in same order as axes

      • Mapping[str, ArrayLike]: mapping from axis name to values

      • pandas.DataFrame: treat as mapping from column name to values

    • A string expression. This will call lgdo.Table.eval() and return a pandas DataFrame containing all columns in the fields mask.

  • keys (Collection[str] | str) – list of key fields corresponding to axes. Use if where returns a mapping with names different from axis names.

  • processes (Executor | int) – number of processes. If None, use number equal to threads available to executor (if provided), or else do not parallelize

  • executor (Executor) – concurrent.futures.Executor object for managing parallelism. If None, create a concurrent.futures.ProcessPoolExecutor with number of processes equal to processes.

  • hist_kwargs – additional keyword arguments for constructing Hist. See hist.Hist.

Return type:

Hist
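The accepted return types of a where function can be normalized into one sequence of values per axis before filling. values_per_axis is a hypothetical helper covering mappings and plain sequences only; NumPy arrays and pandas DataFrames are left out to keep it dependency-free.

```python
from collections.abc import Mapping


def values_per_axis(result, axis_names, keys=None):
    """Hypothetical: return one list of values per histogram axis."""
    if isinstance(result, Mapping):
        # Look values up by `keys` if given, otherwise by axis name
        names = keys if keys is not None else axis_names
        if isinstance(names, str):
            names = [names]
        return [list(result[name]) for name in names]

    values = list(result)
    # A flat sequence of scalars fills a single axis
    if len(axis_names) == 1 and all(not isinstance(v, (list, tuple)) for v in values):
        return [values]
    # Otherwise expect one sequence per axis, in axis order
    if len(values) != len(axis_names):
        raise ValueError("expected one sequence of values per axis")
    return [list(v) for v in values]


# Each filled point is one tuple of per-axis values
cols = values_per_axis({"Value": [1.5, 2.5], "Category": ["a", "b"]}, ["Value", "Category"])
points = list(zip(*cols))
```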

map(fun, aggregate=None, init=None, begin=None, terminate=None, processes=None, executor=None)

Map function over iterator blocks.

Returns an order-preserving list of outputs. Can be run in parallel provided there are no attempts to modify existing objects. Parallel execution splits the iterator into multiple independent streams, each with an approximately equal number of files/groups, which are processed concurrently under a single-program, multiple-data model. Results will be returned asynchronously for each process.

Note: see query() and hist()

Example

Process a table and sum the products at the end:

def process(lh5_tab, lh5_it):
    # ... process the table
    return result_of_processing

results = lh5_it.map(process, processes=4)
# results are an iterator over lists
result = sum(val for block in results for val in block)

Process a table as above, using aggregate to sum the results:

import numpy as np

def process(lh5_tab, lh5_it):
    # ... process the table
    return result_of_processing

result = lh5_it.map(process, aggregate=np.add, init=0, processes=4)

Process a table using a more arbitrary output:

class Result:
    def __init__(self):
        ...  # initialize

    @staticmethod
    def process_table(lh5_tab, lh5_it):
        ...  # process the table and return the per-chunk output

    def aggregate(self, result):
        ...  # add data from processing into this object

result = lh5_it.map(
    Result.process_table,
    aggregate=Result.aggregate,
    init=Result(),
    processes=4
)

Parameters:
  • fun (Callable[Table, LH5Iterator, Any]) – function with signature fun(lh5_obj: Table, it: LH5Iterator) -> Any. Outputs of the function will be collected in a list and returned.

  • aggregate (Callable) – function used to iteratively combine the outputs of fun for each block of data. It should take two inputs: the first of the type of the aggregate, and the second of the type returned by fun. This function can either return the result, or perform the aggregation in-place on the first argument and return None. If using multi-processing, map will return an async-iterator over the aggregated results from each process. If None, do not aggregate; instead, the returned iterator yields the result for each block.

  • init (Any) – initial value used for aggregation. If using an aggregating function and init is None, perform a deep copy of the first element.

  • begin (Callable[LH5Iterator]) – function with signature fun(it: LH5Iterator) that is run before we loop through a chunk of the iterator

  • terminate (Callable[LH5Iterator]) – function with the signature fun(it: LH5Iterator) that is run after we finish looping through a chunk of the iterator

  • processes (int) – number of processes. If None, use number equal to threads available to executor (if provided), or else do not parallelize

  • executor (Executor) – concurrent.futures.Executor object for managing parallelism. If None, create a concurrent.futures.ProcessPoolExecutor with number of processes equal to processes.

Return type:

Iterator[Any]
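The aggregate and init rules can be modeled on the results produced by a single worker. aggregate_results is hypothetical; it follows the documented behavior of accepting either a returned value or an in-place update signaled by returning None.

```python
import copy
import operator


def aggregate_results(results, aggregate, init=None):
    """Hypothetical model of map()'s aggregate/init rules for one worker."""
    results = iter(results)
    if init is None:
        try:
            # No init: start from a deep copy of the first result
            acc = copy.deepcopy(next(results))
        except StopIteration:
            return None
    else:
        acc = init
    for res in results:
        out = aggregate(acc, res)
        if out is not None:  # aggregate returned the new value
            acc = out
        # otherwise aggregate updated `acc` in place
    return acc


total = aggregate_results([1, 2, 3], operator.add, init=0)  # 0 + 1 + 2 + 3
chunks = [[1], [2, 3]]
merged = aggregate_results(chunks, list.extend)  # list.extend works in place, returns None
```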

query(where, processes=None, executor=None, library=None)

Query the data files in the iterator

Returns the selected data as a single table in one of several formats.

Examples

Query data using a string selection:

tab = lh5_it.query("(col1 == 0) & (col2 > 100)")

Query data using a function:

def select(lh5_tab, lh5_it):
    # ... process data and produce a new table
    return result

tab = lh5_it.query(select)

Parameters:
  • where (Callable | str) –

    A filter function for selecting data entries. Can be:

    • A function that returns reduced data, with signature fun(lh5_obj: Table, it: LH5Iterator). Can return:

      • numpy.ndarray: if 1D list of values; if 2D list of lists of values in same order as axes

      • Collection[ArrayLike]: return list of values in same order as axes

      • Mapping[str, ArrayLike]: mapping from axis name to values

      • pandas.DataFrame: pandas dataframe. Treat as mapping from column name to values

    • A string expression. This will call Table.eval to select events and return a table with all fields in the field_mask, in a data format provided with library.

  • processes (Executor | int) – number of processes. If None, use number equal to threads available to executor (if provided), or else do not parallelize

  • executor (Executor) – concurrent.futures.Executor object for managing parallelism. If None, create a concurrent.futures.ProcessPoolExecutor with number of processes equal to processes.

  • library (str) – library to convert the columns to when using a string expression for where. See Table.eval().

read(i_entry, n_entries=None)

Read the next local chunk of events, starting at entry i_entry.

Return type:

Table

reset_field_mask(mask, warn_missing=True)

Replaces the field mask of this iterator and any friends with mask.

  • If None, set this and all friends to have no mask.

  • If a collection of strings or mapping from strings to bools, set the mask for this and all friends; in the case of a conflict, use first column found. If a prefix or suffix is included for the friend, it must be included in this mask

  • If a collection of collections, use the first item to set this mask, and subsequent items to set friend masks. In this case, do not include prefixes or suffixes in names

lgdo.lh5.iterator._append_copy(list, val)

Helper for aggregating tables in query

class lgdo.lh5.iterator._hist_filler(keys)

Bases: object

Helper for filling histogram

lgdo.lh5.iterator._identity(val, _)

lgdo.lh5.iterator._map_helper(fun, aggregator, init, begin, terminate, it)

Helper for executing init, begin and terminate functions when calling map

class lgdo.lh5.iterator._table_query(expr, library)

Bases: object

Helper for when query is called on a string

expr: str
library: str

lgdo.lh5.settings module

lgdo.lh5.settings.DEFAULT_HDF5_SETTINGS: dict[str, ...] = {'compression': 'gzip', 'shuffle': True}

Global dictionary storing the default HDF5 settings for writing data to disk.

Modify this global variable before writing data to disk with this package.

Examples

>>> from lgdo import lh5
>>> lh5.DEFAULT_HDF5_SETTINGS["compression"] = "lzf"
>>> lh5.write(data, "data", "file.lh5")  # compressed with LZF

lgdo.lh5.settings.default_hdf5_settings()

Returns the default pydataobj HDF5 settings for writing data to disk.

Examples

>>> from lgdo import lh5
>>> lh5.DEFAULT_HDF5_SETTINGS["compression"] = "lzf"
>>> lh5.write(data, "data", "file.lh5")  # compressed with LZF
>>> lh5.DEFAULT_HDF5_SETTINGS = lh5.default_hdf5_settings()
>>> lh5.write(data, "data", "file.lh5", wo_mode="of")  # compressed with default settings (GZIP)

Return type:

dict[str, Any]

lgdo.lh5.store module

This module implements routines for reading and writing LEGEND Data Objects in HDF5 files.

class lgdo.lh5.store.LH5Store(base_path='', keep_open=False, locking=False, default_mode='r')

Bases: object

Class to represent a store of LEGEND HDF5 files. The two main methods implemented by the class are read() and write().

Examples

>>> from lgdo import LH5Store
>>> store = LH5Store()
>>> obj, _ = store.read("/geds/waveform", "file.lh5")
>>> type(obj)
lgdo.waveformtable.WaveformTable

Parameters:
  • base_path (str | Path) – directory path to prepend to LH5 files.

  • keep_open (bool) – whether to keep files open by storing the h5py objects as class attributes. If keep_open is an int, keep only the n most recently opened files; if True, there is no limit.

  • locking (bool) – whether to lock files when reading

  • default_mode (str) – default mode in which to open files with this LH5Store. See h5py.File documentation. If default_mode is "r", use "a" when calling LH5Store.write.
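The int form of keep_open behaves like a bounded cache of open handles. FileCache is a hypothetical model that takes any opener callable (h5py.File in practice) and assumes files are evicted in the order they were opened; FakeFile stands in for a real HDF5 file so the sketch runs without h5py.

```python
from collections import OrderedDict


class FileCache:
    """Hypothetical model of LH5Store's keep_open behavior."""

    def __init__(self, opener, keep_open=True):
        self.opener = opener  # any callable taking a path, e.g. h5py.File
        self.keep_open = keep_open
        self.files = OrderedDict()  # path -> handle, in opening order

    def get(self, path):
        if path in self.files:
            return self.files[path]  # reuse the cached handle
        handle = self.opener(path)
        if not self.keep_open:
            return handle  # nothing is cached
        self.files[path] = handle
        # True means "no limit"; bool is a subclass of int, so test it first
        limit = None if self.keep_open is True else int(self.keep_open)
        if limit is not None and len(self.files) > limit:
            _, oldest = self.files.popitem(last=False)  # evict the earliest opened
            oldest.close()
        return handle


class FakeFile:
    """Stand-in for an h5py.File."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True


cache = FileCache(FakeFile, keep_open=2)
first = cache.get("a.lh5")
cache.get("b.lh5")
cache.get("c.lh5")  # exceeds the limit: "a.lh5" is closed and dropped
```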

get_buffer(name, lh5_file, size=None, field_mask=None)

Returns an LH5 object appropriate for use as a pre-allocated buffer in a read loop. Sets size to size if object has a size.

Return type:

LGDO

gimme_file(lh5_file, mode=None, page_buffer=0, **file_kwargs)

Returns an h5py file object from the store or creates a new one.

Parameters:
  • lh5_file (str | Path | File) – LH5 file name.

  • mode (str) – mode in which to open file. See h5py.File documentation. If None, use default provided at construction

  • page_buffer (int) – enable paged aggregation with a buffer of this size in bytes. Only used when creating a new file. Useful when writing a file with a large number of small datasets. This is a short-hand for (fs_strategy="page", fs_pagesize=[page_buffer])

  • file_kwargs – Keyword arguments for h5py.File

Return type:

File

gimme_group(group, base_group, grp_attrs=None, overwrite=False)

Returns an existing h5py group from a base group or creates a new one.

Return type:

Group

read(name, lh5_file, start_row=0, n_rows=9223372036854775807, idx=None, use_h5idx=False, field_mask=None, obj_buf=None, obj_buf_start=0, decompress=True, **file_kwargs)

Read LH5 object data from a file in the store.

See also

lh5.core.read

Return type:

tuple[LGDO, int]

read_n_rows(name, lh5_file)

Look up the number of rows in an Array-like object called name in lh5_file.

Return None if it is a Scalar or a Struct.

Return type:

int | None

read_size_in_bytes(name, lh5_file)

Look up the size (in B) of the object in memory. Will recursively crawl through all objects in a Struct or Table

Return type:

int

write(obj, name, lh5_file, group='/', start_row=0, n_rows=None, wo_mode=None, write_start=0, page_buffer=0, **h5py_kwargs)

Write an LGDO into an LH5 file.

See also

lh5.core.write

lgdo.lh5.tools module

lgdo.lh5.tools.ls(lh5_file, lh5_group='', recursive=False)

Return a list of LH5 groups in the input file and group, similar to ls or h5ls. Supports wildcards in group names.

Parameters:
  • lh5_file (str | Path | Group) – name of file.

  • lh5_group (str) – group to search. Add a / to the end of the group name if you want to list all objects inside that group.

  • recursive (bool) – if True, recurse into subgroups.

Return type:

list[str]
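The trailing-slash convention can be illustrated on a flat list of object paths. ls_like is hypothetical: it only models wildcard matching and the meaning of a trailing /, not recursion or the exact output format of ls.

```python
from fnmatch import fnmatch


def ls_like(paths, group=""):
    """Hypothetical model of ls over a flat list of HDF5 object paths."""
    if group == "":
        return sorted(p for p in paths if "/" not in p)  # top-level objects
    depth = group.count("/")
    if group.endswith("/"):
        # Trailing slash: list the direct children of the matching group(s)
        return sorted(
            p for p in paths
            if p.count("/") == depth and fnmatch(p.rsplit("/", 1)[0] + "/", group)
        )
    # No trailing slash: list the matching names themselves, at the same depth
    return sorted(p for p in paths if p.count("/") == depth and fnmatch(p, group))


paths = ["ch1", "ch1/raw", "ch1/raw/energy", "ch2", "ch2/raw"]
```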

lgdo.lh5.tools.show(lh5_file, lh5_group='/', attrs=False, indent='', header=True, depth=None, detail=False)

Print a tree of LH5 file contents with LGDO datatype.

Parameters:
  • lh5_file (str | Path | Group) – the LH5 file.

  • lh5_group (str) – print only contents of this HDF5 group.

  • attrs (bool) – print the HDF5 attributes too.

  • indent (str) – indent the diagram with this string.

  • header (bool) – print lh5_group at the top of the diagram.

  • depth (int | None) – maximum tree depth of groups to print

  • detail (bool) – whether to print additional information about how the data is stored

Examples

>>> from lgdo import show
>>> show("file.lh5", "/geds/raw")
/geds/raw
├── channel · array<1>{real}
├── energy · array<1>{real}
├── timestamp · array<1>{real}
├── waveform · table{t0,dt,values}
│   ├── dt · array<1>{real}
│   ├── t0 · array<1>{real}
│   └── values · array_of_equalsized_arrays<1,1>{real}
└── wf_std · array<1>{real}

lgdo.lh5.utils module

Implements utilities for LEGEND Data Objects.

lgdo.lh5.utils.expand_path(path, substitute=None, list=False, base_path=None)

Expand (environment) variables and wildcards to return absolute paths.

Parameters:
  • path (str | Path) – name of path, which may include environment variables and wildcards.

  • list (bool) – if True, return a list. If False, return a string; if False and a unique file is not found, raise an exception.

  • substitute (dict[str, str] | None) – use this dictionary to substitute variables. Environment variables take precedence.

  • base_path (str | Path | None) – name of base path. Returned paths will be relative to base.

Returns:

path or list of paths – Unique absolute path, or list of all absolute paths

Return type:

str | list

lgdo.lh5.utils.expand_vars(expr, substitute=None)

Expand (environment) variables.

Note

Malformed variable names and references to non-existing variables are left unchanged.

Parameters:
  • expr (str) – string expression, which may include (environment) variables prefixed by $.

  • substitute (dict[str, str] | None) – use this dictionary to substitute variables. Takes precedence over environment variables.

Return type:

str
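The precedence rule can be sketched with the standard library, assuming the dictionary substitution is applied before environment expansion. expand_vars_sketch is a hypothetical re-implementation, not the lgdo source, and LGDO_DOC_DIR is a made-up variable for the demo.

```python
import os
import string


def expand_vars_sketch(expr, substitute=None):
    """Hypothetical re-implementation of the documented expand_vars behavior."""
    # 1. Substitute from the dictionary first, so it takes precedence;
    #    safe_substitute leaves unknown or malformed references untouched.
    expr = string.Template(expr).safe_substitute(substitute or {})
    # 2. Expand the remaining $VAR / ${VAR} from the environment;
    #    expandvars also leaves undefined variables unchanged.
    return os.path.expandvars(expr)


os.environ["LGDO_DOC_DIR"] = "/env"  # hypothetical variable for the demo
from_env = expand_vars_sketch("$LGDO_DOC_DIR/raw")
from_dict = expand_vars_sketch("$LGDO_DOC_DIR/raw", {"LGDO_DOC_DIR": "/sub"})
```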

lgdo.lh5.utils.fmtbytes(num, suffix='B')

Returns formatted f-string for printing human-readable number of bytes.
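A possible implementation of such a formatter, assuming 1024-based prefixes; fmtbytes_sketch is hypothetical and the exact output format of fmtbytes may differ.

```python
def fmtbytes_sketch(num, suffix="B"):
    """Hypothetical: format a byte count with 1024-based prefixes."""
    for prefix in ("", "k", "M", "G", "T", "P", "E", "Z"):
        if abs(num) < 1024.0:
            return f"{num:3.1f} {prefix}{suffix}"
        num /= 1024.0  # move on to the next prefix
    return f"{num:.1f} Y{suffix}"
```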

lgdo.lh5.utils.get_buffer(name, lh5_file, size=None, field_mask=None)

Returns an LGDO appropriate for use as a pre-allocated buffer.

Sets size to size if object has a size.

Return type:

LGDO

lgdo.lh5.utils.get_h5_group(group, base_group, grp_attrs=None, overwrite=False)

Returns an existing h5py group from a base group or creates a new one. Can also set (or replace) group attributes.

Parameters:
  • group (str | Group) – name of the HDF5 group.

  • base_group (Group) – HDF5 group to be used as a base.

  • grp_attrs (Mapping[str, Any] | None) – HDF5 group attributes.

  • overwrite (bool) – whether to overwrite group attributes; ignored if grp_attrs is None.

Return type:

Group

lgdo.lh5.utils.read_n_rows(name, h5f)

Look up the number of rows in an Array-like LGDO object on disk.

Return None if name is a Scalar or a Struct.

Return type:

int | None

lgdo.lh5.utils.read_size_in_bytes(name, h5f)

Look up the size (in B) of an LGDO object in memory. Will crawl recursively through members of a Struct or Table.

Return type:

int | None