lgdo package

LEGEND Data Objects (LGDO) are defined in the LEGEND data format specification. This package serves as the Python implementation of that specification. The general strategy for the implementation is to dress standard Python and NumPy objects with an attr dictionary holding LGDO metadata, plus some convenience functions. The basic data object classes are:

  • LGDO: abstract base class for all LGDOs

  • Scalar: typed Python scalar. Access data via the value attribute.

  • Array: basic numpy.ndarray. Access data via the nda attribute.

  • FixedSizeArray: basic numpy.ndarray. Access data via the nda attribute.

  • ArrayOfEqualSizedArrays: multi-dimensional numpy.ndarray. Access data via the nda attribute.

  • VectorOfVectors: an n-dimensional variable-length array of variable-length arrays. Implemented as a pair of datasets: flattened_data, holding the raw data (an Array, or a VectorOfVectors if the vector dimension is greater than 2), and cumulative_length (always an Array), whose i-th element is the sum of the lengths of the vectors with index <= i.

  • VectorOfEncodedVectors: an array of variable length encoded arrays. Implemented as a VectorOfVectors encoded_data holding the encoded vectors and an Array decoded_size specifying the size of each decoded vector. Mainly used to represent a list of compressed waveforms.

  • ArrayOfEncodedEqualSizedArrays: an array of equal sized encoded arrays. Similar to VectorOfEncodedVectors except for decoded_size, which is now a scalar.

  • Struct: a dictionary containing LGDO objects. Derives from dict.

  • Table: a Struct whose elements (“columns”) are all array types with the same length (number of rows).

  • Histogram: holds an array of histogrammed data, and the associated binning of arbitrary dimensionality.
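To make the VectorOfVectors layout described above concrete, the following is a minimal NumPy-only sketch of the flattened_data / cumulative_length pair. It does not use the lgdo classes themselves: the variable names simply mirror the dataset names, and get_vector is a hypothetical helper introduced for illustration.

```python
import numpy as np

# Three variable-length vectors to store.
vectors = [[1, 2], [3, 4, 5], [6]]

# flattened_data: all elements concatenated into a single 1-D array.
flattened_data = np.concatenate([np.asarray(v) for v in vectors])

# cumulative_length[i]: total number of elements in vectors 0..i (inclusive).
cumulative_length = np.cumsum([len(v) for v in vectors])


def get_vector(i):
    """Return the i-th vector as a slice of flattened_data (hypothetical helper)."""
    start = 0 if i == 0 else cumulative_length[i - 1]
    stop = cumulative_length[i]
    return flattened_data[start:stop]
```

Under this layout, vector i occupies the half-open range from cumulative_length[i - 1] (or 0 for the first vector) to cumulative_length[i], so no per-vector padding is needed.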

Currently, the primary on-disk format for LGDO objects is LEGEND HDF5 (LH5) files. I/O is done via the class lh5_store.LH5Store. LH5 files can also be browsed easily in Python, like any HDF5 file, using h5py.

Subpackages

Submodules

lgdo.cli module

legend-pydataobj’s command line interface utilities.

lgdo.cli.lh5concat_cli(args=None)

Command line interface for concatenating array-like LGDOs in LH5 files.

lgdo.cli.lh5ls(args=None)

lh5.show() command line interface.

lgdo.lgdo_utils module

lgdo.lgdo_utils.copy(obj, dtype=None)

lgdo.lgdo_utils.expand_path(path, substitute=None, list=False, base_path=None)
Return type:

str | list

lgdo.lgdo_utils.expand_vars(expr, substitute=None)
Return type:

str

lgdo.lgdo_utils.get_element_type(obj)
Return type:

str

lgdo.lgdo_utils.parse_datatype(datatype)
Return type:

tuple[str, tuple[int, …], str | list[str]]

lgdo.lh5_store module

Warning

This subpackage is deprecated, use lgdo.lh5.

class lgdo.lh5_store.LH5Iterator(lh5_files, groups, base_path='', entry_list=None, entry_mask=None, field_mask=None, buffer_len=3200, friend=None)

Bases: LH5Iterator

Warning

This class is deprecated, use lgdo.lh5.iterator.LH5Iterator.

Parameters:
  • lh5_files (str | list[str]) – file or files to read from. May include wildcards and environment variables.

  • groups (str | list[str]) – HDF5 group(s) to read. If a list of strings is provided, the same groups are used for each file. If a list of lists is provided, the size of the outer list must match the size of the file list, and each inner list applies to a single file (or set of wildcarded files).

  • entry_list (list[int] | list[list[int]] | None) – list of entry numbers to read. If a nested list is provided, expect one top-level list for each file, containing a list of local entries. If a list of ints is provided, use global entries.

  • entry_mask (list[bool] | list[list[bool]] | None) – mask of entries to read. If a list of arrays is provided, expect one for each file. Ignored if an entry list is provided.

  • field_mask (dict[str, bool] | list[str] | tuple[str] | None) – mask of which fields to read. See LH5Store.read() for more details.

  • buffer_len (int) – number of entries to read at a time while iterating through files.

  • file_cache – maximum number of files to keep open at a time.

  • file_map – cumulative file/group entries. This can be provided on construction to speed up random or sparse access; otherwise, we sequentially read the size of each group. WARNING: no checks for accuracy are performed so only use this if you know what you are doing!

  • friend (Iterator | None) – a “friend” LH5Iterator that will be read in parallel with this. The friend should have the same length and entry list. A single LH5 table containing columns from both iterators will be returned. Note that buffer_len will be set to the minimum of the two.
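The distinction between global and per-file (local) entry numbers, and the role of a cumulative file map, can be illustrated with a short sketch. This is not the iterator's actual implementation: split_global_entries is a hypothetical helper, and it only assumes that file_map holds cumulative entry counts as described above.

```python
from bisect import bisect_right


def split_global_entries(global_entries, file_map):
    """Group global entry numbers into per-file lists of local entry numbers.

    file_map[i] is the cumulative number of entries in files 0..i.
    (Hypothetical helper, for illustration only.)
    """
    local = [[] for _ in file_map]
    for entry in global_entries:
        # Index of the first file whose cumulative count exceeds `entry`.
        i_file = bisect_right(file_map, entry)
        if entry < 0 or i_file >= len(file_map):
            raise IndexError(f"entry {entry} is out of range")
        # Subtract the number of entries in all preceding files.
        offset = file_map[i_file - 1] if i_file > 0 else 0
        local[i_file].append(entry - offset)
    return local


# Files with 10, 5 and 20 entries give the cumulative map [10, 15, 35].
file_map = [10, 15, 35]
local_entries = split_global_entries([0, 9, 10, 14, 15, 34], file_map)
```

Here the flat list of global entries corresponds to the nested list [[0, 9], [0, 4], [0, 19]], with one inner list per file.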

read_object(name, lh5_file, start_row=0, n_rows=9223372036854775807, idx=None, field_mask=None, obj_buf=None, obj_buf_start=0, decompress=True)

Warning

This method is deprecated, use lgdo.lh5.iterator.LH5Iterator.read().

Return type:

tuple[Array | Scalar | Struct | VectorOfVectors, int]

write_object(obj, name, lh5_file, group='/', start_row=0, n_rows=None, wo_mode='append', write_start=0, **h5py_kwargs)

Warning

This method is deprecated, use lgdo.lh5.iterator.LH5Iterator.write().

class lgdo.lh5_store.LH5Store(base_path='', keep_open=False)

Bases: LH5Store

Warning

This class is deprecated, use lgdo.lh5.store.LH5Store.

Parameters:
  • base_path (str) – directory path to prepend to LH5 files.

  • keep_open (bool) – whether to keep files open by storing the h5py objects as class attributes. If keep_open is an int, keep only the n most recently opened files; if True, no limit.

  • locking – whether to lock files when reading.

read_object(name, lh5_file, **kwargs)

Warning

This method is deprecated, use lgdo.lh5.store.LH5Store.read().

Return type:

tuple[Array | Scalar | Struct | VectorOfVectors, int]

write_object(obj, name, lh5_file, **kwargs)

Warning

This method is deprecated, use lgdo.lh5.store.LH5Store.write().

Return type:

tuple[Array | Scalar | Struct | VectorOfVectors, int]

lgdo.lh5_store.load_dfs(f_list, par_list, lh5_group='', idx_list=None)

Warning

This function is deprecated, use lgdo.types.lgdo.LGDO.view_as() to view LGDO data as a Pandas data structure.

Return type:

DataFrame

lgdo.lh5_store.load_nda(f_list, par_list, lh5_group='', idx_list=None)

Warning

This function is deprecated, use lgdo.types.lgdo.LGDO.view_as() to view LGDO data as a NumPy data structure.

Return type:

dict[str, ndarray]

lgdo.lh5_store.ls(lh5_file, lh5_group='')

Warning

This function is deprecated, use lgdo.lh5.tools.ls().

Return type:

list[str]

lgdo.lh5_store.show(lh5_file, lh5_group='/', attrs=False, indent='', header=True)

Warning

This function is deprecated, use lgdo.lh5.tools.show().

lgdo.logging module

This module implements some helpers for setting up logging.

lgdo.logging.setup(level=20, logger=None)

Set up colorful logging output.

If logger is None, sets up only the lgdo logger.

Parameters:
  • level (int) – logging level (see logging module).

  • logger (Logger | None) – if not None, set up this logger.

Examples

>>> from lgdo import logging
>>> logging.setup(level=logging.DEBUG)

lgdo.units module

lgdo.utils module

Implements utilities for LEGEND Data Objects.

class lgdo.utils.NumbaDefaults

Bases: MutableMapping

Bare-bones class to store some Numba default options. Default values are set from environment variables.

Examples

Set all default option values for a processor at once by expanding the provided dictionary:

>>> from numba import guvectorize
>>> from lgdo.utils import numba_defaults_kwargs as nb_kwargs
>>> @guvectorize([], "", **nb_kwargs, nopython=True)
... def proc(...): ...

Customize one argument but still set defaults for the others:

>>> from lgdo.utils import numba_defaults as nb_defaults
>>> @guvectorize([], "", **nb_defaults(cache=False))
... def proc(...): ...

Override global options at runtime:

>>> from lgdo.utils import numba_defaults
>>> # must set options before explicitly importing lgdo modules!
>>> numba_defaults.cache = False
>>> numba_defaults.boundscheck = True
>>> from lgdo import compression # imports of numbified functions happen here
>>> compression.encode(...)
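The three usage patterns above (expanding with **, calling with overrides, and setting attributes) all follow from combining a MutableMapping with attribute storage. The following is a rough sketch of that pattern, not the library's source: the class name OptionDefaults and the environment variable names EXAMPLE_CACHE and EXAMPLE_BOUNDSCHECK are made up for illustration.

```python
import os
from collections.abc import MutableMapping


def _env_flag(name, default):
    """Read a boolean flag from the environment (hypothetical helper)."""
    value = os.environ.get(name)
    return default if value is None else value.lower() in ("1", "t", "true")


class OptionDefaults(MutableMapping):
    """Hypothetical stand-in for a defaults holder; not lgdo's implementation."""

    def __init__(self):
        # Made-up environment variable names, used only in this sketch.
        self.cache = _env_flag("EXAMPLE_CACHE", True)
        self.boundscheck = _env_flag("EXAMPLE_BOUNDSCHECK", False)

    # MutableMapping interface: options live in the instance __dict__.
    def __getitem__(self, key):
        return self.__dict__[key]

    def __setitem__(self, key, value):
        self.__dict__[key] = value

    def __delitem__(self, key):
        del self.__dict__[key]

    def __iter__(self):
        return iter(self.__dict__)

    def __len__(self):
        return len(self.__dict__)

    def __call__(self, **overrides):
        """Return a plain dict of the defaults with some options overridden."""
        return {**self.__dict__, **overrides}


defaults = OptionDefaults()
defaults.cache = False  # global override through attribute access


def show(**kwargs):
    """Stand-in for any callable that accepts keyword options."""
    return kwargs
```

With this sketch, show(**defaults) passes every stored option as a keyword argument, while defaults(cache=True) returns a modified copy and leaves the stored values untouched.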

lgdo.utils.get_element_type(obj)

Get the LGDO element type of a scalar or array.

For use in LGDO datatype attributes.

Parameters:

obj (object) – if a str, will automatically return string; if the object has a numpy.dtype, that will be used for determining the element type; otherwise, will attempt to cast the type of the object to a numpy.dtype.

Returns:

element_type – A string stating the determined element type of the object.

Return type:

str

lgdo.utils.getenv_bool(name, default=False)

Get environment value as a boolean. Returns True for 1, t and true (case-insensitive), False for any other value, and default if the variable is undefined.

Return type:

bool
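The rule described for getenv_bool can be written in a few lines. The following is a minimal sketch of that behavior as documented above, not a copy of the library's code; the name getenv_bool_sketch and the variable SKETCH_FLAG are chosen here for illustration.

```python
import os


def getenv_bool_sketch(name, default=False):
    """Interpret an environment variable as a boolean (illustrative sketch)."""
    value = os.getenv(name)
    if value is None:
        # Variable is undefined: fall back to the caller's default.
        return default
    # Only "1", "t" and "true" count as True, ignoring case.
    return value.lower() in ("1", "t", "true")


os.environ["SKETCH_FLAG"] = "TRUE"
flag_when_true = getenv_bool_sketch("SKETCH_FLAG")

os.environ["SKETCH_FLAG"] = "yes"  # not one of the accepted spellings
flag_when_other = getenv_bool_sketch("SKETCH_FLAG")

del os.environ["SKETCH_FLAG"]
flag_when_unset = getenv_bool_sketch("SKETCH_FLAG", default=True)
```

Note that under this rule a value such as "yes" or "on" is treated as False, since only the three listed spellings are recognized.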