lgdo.types package

LEGEND Data Objects (LGDO) types.

Submodules

lgdo.types.array module

Implements a LEGEND Data Object representing an n-dimensional array and corresponding utilities.

class lgdo.types.array.Array(*_args, **_kwargs)

Bases: LGDOCollection

Holds a numpy.ndarray and attributes.

Array (and the other various array types) holds an nda instead of deriving from numpy.ndarray for the following reasons:

  • It keeps management of the nda totally under the control of the user. The user can point it to another object’s buffer, grab the nda and toss the Array, etc.

  • It allows the management code to send just the nda’s to the central routines for data manipulation. Keeping LGDO’s out of that code allows for more standard, reusable, and (we expect) performant Python.

  • It allows the first axis of the nda to be treated as “special” for storage in Tables.

Parameters:
  • nda (np.ndarray | ak.Array | None) – A numpy.ndarray or ak.Array to be used for this object’s internal array. If the Awkward array carries a units parameter, it will be forwarded as an LGDO attribute.

  • shape (tuple[int, ...]) – A numpy-format shape specification for shape of the internal ndarray. Required if nda is None, otherwise unused.

  • dtype (np.dtype | None) – Specifies the type of the data in the array. Required if nda is None, otherwise unused.

  • fill_val (float | int | None) – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.

  • attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO. These attributes always take precedence over all the others (e.g. those carried by nda).

Warning

This constructor has partial units support. It supports fishing units from Awkward Array parameters but not (yet) from e.g. NumPy+Pint arrays. In any case, the user can always attach units later by modifying the dictionary held by attrs.

Note

The array is used directly, not copied. If not supplied, internal memory is newly allocated based on the shape and dtype arguments.

append(value)

Append value(s) to end of array (with copy), resizing array

datatype_name()

The name for this LGDO’s datatype attribute.

Return type:

str

property dtype
form_datatype()

Return this LGDO’s datatype attribute string.

Return type:

str

get_capacity()

Get capacity (i.e. max size before memory must be re-allocated)

Return type:

int

insert(i, value)

Insert value(s) into row i (with copy). If a collection of values is provided, all of them are inserted. The array will be resized and subsequent values will be shifted.

property nda
replace(i, value)

Replace value at row i

reserve_capacity(capacity)

Set size (number of rows) of internal memory buffer

resize(new_size, trim=False)

Set size of Array in rows. Only change capacity if it must be increased to accommodate new rows; in this case double capacity. If trim is True, capacity will be set to match size. If new_size is an int, do not change size of inner dimensions.

If new_size is a collection, internal memory will be re-allocated, so this should be done only rarely!

Parameters:
  • new_size (int | Collection[int]) – number of rows after resize. If provided as a collection, also resize inner dimensions

  • trim – after resizing, change capacity to match size
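The capacity policy described for resize() can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's code: the helper name next_capacity is hypothetical, and repeating the doubling until the new size fits is an assumption (the description only states that the capacity is doubled).

```python
def next_capacity(capacity: int, new_size: int, trim: bool = False) -> int:
    """Return the buffer capacity after resizing to new_size rows."""
    if trim:
        # trim=True: capacity is set to match the new size exactly
        return new_size
    if new_size <= capacity:
        # enough room already: capacity is left untouched
        return capacity
    # assumption: keep doubling until the requested size fits
    new_capacity = max(capacity, 1)
    while new_capacity < new_size:
        new_capacity *= 2
    return new_capacity


print(next_capacity(8, 5))             # shrinking keeps the capacity
print(next_capacity(8, 9))             # growing doubles it
print(next_capacity(8, 5, trim=True))  # trimming matches the size
```

Growing geometrically keeps the number of re-allocations small when rows are appended one at a time, at the cost of some unused memory that trim_capacity() can release.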

property shape
trim_capacity()

Set capacity to be minimum needed to support Array size

view_as(library, with_units=False)

View the Array data as a third-party format data structure.

This is a zero-copy operation. Supported third-party formats are:

Parameters:
  • library (str) – format of the returned data view.

  • with_units (bool) – forward physical units to the output data.

Return type:

pd.DataFrame | np.NDArray | ak.Array

See also

LGDO.view_as

lgdo.types.arrayofdetectorids module

Implements a LEGEND Data Object representing an array of detector IDs.

class lgdo.types.arrayofdetectorids.ArrayOfDetectorIDs(*_args, **_kwargs)

Bases: Array

Array of detector IDs, which are uint32 values encoding the name of a detector in the LEGEND experiment. See Detector ID Encoding.

See Array

Parameters:
  • nda (np.ndarray | ak.Array | None) – A numpy.ndarray or ak.Array to be used for this object’s internal array. If the Awkward array carries a units parameter, it will be forwarded as an LGDO attribute.

  • shape (tuple[int, ...]) – A numpy-format shape specification for shape of the internal ndarray. Required if nda is None, otherwise unused.

  • fill_val (int | None) – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.

  • attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO. These attributes always take precedence over all the others (e.g. those carried by nda).

form_datatype()

Return this LGDO’s datatype attribute string.

Return type:

str

lgdo.types.arrayofequalsizedarrays module

Implements a LEGEND Data Object representing an array of equal-sized arrays and corresponding utilities.

class lgdo.types.arrayofequalsizedarrays.ArrayOfEqualSizedArrays(*_args, **_kwargs)

Bases: Array

An array of equal-sized arrays.

The arrays have equal size within a file, but the size can differ from application to application. Canonical example: an array of same-length waveforms.

Parameters:
  • dims (tuple[int, ...] | None) – specifies the dimensions required for building the ArrayOfEqualSizedArrays datatype attribute.

  • nda (np.ndarray) – A numpy.ndarray to be used for this object’s internal array. Note: the array is used directly, not copied. If not supplied, internal memory is newly allocated based on the shape and dtype arguments.

  • shape (tuple[int, ...]) – A NumPy-format shape specification for shape of the internal array. Required if nda is None, otherwise unused.

  • dtype (np.dtype) – Specifies the type of the data in the array. Required if nda is None, otherwise unused.

  • fill_val (int | float | None) – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.

  • attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO.

Notes

If shape is not “1D array of arrays of shape given by axes 1-N” (of nda) then specify the dimensionality split in the constructor.

See also

Array

datatype_name()

The name for this LGDO’s datatype attribute.

Return type:

str

form_datatype()

Return this LGDO’s datatype attribute string.

Return type:

str

to_vov(cumulative_length=None)

Convert (and possibly resize) to vectorofvectors.VectorOfVectors.

Parameters:

cumulative_length (ndarray) – cumulative length array of the output vector of vectors. Each vector in the output is filled with values found in the ArrayOfEqualSizedArrays, starting from the first index. If None, use all of the original 2D array and make vectors of equal size.

Return type:

VectorOfVectors
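The role of cumulative_length can be illustrated with nested lists. The sketch below is not the library's implementation; it assumes that cumulative_length[i] is the running total of the vector lengths up to and including row i, which is one reading of the description above. The function name rows_to_vectors is hypothetical.

```python
def rows_to_vectors(rows, cumulative_length=None):
    """Cut each equal-sized row down to the length implied by cumulative_length."""
    if cumulative_length is None:
        # use all of the original 2D array: vectors of equal size
        return [list(row) for row in rows]
    vectors = []
    start = 0
    for row, stop in zip(rows, cumulative_length):
        length = stop - start               # length of this output vector
        vectors.append(list(row[:length]))  # values taken from the first index on
        start = stop
    return vectors


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(rows_to_vectors(rows, cumulative_length=[2, 3, 6]))
```

With cumulative lengths 2, 3 and 6 the output vectors have lengths 2, 1 and 3.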

view_as(library, with_units=False)

View the array as a third-party format data structure.

See also

LGDO.view_as

Return type:

pd.DataFrame | np.NDArray | ak.Array

lgdo.types.arrow module

Zero-copy conversion between LGDO and Arrow types.

Note on allocation: we use to_numpy(zero_copy_only=False) throughout. PyArrow performs zero-copy for all numeric types and only allocates when it must (e.g. booleans, which are bit-packed in Arrow but byte-packed in NumPy, or columns containing nulls that need sentinel values). Multi-chunk columns are combined automatically with a warning.

lgdo.types.arrow._arrow_col_to_lgdo(col, field)

Convert Arrow array to LGDO column (zero-copy).

StructArrays whose fields are {t0, dt, values} become WaveformTables; other StructArrays become plain Tables.

lgdo.types.arrow._deserialize_attr(raw)

Deserialize an Arrow metadata value back to a Python object.

lgdo.types.arrow._lgdo_col_to_arrow(col)

Convert single LGDO column to Arrow array.

Tables (including WaveformTable) become StructArrays with child field metadata preserving attrs like units.

Return type:

Array

lgdo.types.arrow._nested_fixed_list_to_nda(arr)

Convert nested Arrow fixed_size_list to N-D numpy array.

Return type:

ndarray

lgdo.types.arrow._serialize_attr(value)

Serialize an attr value to a JSON string for Arrow metadata.

Return type:

str
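A JSON round trip of attribute values, as described for the two private helpers above, might look like the following. This is a minimal sketch using only the standard library; the function names without the leading underscore are hypothetical stand-ins, and the real helpers may treat more cases (for example non-JSON-serializable values).

```python
import json


def serialize_attr(value) -> str:
    """Encode an attribute value as a JSON string."""
    return json.dumps(value)


def deserialize_attr(raw):
    """Decode a metadata value (str or bytes) back to a Python object."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


attrs = {"units": "ns", "datatype": "array<1>{real}"}
metadata = {key: serialize_attr(val) for key, val in attrs.items()}
restored = {key: deserialize_attr(val) for key, val in metadata.items()}
print(restored == attrs)
```

Accepting bytes on the way back is convenient because Arrow hands metadata back as byte strings.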

lgdo.types.arrow.arrow_to_lgdo(obj)

Convert an Arrow object to its LGDO equivalent.

Type mapping:

Arrow type                          LGDO type
----------------------------------  -----------------------
pa.Table                            Table
StructArray with {t0, dt, values}   WaveformTable
StructArray (other)                 Table
FixedSizeListArray                  ArrayOfEqualSizedArrays
ListArray                           VectorOfVectors
primitive Array                     Array

Zero-copy where possible. Multi-chunk columns are combined automatically (with a warning, since this allocates).

Parameters:

obj – Any supported Arrow object.

Returns:

Table, WaveformTable, Array, ArrayOfEqualSizedArrays, or VectorOfVectors

lgdo.types.arrow.lgdo_to_arrow(obj)

Convert an LGDO object to its Arrow equivalent.

Type mapping:

LGDO type                 Arrow type
------------------------  ---------------------
Table                     pa.Table
WaveformTable             pa.StructArray
Array                     pa.Array
ArrayOfEqualSizedArrays   pa.FixedSizeListArray
VectorOfVectors           pa.ListArray

Preserves all attrs as JSON-encoded Arrow field metadata.

Parameters:

obj – Any supported LGDO object.

Returns:

pa.Table or pa.Array – Arrow table (for Table) or Arrow array (for all other types).

Return type:

Table | Array

lgdo.types.encoded module

class lgdo.types.encoded.ArrayOfEncodedEqualSizedArrays(*_args, **_kwargs)

Bases: LGDOCollection

An array of encoded arrays with equal decoded size.

Used to represent an encoded ArrayOfEqualSizedArrays. In addition to an internal VectorOfVectors self.encoded_data storing the encoded data, the size of the decoded arrays is stored in a Scalar self.decoded_size.

Parameters:
  • encoded_data (VectorOfVectors) – the vector of vectors holding the encoded data.

  • decoded_size (Scalar | int) – the length of the decoded arrays.

  • attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO. Should include information about the codec used to encode the data.

append(value)

Append a 1D encoded array at the end.

datatype_name()

The name for this LGDO’s datatype attribute.

Return type:

str

form_datatype()

Return this LGDO’s datatype attribute string.

Return type:

str

get_capacity()

get reserved capacity of internal memory buffers in rows

Return type:

tuple

insert(i, value)

Insert an encoded array at index i.

replace(i, value)

Replace the encoded array at index i with a new one.

reserve_capacity(*capacity)

Reserve capacity (in rows) for later use. Internal memory buffers will have enough entries to store this many rows.

resize(new_size, trim=False)

Resize array along the first axis.

trim_capacity()

set capacity to only what is required to store current contents of LGDOCollection

view_as(library, with_units=False)

View the encoded data as a third-party format data structure.

This is nearly a zero-copy operation.

Supported third-party formats are:

Note

In the view, decoded_size is expanded into an array.

Parameters:
  • library (str) – format of the returned data view.

  • with_units (bool) – forward physical units to the output data.

Return type:

pd.DataFrame | np.NDArray | ak.Array

See also

LGDO.view_as

class lgdo.types.encoded.VectorOfEncodedVectors(*_args, **_kwargs)

Bases: LGDOCollection

An array of variable-length encoded arrays.

Used to represent an encoded VectorOfVectors. In addition to an internal VectorOfVectors self.encoded_data storing the encoded data, a 1D Array in self.decoded_size holds the original sizes of the encoded vectors.

See also

VectorOfVectors

Parameters:
  • encoded_data (VectorOfVectors) – the vector of encoded vectors.

  • decoded_size (Array) – an array holding the original length of each encoded vector in encoded_data.

  • attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO. Should include information about the codec used to encode the data.

datatype_name()

The name for this LGDO’s datatype attribute.

Return type:

str

form_datatype()

Return this LGDO’s datatype attribute string.

Return type:

str

get_capacity()

get reserved capacity of internal memory buffers in rows

Return type:

tuple

insert(i, value)

Insert an encoded vector at index i.

Parameters:
  • i (int) – the new vector will be inserted before this index.

  • value (tuple[ndarray[tuple[Any, ...], dtype[_ScalarT]], int]) – a tuple holding the encoded array and its decoded size.

replace(i, value)

Replace the encoded vector (and decoded size) at index i with a new one.

Parameters:
  • i (int) – index of the vector to be replaced.

  • value (tuple[ndarray[tuple[Any, ...], dtype[_ScalarT]], int]) – a tuple holding the encoded array and its decoded size.

reserve_capacity(*capacity)

Reserve capacity (in rows) for later use. Internal memory buffers will have enough entries to store this many rows.

resize(new_size)

Resize vector along the first axis.

trim_capacity()

set capacity to only what is required to store current contents of LGDOCollection

view_as(library, with_units=False)

View the encoded data as a third-party format data structure.

This is a zero-copy or nearly zero-copy operation.

Supported third-party formats are:

Parameters:
  • library (str) – format of the returned data view.

  • with_units (bool) – forward physical units to the output data.

Return type:

pd.DataFrame | np.NDArray | ak.Array

See also

LGDO.view_as

lgdo.types.fixedsizearray module

Implements a LEGEND Data Object representing an n-dimensional array of fixed size and corresponding utilities.

class lgdo.types.fixedsizearray.FixedSizeArray(*_args, **_kwargs)

Bases: Array

An array of fixed-size arrays.

Arrays with guaranteed shape along axes > 0: for example, an array of vectors will always have length 3 on axis 1, and it will never change from application to application. This data type is used for optimized memory handling on some platforms. We are not that sophisticated, so we are just storing this identification for LGDO validity, i.e. for now this class is just an alias for Array, but keeps track of the datatype name.

See also

Array

datatype_name()

The name for this LGDO’s datatype attribute.

Return type:

str

view_as(library, with_units=False)

View the array as a third-party format data structure.

See also

LGDO.view_as

lgdo.types.histogram module

class lgdo.types.histogram.Histogram(*args, **kwargs)

Bases: Struct

A special struct to contain histogrammed data.

Parameters:
  • weights (hist.Hist | NDArray | Array) – A numpy.ndarray to be used for this object’s internal array, or a hist.Hist object, whose data view is used for this object’s internal array. Note: the array/histogram view is used directly, not copied.

  • binning (None | Iterable[Histogram.Axis] | Iterable[NDArray] | Iterable[tuple[float, float, float]]) –

    • has to be None if a hist.Hist has been passed as weights

    • can be a list of pre-initialized Histogram.Axis

    • can be a list of tuples, each representing a range, (first, last, step)

    • can be a list of numpy arrays, as returned by numpy.histogramdd().

  • isdensity (bool) – If True, all bin contents represent a density (amount per volume), and not an absolute amount.

  • binedge_attrs (dict[str, Any] | None) – attributes that will be added to all binedges of all axes. This does not work if Histogram.Axis instances are directly passed as binning.

  • attrs (dict[str, Any] | None) – a set of user attributes to be carried along with this LGDO.

  • flow (bool) –

    If False, discard counts in over-/underflow bins of the passed hist.Hist instance. If True, this data will also be discarded, but a warning is emitted.

    Note

    Histogram does not support storing counts in overflow or underflow bins. This parameter only controls whether a warning will be emitted.

class Axis(*args, **kwargs)

Bases: Struct

A special struct to group axis parameters for use in a Histogram.

Depending on the parameters, an axis either can have

  • a binning described by a range object, if first, last and step are passed, or

  • a variable binning described by the edges array.

Parameters:
  • edges (NDArray | Array | None) – an array of edges that describe the binning of this axis.

  • first (float | None) – left edge of the leftmost bin

  • last (float | None) – right edge of the rightmost bin

  • step (float | None) – step size (width of each bin)

  • closedleft (bool) – if True, the bin intervals are left-closed, [a, b); if False, intervals are right-closed, (a, b].

  • binedge_attrs (dict[str, Any] | None) – attributes that will be added to the binedges LGDO that is part of the axis struct.
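How a range binning (first, last, step) maps to bin edges, and what closedleft means for a value that sits exactly on an edge, can be sketched with the standard library. The helper names range_edges and find_bin are hypothetical and this is not the library's code; rounding the number of bins to the nearest integer is an assumption.

```python
from bisect import bisect_left, bisect_right


def range_edges(first: float, last: float, step: float) -> list[float]:
    """Bin edges of a range axis from first to last in steps of step."""
    nbins = round((last - first) / step)
    return [first + i * step for i in range(nbins + 1)]


def find_bin(edges: list[float], x: float, closedleft: bool = True):
    """Index of the bin containing x, or None if x lies outside the axis."""
    if closedleft:
        idx = bisect_right(edges, x) - 1  # bins are [a, b)
    else:
        idx = bisect_left(edges, x) - 1   # bins are (a, b]
    return idx if 0 <= idx < len(edges) - 1 else None


edges = range_edges(0.0, 3.0, 1.0)
print(edges)
print(find_bin(edges, 1.0, closedleft=True))   # 1.0 belongs to [1, 2)
print(find_bin(edges, 1.0, closedleft=False))  # 1.0 belongs to (0, 1]
```

Note that with left-closed bins the rightmost edge is outside the axis, and with right-closed bins the leftmost edge is.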

property closedleft: bool
property edges: ndarray[tuple[Any, ...], dtype[_ScalarT]]

Return all binedges, both for variable and range binning.

property first: float
classmethod from_edges(edges, binedge_attrs=None)

Create a new axis with variable binning described by edges.

Return type:

Axis

classmethod from_range_edges(edges, binedge_attrs=None)

Create a new axis from the binning described by edges, but try to convert it to an evenly spaced range object first.

Warning

This function might return a wrong binning, especially in the case of very small magnitudes of the spacing. See the documentation of numpy.isclose() for details. Use this function with caution, and only if you know the binning’s order of magnitude.

Return type:

Axis
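The pitfall mentioned in the warning can be reproduced without the library. The sketch below re-implements the closeness criterion documented for numpy.isclose() with its default tolerances (rtol=1e-5, atol=1e-8) in plain Python; the function looks_evenly_spaced is a hypothetical, naive spacing check and is not claimed to be what from_range_edges() does internally.

```python
def isclose(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Same criterion as numpy.isclose: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


def looks_evenly_spaced(edges) -> bool:
    """Naive check: every bin width is 'close' to the first one."""
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    return all(isclose(w, widths[0]) for w in widths)


# bin widths of 1, 3 and 1 nanoseconds, expressed in seconds
tiny = [0.0, 1e-9, 4e-9, 5e-9]
# the same binning expressed in nanoseconds
rescaled = [0.0, 1.0, 4.0, 5.0]

print(looks_evenly_spaced(tiny))      # differences are below atol
print(looks_evenly_spaced(rescaled))
```

The absolute tolerance of 1e-8 is larger than the bin widths in the first case, so clearly uneven edges pass the check; expressing the same binning in nanoseconds avoids the problem.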

get_binedgeattrs(datatype=False)

Return a copy of the LGDO attributes dictionary of the binedges

Parameters:

datatype (bool) – if False, remove datatype attribute from the output dictionary.

Return type:

dict

property is_range: bool
property last: float
property nbins: int

Return the number of bins, both for variable and range binning.

property step: float
add_field(name, obj)

Error

Not applicable: A histogram cannot be used as a struct

property binning: tuple[Axis, ...]
fill(data, w=None, keys=None)

Fill histogram by incrementing bins with data points weighted by w

Parameters:
  • data – an ndarray with inner dimension equal to the number of axes, or a list of equal-length 1D arrays containing data for each axis, or a Mapping to 1D arrays containing data for each axis (requires keys), or a Pandas dataframe (optionally takes a list of keys)

  • w (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – weight to use for incrementing data points. If None, use 1 for all

  • keys (Sequence[str]) – list of keys to use if data is a pandas DataFrame or a Mapping

property isdensity: bool
remove_field(name, delete=False)

Error

Not applicable: A histogram cannot be used as a struct

view_as(library)

View the histogram data as a third-party format data structure.

This is typically a zero-copy or nearly zero-copy operation.

Supported third-party formats are:

  • np: returns a tuple of binning and an np.ndarray, similar to the return value of numpy.histogramdd().

  • hist: returns an hist.Hist that holds a copy of this histogram’s data.

Warning

Viewing as hist will perform a copy of the stored histogram data.

Parameters:

library (str) – format of the returned data view.

Return type:

tuple[ndarray[tuple[Any, …], dtype[_ScalarT]]] | Hist

See also

LGDO.view_as

property weights: Array

lgdo.types.lgdo module

class lgdo.types.lgdo.LGDO(*_args, **_kwargs)

Bases: ABC

Abstract base class representing a LEGEND Data Object (LGDO).

abstractmethod datatype_name()

The name for this LGDO’s datatype attribute.

Return type:

str

abstractmethod form_datatype()

Return this LGDO’s datatype attribute string.

Return type:

str

getattrs(datatype=False)

Return a copy of the LGDO attributes dictionary.

Parameters:

datatype (bool) – if False, remove datatype attribute from the output dictionary.

Return type:

dict

abstractmethod view_as(library, with_units=False)

View the LGDO data object as a third-party format data structure.

This is typically a zero-copy or nearly zero-copy operation unless explicitly stated in the concrete LGDO documentation. The view can be turned into a copy explicitly by the user with the appropriate methods. If requested by the user, the output format supports it and the LGDO carries a units attribute, physical units are attached to the view through the pint package.

Typical supported third-party libraries are:

Note

Awkward does not support attaching units through Pint, at the moment.

Note

The arrow format uses zero-copy for numeric arrays. Booleans (bit-packed in Arrow, byte-packed in NumPy) and null-containing columns require allocation. WaveformTable t0 and dt arrays are copied to ensure writability, as required by dspeed.build_processing_chain(). All attrs are preserved as JSON-encoded Arrow field metadata.

but the actual supported formats may vary depending on the concrete LGDO class.

Parameters:
  • library (str) – format of the returned data view.

  • with_units (bool) – forward physical units to the output data.

Return type:

pd.DataFrame | np.NDArray | ak.Array

class lgdo.types.lgdo.LGDOCollection(*_args, **_kwargs)

Bases: LGDO

Abstract base class representing a LEGEND Collection Object (LGDO). This defines the interface for classes used as table columns.

append(val)

append val to end of LGDOCollection

clear(trim=False)

set size of LGDOCollection to zero

abstractmethod get_capacity()

get reserved capacity of internal memory buffers in rows

Return type:

int

abstractmethod insert(i, val)

insert val into LGDOCollection at position i

abstractmethod replace(i, val)

replace item at position i with val in LGDOCollection

abstractmethod reserve_capacity(capacity)

Reserve capacity (in rows) for later use. Internal memory buffers will have enough entries to store this many rows.

abstractmethod resize(new_size, trim=False)

Set the size (number of rows) of the LGDOCollection to new_size.

abstractmethod trim_capacity()

set capacity to only what is required to store current contents of LGDOCollection

lgdo.types.scalar module

Implements a LEGEND Data Object representing a scalar and corresponding utilities.

class lgdo.types.scalar.Scalar(*_args, **_kwargs)

Bases: LGDO

Holds just a scalar value and some attributes (datatype, units, …).

Parameters:
  • value (int | float | str) – the value for this scalar.

  • attrs (dict[str, Any] | None) – a set of user attributes to be carried along with this LGDO.

datatype_name()

The name for this LGDO’s datatype attribute.

Return type:

str

form_datatype()

Return this LGDO’s datatype attribute string.

Return type:

str

view_as(with_units=False)

Dummy function, returns the scalar value itself.

See also

LGDO.view_as

lgdo.types.struct module

Implements a LEGEND Data Object representing a struct and corresponding utilities.

class lgdo.types.struct.Struct(*args, **kwargs)

Bases: LGDO, MutableMapping

A dictionary of LGDO’s with an optional set of attributes.

After instantiation, add fields using add_field() to keep the datatype updated, or call update_datatype() after adding.

Parameters:
  • obj_dict – instantiate this Struct using the supplied named LGDO’s. Note: no copy is performed, the objects are used directly.

  • attrs – a set of user attributes to be carried along with this LGDO.

Return type:

Struct

add_field(name, obj)

Add a field to the table or set an existing field.

Parameters:
  • name (str | int) – key to use for field. Key can be nested (e.g. name1.name2 or name1/name2); this will navigate through the tree, creating new fields as needed

  • obj (LGDO | Mapping[str, LGDO]) – object to add. Can be any LGDO object, or a mapping from names to LGDO objects that will be converted to an LGDO Struct
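The nested-key navigation described for the name parameter can be pictured with plain dictionaries standing in for Struct objects. This is a sketch under that assumption, with a free function add_field that is only a hypothetical stand-in for the method; it handles string keys only.

```python
import re


def add_field(struct: dict, name: str, obj) -> None:
    """Insert obj under a possibly nested key such as 'a.b' or 'a/b'."""
    *parents, leaf = re.split(r"[./]", name)
    node = struct
    for key in parents:
        # create intermediate fields as needed
        node = node.setdefault(key, {})
    node[leaf] = obj


s = {}
add_field(s, "name1.name2", 1)
add_field(s, "name1/name3", 2)
print(s)
```

Both separators lead to the same parent field, so name2 and name3 end up side by side under name1.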

datatype_name()

The name for this LGDO’s datatype attribute.

Return type:

str

form_datatype()

Return this LGDO’s datatype attribute string.

Return type:

str

items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
remove_field(name, delete=False)

Remove a field from the table.

Parameters:
  • name (str | int) – name of the field to be removed.

  • delete (bool) – if True, delete the field object using the del statement.

update(other=(), /, **kwargs)

Add new fields to the table or set existing fields. For nested Structs, only update at the lowest level of nesting; unlike for nested dicts, nested fields not included in other will not be removed.

Parameters:

other (Struct | Mapping[str, LGDO] | Iterable[str, LGDO]) – Struct/Mapping from fields to new values
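The difference from dict.update() for nested content can be shown with plain dictionaries. The function nested_update below is a hypothetical sketch of the documented semantics, not the method's actual code.

```python
def nested_update(target: dict, other: dict) -> None:
    """Update target in place, descending into nested mappings."""
    for key, value in other.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # both sides are nested: update only the fields named in other
            nested_update(target[key], value)
        else:
            target[key] = value


plain = {"sub": {"x": 1, "y": 2}}
plain.update({"sub": {"x": 10}})           # dict.update replaces the whole "sub"
print(plain)

merged = {"sub": {"x": 1, "y": 2}}
nested_update(merged, {"sub": {"x": 10}})  # "y" survives
print(merged)
```

In the first case the nested field y is lost; in the second it is kept, which matches the behavior described above.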

update_datatype()
values() → an object providing a view on D's values
view_as()

View the Struct data as a third-party format data structure.

Error

Not implemented. Since Struct’s fields can have different lengths, converting to a NumPy, Pandas or Awkward is generally not possible. Call LGDO.view_as() on the fields instead.

See also

LGDO.view_as

lgdo.types.struct._get_struct_fields(expr)
Return type:

list[str]

lgdo.types.struct._is_struct_datatype(dt_name, expr)
lgdo.types.struct._sort_datatype_fields(expr)
lgdo.types.struct._struct_datatype_equal(dt_name, dt1, dt2)

lgdo.types.table module

Implements a LEGEND Data Object representing a special struct of arrays of equal length and corresponding utilities.

class lgdo.types.table.Table(*args, **kwargs)

Bases: Struct, LGDOCollection

A special struct of arrays or subtable columns of equal length.

Note

If you write to a table and don’t fill it up to its total size, be sure to resize it before passing to data processing functions, as they will call __len__() to access valid data, which returns the size attribute.

Parameters:
  • size – sets the number of rows in the table. Arrays in col_dict will be resized to match size if both are not None. If size is left as None, the number of table rows is determined from the length of the first array in col_dict. If neither is provided, a default length of 1024 is used.

  • col_dict – instantiate this table using the supplied mapping of column names and array-like objects. Supported input types are: mapping of strings to LGDOCollections, pd.DataFrame and ak.Array.

  • attrs – A set of user attributes to be carried along with this LGDO.

Notes

  • The loc attribute is initialized to 0.

  • No copy is performed, the objects are used directly (unless ak.Array is provided).

  • If size is not None, all arrays will be resized to match it.

  • If the arrays have different lengths, all will be resized to match the length of the first array.
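The rules for choosing the number of rows can be condensed into a few lines. The function resolve_size is a hypothetical illustration of the documented rules using plain lists as columns; it does not reproduce the constructor itself.

```python
def resolve_size(size=None, col_dict=None, default=1024) -> int:
    """Number of rows a new table would get, following the documented rules."""
    if size is not None:
        return size  # explicit size wins; columns are resized to it
    if col_dict:
        first = next(iter(col_dict.values()))
        return len(first)  # length of the first column
    return default


print(resolve_size(size=5, col_dict={"a": [1, 2, 3]}))
print(resolve_size(col_dict={"a": [1, 2, 3], "b": [1, 2]}))
print(resolve_size())
```

In the second call the columns have different lengths, so the first column decides and the others would be resized to match.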

Warning

This constructor has partial units support. It supports fishing units from Awkward Array parameters but not (yet) from e.g. NumPy+Pint arrays. In any case, the user can always attach units later by modifying the dictionary held by attrs.

add_column(name, obj, use_obj_size=False)

Alias for add_field() using table terminology ‘column’.

add_field(name, obj, use_obj_size=False)

Add a field (column) to the table.

Use the name “field” here to match the terminology used in Struct.

Parameters:
  • name (str) – key to use for field. Key can be nested (e.g. name1.name2 or name1/name2); this will navigate through the tree, creating new fields as needed

  • obj (LGDOCollection | Mapping[str, LGDOCollection]) – object to add. Can be any LGDOCollection, or a mapping from names to LGDOCollections that will be converted to an LGDO Table. Size of obj should match size of this Table

  • use_obj_size (bool) – if True, resize the table to match the length of obj.

datatype_name()

The name for this LGDO’s datatype attribute.

Return type:

str

eval(expr, parameters=None, modules=None, with_units=False, library=None)

Apply column operations to the table and return a new LGDO.

Internally uses numexpr.evaluate() if dealing with columns representable as NumPy arrays or eval() if VectorOfVectors are involved. In the latter case, the VoV columns are viewed as ak.Array and the respective routines are therefore available.

Columns nested in subtables can be accessed by scoping with two underscores (__). For example:

tbl.eval("a + tbl2__b")

computes the sum of column a and column b in the subtable tbl2.

Parameters:
  • expr (str) – if the expression only involves non-VectorOfVectors columns, the syntax is the one supported by numexpr.evaluate() (see the numexpr documentation). Note: because of internal limitations, reduction operations must appear last in the stack. If at least one considered column is a VectorOfVectors, plain eval() is used and ak.Array transforms can be used through the ak. prefix. (NumPy functions are analogously accessible through np.). See also examples below.

  • parameters (Mapping[str, str] | None) – a dictionary of function parameters. Passed to numexpr.evaluate() as the local_dict argument or to eval() as the locals argument.

  • modules (Mapping[str, ModuleType] | None) – a dictionary of additional modules used by the expression. If this is not None, then eval() is used and the expression can depend on any modules from this dictionary in addition to awkward and numpy. These are passed to eval() as the globals argument.

  • with_units (bool) – attach units to the columns as in LGDO.view_as().

  • library (str | None) – library to convert the columns to with LGDO.view_as(), supported libraries are np, ak or lgdo (pass in directly the unconverted LGDO objects).

Return type:

LGDO

Examples

>>> import lgdo
>>> tbl = lgdo.Table(
...   col_dict={
...     "a": lgdo.Array([1, 2, 3]),
...     "b": lgdo.VectorOfVectors([[5], [6, 7], [8, 9, 0]]),
...   }
... )
>>> print(tbl.eval("a + b"))
[[6],
 [8 9],
 [11 12  3],
]
>>> print(tbl.eval("np.sum(a) + ak.sum(b)"))
41
flatten(_prefix='')

Flatten the table, if nested.

Returns a new Table (that references, not copies, the existing columns) with columns in nested tables being moved to the first level (and renamed appropriately).

Examples

>>> repr(tbl)
"Table(dict={'a': Array([1 2 3], attrs={'datatype': 'array<1>{real}'}), 'tbl': Table(dict={'b': Array([4 5 6], attrs={'datatype': 'array<1>{real}'}), 'tbl1': Table(dict={'z': Array([9 9 9], attrs={'datatype': 'array<1>{real}'})}, attrs={'datatype': 'table{z}'})}, attrs={'datatype': 'table{b,tbl1}'})}, attrs={'datatype': 'table{a,tbl}'})"
>>> tbl.flatten().keys()
dict_keys(['a', 'tbl__b', 'tbl__tbl1__z'])
Return type:

Table
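The renaming scheme used by flatten() can be mimicked with nested dictionaries in place of tables. This is a sketch of the naming convention only, assuming that nested names are joined with two underscores as in the example above; the free function flatten is a hypothetical stand-in for the method.

```python
def flatten(columns: dict, prefix: str = "") -> dict:
    """Move nested columns to the first level, joining names with '__'."""
    flat = {}
    for name, col in columns.items():
        if isinstance(col, dict):
            # nested table: recurse with the extended prefix
            flat.update(flatten(col, prefix=f"{prefix}{name}__"))
        else:
            # leaf column: referenced, not copied
            flat[f"{prefix}{name}"] = col
    return flat


tbl = {"a": [1, 2, 3], "tbl": {"b": [4, 5, 6], "tbl1": {"z": [9, 9, 9]}}}
print(list(flatten(tbl)))
```

The resulting names are the same ones used to address nested columns in eval() expressions.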

get_capacity()

Return mapping from field name to capacity

Return type:

dict[str, int]

get_dataframe(cols=None, copy=False, prefix='')

Get a pandas.DataFrame from the data in the table.

Warning

This method is deprecated. Use view_as() to view the table as a Pandas dataframe.

Notes

The requested data must be array-like, with the nda attribute.

Parameters:
  • cols (list[str] | None) – a list of column names specifying the subset of the table’s columns to be added to the dataframe.

  • copy (bool) – When True, the dataframe allocates new memory and copies data into it. Otherwise, the raw nda’s from the table are used directly.

  • prefix (str) – The prefix to be added to the column names. Used when recursively getting the dataframe of a Table inside this Table

Return type:

DataFrame

insert(i, vals)

Insert new row(s) into table

Parameters:
  • i (int) – row at which to insert values

  • vals (Table | Mapping[str, Any]) – values to add. Require same keys as table

join(other_table, cols=None, keep_mine=False, prefix='', suffix='', do_warn=True)

Add the columns of another table to this table.

Notes

Following the join, both tables have access to other_table’s fields (but other_table doesn’t have access to this table’s fields). No memory is allocated in this process. other_table can go out of scope and this table will retain access to the joined data.

Parameters:
  • other_table (Table) – the table whose columns are to be joined into this table.

  • cols (list[str] | None) – a list of names of columns from other_table to be joined into this table.

  • keep_mine (bool) – if there is a column name conflict, keep this Table’s column if True, or the joined Table’s column if False (default).

  • prefix (str) – prepend to joined column names; can be used to avoid name conflicts.

  • suffix (str) – append to joined column names; can be used to avoid name conflicts.

  • do_warn (bool) – set to False to turn off warnings associated with mismatched loc parameter or add_column() warnings.
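
As a rough illustration of the cols, keep_mine, prefix and suffix parameters, here is a sketch that joins plain dictionaries of columns. The function join_columns is hypothetical, and it assumes that a name conflict is checked on the already prefixed and suffixed name; the actual Table.join() may differ in such details and additionally handles warnings.

```python
def join_columns(mine, other, cols=None, keep_mine=False, prefix="", suffix=""):
    """Add columns of `other` to `mine` in place, sharing the column objects."""
    names = list(other) if cols is None else list(cols)
    for name in names:
        new_name = f"{prefix}{name}{suffix}"
        if new_name in mine and keep_mine:
            continue  # conflict: keep the existing column
        mine[new_name] = other[name]  # store a reference, not a copy
    return mine


mine = {"energy": [1.0, 2.0]}
other = {"energy": [10.0, 20.0], "time": [0.1, 0.2]}

join_columns(mine, other, keep_mine=True)
print(mine["energy"])  # [1.0, 2.0]

join_columns(mine, other, cols=["energy"], prefix="other_")
print(list(mine))  # ['energy', 'time', 'other_energy']
```

No column data is copied: mine["time"] and other["time"] are the same object, which mirrors the note that no memory is allocated by the join.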

remove_column(name, delete=False)

Alias for Struct.remove_field() using table terminology ‘column’.

replace(i, vals)

Replace the item at position i with vals.

reserve_capacity(capacity)

Set size (number of rows) of internal memory buffer

Parameters:

capacity (int | Mapping[str, int]) – new capacities for fields in table. If int, set all capacities to value; if Mapping, set capacity field-by-field.

resize(new_size=None, do_warn=False, trim=False)

Resize all columns of the table

Parameters:
  • new_size (int | None) – new size of table. If None use size of first field found

  • do_warn (bool) – emit a warning if contents for any field must be resized. This is intended for use with new_size = None

  • trim (bool) – call trim_capacity() after resizing to conserve memory

trim_capacity()

Set capacity for each column to be minimum needed to support size

Return type:

int

view_as(library, with_units=False, cols=None, prefix='')

View the Table data as a third-party format data structure.

This is typically a zero-copy or nearly zero-copy operation.

Supported third-party formats are:

Notes

Conversion to Awkward array only works when the key is a string.

Parameters:
  • library (str) – format of the returned data view.

  • with_units (bool) – forward physical units to the output data.

  • cols (list[str] | None) – a list of column names specifying the subset of the table’s columns to be added to the data view structure.

  • prefix (str) – The prefix to be added to the column names. Used when recursively getting the dataframe of a Table inside this Table.

Return type:

pd.DataFrame | np.NDArray | ak.Array

See also

LGDO.view_as

lgdo.types.table._ak_to_lgdo_or_col_dict(array)

lgdo.types.vectorofvectors module

Implements a LEGEND Data Object representing a variable-length array of variable-length arrays and corresponding utilities.

class lgdo.types.vectorofvectors.VectorOfVectors(*_args, **_kwargs)

Bases: LGDOCollection

An n-dimensional variable-length 1D array of variable-length 1D arrays.

If the vector is 2-dimensional, the internal representation is as two NumPy arrays, one to store the flattened data contiguously (flattened_data) and one to store the cumulative sum of lengths of each vector (cumulative_length). When the dimension is more than 2, flattened_data is a VectorOfVectors itself.
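
The 2-dimensional layout can be sketched with plain Python lists standing in for the two NumPy arrays. The helper get_vector is hypothetical and only shows how the two arrays are meant to be read together.

```python
from itertools import accumulate

vectors = [[1, 2, 3], [4, 5], [], [6]]

# All elements stored back to back.
flattened_data = [x for vec in vectors for x in vec]
# Running total of vector lengths: entry i marks where vector i ends.
cumulative_length = list(accumulate(len(vec) for vec in vectors))


def get_vector(i):
    """Slice vector i out of the flattened data."""
    start = cumulative_length[i - 1] if i > 0 else 0
    stop = cumulative_length[i]
    return flattened_data[start:stop]


print(flattened_data)     # [1, 2, 3, 4, 5, 6]
print(cumulative_length)  # [3, 5, 5, 6]
print(get_vector(1))      # [4, 5]
```

An empty vector shows up as a repeated entry in cumulative_length (here the second 5).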

Examples

>>> from lgdo import VectorOfVectors
>>> data = VectorOfVectors(
...   [[[1, 2], [3, 4, 5]], [[2], [4, 8, 9, 7]], [[5, 3, 1]]],
...   attrs={"units": "m"}
... )
>>> print(data)
[[[1, 2], [3, 4, 5]],
 [[2], [4, 8, 9, 7]],
 [[5, 3, 1]]
] with attrs={'units': 'm'}
>>> data.view_as("ak")
<Array [[[1, 2], [3, 4, 5]], ..., [[5, ..., 1]]] type='3 * var * var * int64'>

Note

Many class methods are currently implemented only for 2D vectors and will raise an exception on higher dimensional data.

Parameters:
  • data (ArrayLike | None) – Any array-like structure accepted by the ak.Array constructor, with the exception that elements cannot be of type OptionType, UnionType or RecordType. Takes priority over flattened_data and cumulative_length. The serialization of the ak.Array is performed through ak.to_buffers(). Since the latter returns non-data-owning NumPy arrays, which would prevent later modifications like resizing, a copy is performed.

  • flattened_data (ArrayLike | None) – if not None, used as the internal array for self.flattened_data. Otherwise, an internal flattened_data is allocated based on cumulative_length (or shape_guess) and dtype.

  • cumulative_length (ArrayLike | VectorOfVectors | None) – if not None, used as the internal array for self.cumulative_length. Should be dtype numpy.uint32. If cumulative_length is None, an internal cumulative_length is allocated based on the first element of shape_guess.

  • offsets (ArrayLike | None) – if not None, used directly as the internal offsets array (PyArrow-compatible format with leading 0). Takes priority over cumulative_length. Mutually exclusive with cumulative_length.

  • shape_guess (Sequence[int, ...] | None) – a NumPy-format shape specification, required if either of flattened_data or cumulative_length are not supplied. The first element should not be a guess and sets the number of vectors to be stored. The second element is a guess or approximation of the typical length of a stored vector, used to set the initial length of flattened_data if it was not supplied.

  • dtype (DTypeLike | None) – sets the type of data stored in flattened_data. Required if flattened_data and array are None.

  • fill_val (int | float | None) – fill all of self.flattened_data with this value.

  • attrs (Mapping[str, Any] | None) – a set of user attributes to be carried along with this LGDO.

static _ak_is_jagged(type_)

Returns True if ak.Array is jagged at all axes.

This assures that ak.to_buffers() returns the expected data structures.

Return type:

bool

static _ak_is_valid(type_)

Returns True if ak.Array contains only elements we can serialize to LH5.

Return type:

bool

_set_vector_unsafe(i, vec, lens=None)

Insert vector vec at position i.

Assumes that j = self.cumulative_length[i-1] is the index (in self.flattened_data) of the end of the (i-1)th vector and copies vec in self.flattened_data[j:sum(lens)]. Finally updates self.cumulative_length[i] with the new flattened data array length.

Vectors stored after index i are removed and the VectorOfVectors is resized. This method is typically used for fast sequential fill of a pre-allocated vector of vectors.

If vec is a 1D array and lens is None, set using the full array. If vec is 2D, require lens to be included, and fill each array only up to the lengths in lens.

Danger

This method resizes the array, removes subsequent vectors, and can lead to undefined behavior or vector-view invalidation if used improperly. Use it only if you know what you are doing.
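
The sequential-fill pattern described above can be sketched on plain lists. This is a simplified stand-in, not the library code: the function set_vector_unsafe below is hypothetical, handles only a 1D vec, and omits the resizing and the removal of subsequent vectors.

```python
def set_vector_unsafe(flattened_data, cumulative_length, i, vec):
    """Write vec as the i-th vector, trusting that vectors 0..i-1 are filled."""
    # End of the previous vector is the start of this one.
    start = cumulative_length[i - 1] if i > 0 else 0
    stop = start + len(vec)
    flattened_data[start:stop] = vec  # same-length slice: buffer size unchanged
    cumulative_length[i] = stop


# Pre-allocated buffers: room for 3 vectors and 8 elements in total.
flattened_data = [0] * 8
cumulative_length = [0] * 3

for i, vec in enumerate([[1, 2], [3], [4, 5, 6]]):
    set_vector_unsafe(flattened_data, cumulative_length, i, vec)

print(cumulative_length)  # [2, 3, 6]
print(flattened_data[: cumulative_length[-1]])  # [1, 2, 3, 4, 5, 6]
```

The sketch makes the hazard visible: calling it out of order reads a stale cumulative_length[i - 1] and writes to the wrong place.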

See also

append, replace, insert

append(new)

Append a 1D vector new at the end.

Examples

>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.append([8, 9])
>>> print(vov)
[[1 2 3],
 [4 5],
 [8 9],
]
property cumulative_length: Array

Return an Array view of offsets[1:] for backwards compatibility.
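
The relation between the two conventions is small enough to show with plain lists. In this sketch the variable names follow the documentation, but the lists are only stand-ins for the internal arrays.

```python
cumulative_length = [3, 5, 5, 6]

# Offsets convention: same numbers with a leading 0.
offsets = [0, *cumulative_length]

# Vector i spans offsets[i]:offsets[i + 1], so lengths are adjacent differences.
lengths = [stop - start for start, stop in zip(offsets[:-1], offsets[1:])]

print(offsets)      # [0, 3, 5, 5, 6]
print(offsets[1:])  # [3, 5, 5, 6]
print(lengths)      # [3, 2, 0, 1]
```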

datatype_name()

The name for this LGDO’s datatype attribute.

Return type:

str

property dtype: dtype
form_datatype()

Return this LGDO’s datatype attribute string.

Return type:

str

get_capacity()

Get tuple containing capacity of each dimension. First dimension is cumulative length array. Last dimension is flattened data.

Return type:

tuple[int]

insert(i, new)

Insert a vector at index i.

self.flattened_data (and therefore self.cumulative_length) is resized in order to accommodate the new element.

Examples

>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.insert(1, [8, 9])
>>> print(vov)
[[1 2 3],
 [8 9],
 [4 5],
]
property ndim
replace(i, new)

Replace the vector at index i with new.

self.flattened_data (and therefore self.cumulative_length) is resized, if the length of new is different from the vector currently at index i.

Examples

>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.replace(0, [8, 9])
>>> print(vov)
[[8 9],
 [4 5],
]
reserve_capacity(cap_cl, *cap_args)

Set capacity of internal data arrays. Expect number of args to equal self.n_dim. First arg is capacity of cumulative length array. If self.n_dim is 2, second argument is capacity of flattened data, otherwise arguments are fed recursively to remaining dimensions.

resize(new_size, trim=False)

Resize vector along the first axis.

self.flattened_data is resized only if new_size is smaller than the current vector length.

If new_size is larger than the current vector length, self.cumulative_length is padded with its last element. This corresponds to appending empty vectors.

If trim is True, resize capacity to match new size

Examples

>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.resize(3)
>>> print(vov)
[[1 2 3],
 [4 5],
 [],
]
>>> vov = VectorOfVectors([[1, 2], [3], [4, 5]])
>>> vov.resize(2)
>>> print(vov)
[[1 2],
 [3],
]
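
What the two examples above do to the underlying arrays can be sketched on plain lists. The function resize_sketch is hypothetical; it returns new lists instead of resizing in place and ignores capacity handling and trim.

```python
def resize_sketch(flattened_data, cumulative_length, new_size):
    """Return (flattened_data, cumulative_length) after resizing to new_size vectors."""
    if new_size < len(cumulative_length):
        # Shrinking: drop trailing vectors and the data they pointed to.
        cl = cumulative_length[:new_size]
        fd = flattened_data[: (cl[-1] if cl else 0)]
    else:
        # Growing: repeat the last entry, which appends empty vectors.
        last = cumulative_length[-1] if cumulative_length else 0
        cl = cumulative_length + [last] * (new_size - len(cumulative_length))
        fd = flattened_data
    return fd, cl


print(resize_sketch([1, 2, 3, 4, 5], [3, 5], 3))     # ([1, 2, 3, 4, 5], [3, 5, 5])
print(resize_sketch([1, 2, 3, 4, 5], [2, 3, 5], 2))  # ([1, 2, 3], [2, 3])
```
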
to_aoesa(max_len=None, fill_val=nan, preserve_dtype=False)

Convert to ArrayOfEqualSizedArrays.

Note

The dtype of the original vector is typically not strictly preserved. The output dtype will be either np.float64 or np.int64. If you want to use the same exact dtype, set preserve_dtype to True.

Parameters:
  • max_len (int | None) – the length of the returned array along its second dimension. Longer vectors will be truncated, shorter will be padded with fill_val. If None, the length will be equal to the length of the longest vector.

  • fill_val (bool | int | float) – value used to pad shorter vectors up to max_len. The dtype of the output array will be such that both fill_val and the vector values can be represented in the same data structure.

  • preserve_dtype (bool) – whether the output array should have exactly the same dtype as the original vector of vectors. The type of fill_val must be compatible with that dtype.

Return type:

ArrayOfEqualSizedArrays
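
The padding and truncation rule for max_len and fill_val can be sketched on nested lists. The helper pad_vectors is hypothetical and does not model the dtype promotion or preserve_dtype.

```python
import math


def pad_vectors(vectors, max_len=None, fill_val=math.nan):
    """Return a rectangular list of lists, truncating or padding each vector."""
    if max_len is None:
        # Default: as wide as the longest vector.
        max_len = max((len(v) for v in vectors), default=0)
    return [list(v[:max_len]) + [fill_val] * (max_len - len(v)) for v in vectors]


print(pad_vectors([[1, 2, 3], [4]], fill_val=0))             # [[1, 2, 3], [4, 0, 0]]
print(pad_vectors([[1, 2, 3], [4]], max_len=2, fill_val=0))  # [[1, 2], [4, 0]]
```

With the default fill_val of nan, integer data cannot stay integer in a NumPy array, which is consistent with the note that the output dtype is typically not preserved.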

trim_capacity()

Set capacity for all dimensions to minimum needed to hold data

view_as(library, with_units=False, fill_val=nan, preserve_dtype=False)

View the vector data as a third-party format data structure.

This is typically a zero-copy or nearly zero-copy operation.

Supported third-party formats are:

  • pd: returns a pandas.Series (supported through the awkward-pandas package)

  • np: returns a numpy.ndarray, padded with zeros to make it rectangular. This implies memory re-allocation.

  • ak: returns an ak.Array. self.cumulative_length is currently re-allocated for technical reasons.

Notes

Awkward array views partially involve memory re-allocation (the cumulative_lengths), while NumPy “exploded” views clearly imply a full copy.

Parameters:
  • library (str) – format of the returned data view.

  • with_units (bool) – forward physical units to the output data.

  • fill_val (bool | int | float) – forwarded to to_aoesa(), if library is np.

  • preserve_dtype (bool) – forwarded to to_aoesa(), if library is np.

Return type:

pd.DataFrame | np.NDArray | ak.Array

See also

LGDO.view_as

class lgdo.types.vectorofvectors._OffsetArrayView(*_args, **_kwargs)

Bases: Array

Array view into offsets[1:] that delegates mutations to the parent offsets.

Parameters:
  • nda – A numpy.ndarray or ak.Array to be used for this object’s internal array. If the Awkward array carries a units parameter, it will be forwarded as an LGDO attribute.

  • shape – A numpy-format shape specification for shape of the internal ndarray. Required if nda is None, otherwise unused.

  • dtype – Specifies the type of the data in the array. Required if nda is None, otherwise unused.

  • fill_val – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.

  • attrs – A set of user attributes to be carried along with this LGDO. These attributes always take precedence over all the others (e.g. those carried by nda).

Warning

This constructor has partial units support. It supports fishing units from Awkward Array parameters but not (yet) from e.g. NumPy+Pint arrays. In any case, the user can always attach units later by modifying the dictionary held by attrs.

Note

The array is used directly, not copied. If not supplied, internal memory is newly allocated based on the shape and dtype arguments.

_refresh_view()
get_capacity()

Get capacity (i.e. max size before memory must be re-allocated)

Return type:

int

insert(i, value)

Insert value(s) into row i (with copy). If a collection of values is provided, insert all values. The array will be resized and subsequent values will be shifted.

reserve_capacity(capacity)

Set size (number of rows) of internal memory buffer

resize(new_size, trim=False)

Set size of Array in rows. Only change capacity if it must be increased to accommodate new rows; in this case double capacity. If trim is True, capacity will be set to match size. If new_size is an int, do not change size of inner dimensions.

If new_size is a collection, internal memory will be re-allocated, so this should be done only rarely!

Parameters:
  • new_size (int) – number of rows after resize. If provided as a collection, also resize inner dimensions

  • trim (bool) – after resizing, change capacity to match size
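
The capacity policy described above can be sketched as a pure function. This is only one reading of "double capacity": the hypothetical helper next_capacity assumes that doubling is repeated until the new size fits, which the actual implementation may or may not do.

```python
def next_capacity(capacity, new_size, trim=False):
    """Return the capacity after resizing to new_size rows."""
    if trim:
        return new_size  # capacity set to match size
    while capacity < new_size:
        # Grow only when needed; assumed here: keep doubling until it fits.
        capacity = max(1, capacity * 2)
    return capacity


print(next_capacity(8, 5))             # 8, unchanged: 5 rows already fit
print(next_capacity(8, 9))             # 16
print(next_capacity(8, 5, trim=True))  # 5
```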

trim_capacity()

Set capacity to be minimum needed to support Array size

lgdo.types.vectorofvectors._to_aoesa(flattened_array, cumulative_length, nda)

lgdo.types.vovutils module

VectorOfVectors utilities. Note: importing this module takes a long time, so it should be imported lazily inside a function call rather than at module level.
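
A lazy import in this sense can look like the following sketch. The wrapper name explode_lengths is hypothetical; the import statement runs only when the function is first called, so defining the wrapper does not pay the import cost.

```python
def explode_lengths(cumulative_length):
    """Call vovutils.explode_cl(), importing vovutils only when needed."""
    # Deferred import: executed at call time, not when this module is loaded.
    from lgdo.types import vovutils

    return vovutils.explode_cl(cumulative_length)
```

After the first call Python caches the module in sys.modules, so later calls reuse it.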

lgdo.types.vovutils._nb_build_cl(sorted_array_in, cumulative_length_out)

Numbified inner loop for build_cl().

Return type:

ndarray[tuple[Any, …], dtype[_ScalarT]]

lgdo.types.vovutils._nb_explode(cumulative_length, array_in, array_out)

Numbified inner loop for explode().

Return type:

ndarray[tuple[Any, …], dtype[_ScalarT]]

lgdo.types.vovutils._nb_explode_cl(cumulative_length, array_out)

Numbified inner loop for explode_cl().

Return type:

ndarray[tuple[Any, …], dtype[_ScalarT]]

lgdo.types.vovutils.build_cl(sorted_array_in, cumulative_length_out=None)

Build a cumulative length array from an array of sorted data.

Examples

>>> build_cl(np.array([3, 3, 3, 4]))
array([3., 4.])

For a sorted_array_in of indices, this is the inverse of explode_cl(), in the sense that doing build_cl(explode_cl(cumulative_length)) would recover the original cumulative_length.

Parameters:
  • sorted_array_in (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – array of data already sorted; each N matching contiguous entries will be converted into a new row of cumulative_length_out.

  • cumulative_length_out (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None) – a pre-allocated array for the output cumulative_length. It will always have length <= sorted_array_in, so giving them the same length is safe if there is not a better guess.

Returns:

cumulative_length_out – the output cumulative length array. If the user provides a cumulative_length_out that is too long, this return value is sliced to contain only the used portion of the allocated memory.

Return type:

ndarray[tuple[Any, …], dtype[_ScalarT]]
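
The grouping rule (each run of matching contiguous entries becomes one row) can be sketched in pure Python. The function build_cl_sketch is a hypothetical stand-in that works on lists and returns integers; it does not use Numba or a pre-allocated output array.

```python
def build_cl_sketch(sorted_values):
    """Return the cumulative length of each run of equal contiguous values."""
    cumulative_length = []
    for n, value in enumerate(sorted_values):
        # A change of value closes the previous run at index n.
        if n > 0 and value != sorted_values[n - 1]:
            cumulative_length.append(n)
    if sorted_values:
        cumulative_length.append(len(sorted_values))  # close the last run
    return cumulative_length


print(build_cl_sketch([3, 3, 3, 4]))  # [3, 4]
print(build_cl_sketch([0, 0, 1]))     # [2, 3]
```

The output can never be longer than the input, which is why an output buffer of the same length as sorted_array_in is always large enough.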

lgdo.types.vovutils.explode(cumulative_length, array_in, array_out=None)

Explode a data array using a cumulative_length array.

This is identical to explode_cl(), except array_in gets exploded instead of cumulative_length.

Examples

>>> explode(np.array([2, 3]), np.array([3, 4]))
array([3., 3., 4.])
Parameters:
  • cumulative_length (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – the cumulative length array to use for exploding.

  • array_in (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – the data to be exploded. Must have same length as cumulative_length.

  • array_out (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None) – a pre-allocated array to hold the exploded data. The length should be equal to cumulative_length[-1].

Returns:

array_out – the exploded data array.

Return type:

ndarray[tuple[Any, …], dtype[_ScalarT]]
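
In other words, each entry of array_in is repeated once per element of the corresponding vector. A pure-Python sketch of that rule follows; explode_sketch is a hypothetical stand-in operating on lists.

```python
def explode_sketch(cumulative_length, values):
    """Repeat values[i] once for every element of vector i."""
    out = []
    start = 0
    for stop, value in zip(cumulative_length, values):
        out.extend([value] * (stop - start))  # stop - start is the vector length
        start = stop
    return out


print(explode_sketch([2, 3], [3, 4]))  # [3, 3, 4]
```

The result has cumulative_length[-1] entries, matching the length required of array_out.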

lgdo.types.vovutils.explode_arrays(cumulative_length, arrays, arrays_out=None)

Explode a set of arrays using a cumulative_length array.

Parameters:
  • cumulative_length (Array) – the cumulative length array to use for exploding.

  • arrays (Sequence[ndarray[tuple[Any, ...], dtype[_ScalarT]]]) – the data arrays to be exploded. Each array must have same length as cumulative_length.

  • arrays_out (Sequence[ndarray[tuple[Any, ...], dtype[_ScalarT]]] | None) – a list of pre-allocated arrays to hold the exploded data. The length of the list should be equal to the length of arrays, and each entry in arrays_out should have length cumulative_length[-1]. If not provided, output arrays are allocated for the user.

Returns:

arrays_out – the list of exploded data arrays.

Return type:

list

lgdo.types.vovutils.explode_cl(cumulative_length, array_out=None)

Explode a cumulative_length array.

Examples

>>> explode_cl(np.array([2, 3]))
array([0., 0., 1.])

This is the inverse of build_cl(), in the sense that doing build_cl(explode_cl(cumulative_length)) would recover the original cumulative_length.

Parameters:
  • cumulative_length (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – the cumulative length array to be exploded.

  • array_out (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None) – a pre-allocated array to hold the exploded cumulative length array. The length should be equal to cumulative_length[-1].

Returns:

array_out – the exploded cumulative length array.

Return type:

ndarray[tuple[Any, …], dtype[_ScalarT]]
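
The round trip mentioned above can be checked with a pure-Python sketch. Both helpers are hypothetical list-based stand-ins for explode_cl() and build_cl().

```python
def explode_cl_sketch(cumulative_length):
    """Repeat each vector's index once per element of that vector."""
    out = []
    start = 0
    for row, stop in enumerate(cumulative_length):
        out.extend([row] * (stop - start))
        start = stop
    return out


def build_cl_sketch(sorted_values):
    """Inverse direction: cumulative length of each run of equal values."""
    ends = [n for n in range(1, len(sorted_values))
            if sorted_values[n] != sorted_values[n - 1]]
    return ends + ([len(sorted_values)] if sorted_values else [])


cl = [2, 3]
print(explode_cl_sketch(cl))                   # [0, 0, 1]
print(build_cl_sketch(explode_cl_sketch(cl)))  # [2, 3]
```

In this sketch the round trip recovers the input only when no vector is empty: an empty vector contributes no entries to the exploded array, so it cannot be rebuilt from it.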

lgdo.types.waveformtable module

Implements a LEGEND Data Object representing a special Table to store blocks of one-dimensional time-series data.

class lgdo.types.waveformtable.WaveformTable(*args, **kwargs)

Bases: Table

An LGDO for storing blocks of (1D) time-series data.

A WaveformTable is an LGDO Table with the 3 columns t0, dt, and values:

  • t0[i] is a time offset (relative to a user-defined global reference) for the sample in values[i][0]. Implemented as an LGDO Array with optional attribute units.

  • dt[i] is the sampling period for the waveform at values[i]. Implemented as an LGDO Array with optional attribute units.

  • values[i] is the i’th waveform in the table. Internally, the waveform values may be stored as an LGDO ArrayOfEqualSizedArrays<1,1>, or as an LGDO VectorOfVectors or VectorOfEncodedVectors, which support waveforms of unequal length. Can optionally be given a units attribute.
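
Given these definitions, the time of sample k in waveform i is t0[i] + k * dt[i]. The following sketch computes sample times with plain lists standing in for the three columns; the helper sample_times and the numbers are made up for illustration.

```python
def sample_times(t0, dt, n_samples):
    """Return the time of each sample of one waveform."""
    return [t0 + k * dt for k in range(n_samples)]


# Stand-ins for the t0, dt and values columns (two waveforms of unequal length).
t0 = [0.0, 100.0]
dt = [16.0, 8.0]
values = [[5, 6, 7], [1, 2, 3, 4]]

times = [sample_times(t0[i], dt[i], len(values[i])) for i in range(len(values))]
print(times[0])  # [0.0, 16.0, 32.0]
print(times[1])  # [100.0, 108.0, 116.0, 124.0]
```

The times are only meaningful if t0 and dt are expressed in the same units, which is what the optional units attributes record.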

Note

On-disk and in-memory versions could be different e.g. if a compression routine is used.

Parameters:
  • size (int | None) – sets the number of rows in the table. If None, the size will be determined from the first among t0, dt, or values to return a valid length. If not None, t0, dt, and values will be resized as necessary to match size. If size is None and t0, dt, and values are all non-array-like, a default size of 1024 is used.

  • t0 (float | Array | np.ndarray) – \(t_0\) values to be used (or broadcast) to the t0 column.

  • t0_units (str | None) – units for the \(t_0\) values. If not None and t0 is an LGDO Array, overrides what’s in t0.

  • dt (float | Array | np.ndarray) – \(\delta t\) values (sampling period) to be used (or broadcast) to the dt column.

  • dt_units (str | None) – units for the dt values. If not None and dt is an LGDO Array, overrides what’s in dt.

  • values (ArrayOfEqualSizedArrays | VectorOfVectors | np.ndarray) – The waveform data to be stored in the table. If None a block of data is prepared based on the wf_len and dtype arguments.

  • values_units (str | None) – units for the waveform values. If not None and values is an LGDO Array, overrides what’s in values.

  • wf_len (int | None) – The length of the waveforms in each entry of a table. If None (the default), unequal lengths are assumed and VectorOfVectors is used for the values column. Ignored if values is a 2D ndarray, in which case values.shape[1] is used.

  • dtype (np.dtype) – The NumPy numpy.dtype of the waveform data. If values is not None, this argument is ignored. If both values and dtype are None, numpy.float64 is used.

  • attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO.

property dt: Array
property dt_units: str
resize_wf_len(new_len)

Alias for wf_len.setter, for when we want to make it clear in the code that memory is being reallocated.

property t0: Array
property t0_units: str
property values
property values_units: str
view_as(library, with_units=False, cols=None, prefix='')

View the waveform data as a third-party format data structure.

See also

LGDO.view_as

Return type:

pd.DataFrame | np.NDArray | ak.Array

property wf_len: int