lgdo.types package¶

LEGEND Data Objects (LGDO) types.

Submodules¶

lgdo.types.array module¶

Implements a LEGEND Data Object representing an n-dimensional array and corresponding utilities.

class lgdo.types.array.Array(*_args, **_kwargs)¶

Bases: LGDOCollection

Holds a numpy.ndarray and attributes.

Array (and the other various array types) holds an nda instead of deriving from numpy.ndarray for the following reasons:

It keeps management of the nda totally under the control of the user. The user can point it to another object’s buffer, grab the nda and toss the Array, etc.
It allows the management code to send just the nda’s to the central routines for data manipulation. Keeping LGDO’s out of that code allows for more standard, reusable, and (we expect) performant Python.
It allows the first axis of the nda to be treated as “special” for storage in Tables.
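The holding-rather-than-deriving design described in the list above can be illustrated with a minimal sketch. `NdaHolder` is a hypothetical stand-in written only for this illustration, not the lgdo `Array` class; the sketch assumes nothing beyond NumPy.

```python
import numpy as np


class NdaHolder:
    """Hypothetical stand-in for illustration; not the lgdo Array class."""

    def __init__(self, nda, attrs=None):
        # The array is stored by reference, not copied.
        self.nda = nda
        self.attrs = dict(attrs or {})


buffer = np.zeros(4, dtype="int64")
holder = NdaHolder(buffer, attrs={"units": "m"})

# Writing through the holder changes the caller's buffer, since both
# names refer to the same memory.
holder.nda[0] = 7
shares_memory = np.shares_memory(holder.nda, buffer)

# Numerical routines can be handed the bare ndarray.
total = int(np.sum(holder.nda))
```

Because the wrapper only holds a reference, the caller can keep using `buffer` after discarding `holder`.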

Parameters:

nda (np.ndarray) – A numpy.ndarray to be used for this object’s internal array. Note: the array is used directly, not copied. If not supplied, internal memory is newly allocated based on the shape and dtype arguments.
shape (tuple[int, ...]) – A numpy-format shape specification for shape of the internal ndarray. Required if nda is None, otherwise unused.
dtype (np.dtype) – Specifies the type of the data in the array. Required if nda is None, otherwise unused.
fill_val (float | int | None) – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO.

append(value)¶

Append value to end of array (with copy)

datatype_name()¶

The name for this LGDO’s datatype attribute.

Return type:: str

property dtype¶

form_datatype()¶

Return this LGDO’s datatype attribute string.

Return type:: str

get_capacity()¶

Get capacity (i.e. max size before memory must be re-allocated)

Return type:: int

insert(i, value)¶

Insert value into row i (with copy)

property nda¶

replace(i, value)¶

Replace value at row i

reserve_capacity(capacity)¶

Set size (number of rows) of internal memory buffer

resize(new_size, trim=False)¶

Set size of Array in rows. Only change capacity if it must be increased to accommodate new rows; in this case double capacity. If trim is True, capacity will be set to match size. If new_size is an int, do not change size of inner dimensions.

If new_size is a collection, internal memory will be re-allocated, so this should be done only rarely!
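The relationship between size and capacity described for resize() can be sketched in plain Python. `RowBuffer` and `grow_capacity` are hypothetical names, and repeating the doubling until the new size fits is an assumption; the actual growth policy may differ.

```python
def grow_capacity(capacity, new_size):
    """Return a capacity large enough for new_size rows.

    Assumption: the capacity is doubled repeatedly until new_size fits.
    """
    if capacity < 1:
        capacity = 1
    while capacity < new_size:
        capacity *= 2
    return capacity


class RowBuffer:
    """Hypothetical size/capacity bookkeeping; not the lgdo implementation."""

    def __init__(self, capacity):
        self.size = 0
        self.capacity = capacity

    def resize(self, new_size, trim=False):
        self.size = new_size
        if trim:
            # With trim, capacity follows size exactly.
            self.capacity = new_size
        elif new_size > self.capacity:
            # Without trim, capacity only ever grows.
            self.capacity = grow_capacity(self.capacity, new_size)


buf = RowBuffer(capacity=4)
buf.resize(3)  # fits: capacity unchanged
cap_after_fit = buf.capacity
buf.resize(5)  # does not fit: capacity doubles
cap_after_growth = buf.capacity
buf.resize(2)  # shrinking the size leaves capacity alone
cap_after_shrink = buf.capacity
buf.resize(2, trim=True)  # trim sets capacity to the size
cap_after_trim = buf.capacity
```

Growing geometrically keeps repeated appends cheap, at the cost of some unused memory until the capacity is trimmed.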

property shape¶

trim_capacity()¶

Set capacity to be minimum needed to support Array size

view_as(library, with_units=False)¶

View the Array data as a third-party format data structure.

This is a zero-copy operation. Supported third-party formats are:

pd: returns a pandas.Series
np: returns the internal nda attribute (numpy.ndarray)
ak: returns an ak.Array initialized with self.nda

Parameters:

library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.

Return type:

pd.DataFrame | np.NDArray | ak.Array

See also

LGDO.view_as

lgdo.types.arrayofequalsizedarrays module¶

Implements a LEGEND Data Object representing an array of equal-sized arrays and corresponding utilities.

class lgdo.types.arrayofequalsizedarrays.ArrayOfEqualSizedArrays(*_args, **_kwargs)¶

Bases: Array

An array of equal-sized arrays.

The arrays have equal size within a file, but that size may differ from application to application. Canonical example: an array of same-length waveforms.

Parameters:

dims (tuple[int, ...] | None) – specifies the dimensions required for building the ArrayOfEqualSizedArrays’ datatype attribute.
nda (np.ndarray) – A numpy.ndarray to be used for this object’s internal array. Note: the array is used directly, not copied. If not supplied, internal memory is newly allocated based on the shape and dtype arguments.
shape (tuple[int, ...]) – A NumPy-format shape specification for shape of the internal array. Required if nda is None, otherwise unused.
dtype (np.dtype) – Specifies the type of the data in the array. Required if nda is None, otherwise unused.
fill_val (int | float | None) – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO.

Notes

If shape is not “1D array of arrays of shape given by axes 1-N” (of nda) then specify the dimensionality split in the constructor.

See also

Array

datatype_name()¶

The name for this LGDO’s datatype attribute.

Return type:: str

form_datatype()¶

Return this LGDO’s datatype attribute string.

Return type:: str

to_vov(cumulative_length=None)¶

Convert (and possibly resize) to vectorofvectors.VectorOfVectors.

Parameters:: cumulative_length (ndarray) – cumulative length array of the output vector of vectors. Each vector in the output is filled with values found in the ArrayOfEqualSizedArrays, starting from the first index. If None, use all of the original 2D array and make vectors of equal size.
Return type:: VectorOfVectors
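How a cumulative length array selects values from equal-sized rows can be sketched with plain lists. `rows_to_vectors` is a hypothetical helper that mirrors the description of the cumulative_length parameter; it is not the lgdo implementation.

```python
def rows_to_vectors(rows, cumulative_length=None):
    """Slice equal-sized rows into variable-length vectors.

    rows: list of equal-length lists (stand-in for the 2D array).
    cumulative_length: running total of the output vector lengths.
    """
    if cumulative_length is None:
        # Keep every row whole, so all vectors have equal size.
        return [list(row) for row in rows]
    vectors = []
    previous = 0
    for row, total in zip(rows, cumulative_length):
        length = total - previous
        # Each vector takes its values from the start of its row.
        vectors.append(list(row[:length]))
        previous = total
    return vectors


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
vectors = rows_to_vectors(rows, cumulative_length=[2, 5, 6])
```

Here the cumulative lengths 2, 5, 6 correspond to vector lengths 2, 3 and 1.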

view_as(library, with_units=False)¶

View the array as a third-party format data structure.

See also

LGDO.view_as

Return type:: pd.DataFrame | np.NDArray | ak.Array

lgdo.types.encoded module¶

class lgdo.types.encoded.ArrayOfEncodedEqualSizedArrays(*_args, **_kwargs)¶

Bases: LGDOCollection

An array of encoded arrays with equal decoded size.

Used to represent an encoded ArrayOfEqualSizedArrays. In addition to an internal VectorOfVectors self.encoded_data storing the encoded data, the size of the decoded arrays is stored in a Scalar self.decoded_size.

See also

ArrayOfEqualSizedArrays

Parameters:

encoded_data (VectorOfVectors) – the vector of vectors holding the encoded data.
decoded_size (Scalar | int) – the length of the decoded arrays.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO. Should include information about the codec used to encode the data.

append(value)¶

Append a 1D encoded array at the end.

See also

VectorOfVectors.append

datatype_name()¶

The name for this LGDO’s datatype attribute.

Return type:: str

form_datatype()¶

Return this LGDO’s datatype attribute string.

Return type:: str

get_capacity()¶

Get the reserved capacity of internal memory buffers, in rows.

Return type:: tuple

insert(i, value)¶

Insert an encoded array at index i.

See also

VectorOfVectors.insert

replace(i, value)¶

Replace the encoded array at index i with a new one.

See also

VectorOfVectors.replace

reserve_capacity(*capacity)¶

Reserve capacity (in rows) for later use. Internal memory buffers will have enough entries to store this many rows.

resize(new_size, trim=False)¶

Resize array along the first axis.

See also

VectorOfVectors.resize

trim_capacity()¶

Set capacity to only what is required to store the current contents of the LGDOCollection.

view_as(library, with_units=False)¶

View the encoded data as a third-party format data structure.

This is nearly a zero-copy operation.

Supported third-party formats are:

pd: returns a pandas.DataFrame
ak: returns an ak.Array (record type)

Note

In the view, decoded_size is expanded into an array.

Parameters:

library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.

Return type:

pd.DataFrame | np.NDArray | ak.Array

See also

LGDO.view_as

class lgdo.types.encoded.VectorOfEncodedVectors(*_args, **_kwargs)¶

Bases: LGDOCollection

An array of variable-length encoded arrays.

Used to represent an encoded VectorOfVectors. In addition to an internal VectorOfVectors self.encoded_data storing the encoded data, a 1D Array in self.decoded_size holds the original sizes of the encoded vectors.

See also

VectorOfVectors

Parameters:

encoded_data (VectorOfVectors) – the vector of encoded vectors.
decoded_size (Array) – an array holding the original length of each encoded vector in encoded_data.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO. Should include information about the codec used to encode the data.

datatype_name()¶

The name for this LGDO’s datatype attribute.

Return type:: str

form_datatype()¶

Return this LGDO’s datatype attribute string.

Return type:: str

get_capacity()¶

Get the reserved capacity of internal memory buffers, in rows.

Return type:: tuple

insert(i, value)¶

Insert an encoded vector at index i.

Parameters:

i (int) – the new vector will be inserted before this index.
value (tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], int]) – a tuple holding the encoded array and its decoded size.

See also

VectorOfVectors.insert

replace(i, value)¶

Replace the encoded vector (and decoded size) at index i with a new one.

Parameters:

i (int) – index of the vector to be replaced.
value (tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], int]) – a tuple holding the encoded array and its decoded size.

See also

VectorOfVectors.replace

reserve_capacity(*capacity)¶

Reserve capacity (in rows) for later use. Internal memory buffers will have enough entries to store this many rows.

resize(new_size)¶

Resize vector along the first axis.

See also

VectorOfVectors.resize

trim_capacity()¶

Set capacity to only what is required to store the current contents of the LGDOCollection.

view_as(library, with_units=False)¶

View the encoded data as a third-party format data structure.

This is a zero-copy or nearly zero-copy operation.

Supported third-party formats are:

pd: returns a pandas.DataFrame
ak: returns an ak.Array (record type)

Parameters:

library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.

Return type:

pd.DataFrame | np.NDArray | ak.Array

See also

LGDO.view_as

lgdo.types.fixedsizearray module¶

Implements a LEGEND Data Object representing an n-dimensional array of fixed size and corresponding utilities.

class lgdo.types.fixedsizearray.FixedSizeArray(*_args, **_kwargs)¶

Bases: Array

An array of fixed-size arrays.

Arrays with guaranteed shape along axes > 0: for example, an array of vectors will always have length 3 on axis 1, and it will never change from application to application. This data type is used for optimized memory handling on some platforms. We are not that sophisticated so we are just storing this identification for LGDO validity, i.e. for now this class is just an alias for Array, but keeps track of the datatype name.

See also

Array

datatype_name()¶

The name for this LGDO’s datatype attribute.

Return type:: str

view_as(library, with_units=False)¶

View the array as a third-party format data structure.

See also

LGDO.view_as

lgdo.types.histogram module¶

class lgdo.types.histogram.Histogram(*_args, **_kwargs)¶

Bases: Struct

A special struct to contain histogrammed data.

Parameters:

weights (hist.Hist | NDArray | Array) – A numpy.ndarray to be used for this object’s internal array, or a hist.Hist object, whose data view is used for this object’s internal array. Note: the array/histogram view is used directly, not copied.
binning (None | Iterable[Histogram.Axis] | Iterable[NDArray] | Iterable[tuple[float, float, float]]) –
- has to be None if a hist.Hist has been passed as weights
- can be a list of pre-initialized Histogram.Axis
- can be a list of tuples, each representing a range, (first, last, step)
- can be a list of numpy arrays, as returned by numpy.histogramdd().
isdensity (bool) – If True, all bin contents represent a density (amount per volume), and not an absolute amount.
binedge_attrs (dict[str, Any] | None) – attributes that will be added to all binedges of all axes. This does not work if Histogram.Axis instances are directly passed as binning.
attrs (dict[str, Any] | None) – a set of user attributes to be carried along with this LGDO.
flow (bool) –
If False, discard counts in over-/underflow bins of the passed hist.Hist instance. If True, this data will also be discarded, but a warning is emitted.

Note

Histogram does not support storing counts in overflow or underflow bins. This parameter only controls whether a warning is emitted.

class Axis(*_args, **_kwargs)¶

Bases: Struct

A special struct to group axis parameters for use in a Histogram.

Depending on the parameters, an axis either can have

a binning described by a range object, if first, last and step are passed, or
a variable binning described by the edges array.

Parameters:

edges (NDArray | Array | None) – an array of edges that describe the binning of this axis.
first (float | None) – left edge of the leftmost bin
last (float | None) – right edge of the rightmost bin
step (float | None) – step size (width of each bin)
closedleft (bool) – if True, the bin intervals are left-closed \([a,b)\); if False, intervals are right-closed \((a,b]\).
binedge_attrs (dict[str, Any] | None) – attributes that will be added to the binedges LGDO that is part of the axis struct.
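The two ideas in the parameters above, a binning generated from first, last and step, and the closedleft convention, can be sketched with the standard library. `range_edges` and `find_bin` are hypothetical helpers; treating every bin uniformly as half-open, including the outermost ones, is an assumption of this sketch.

```python
import bisect


def range_edges(first, last, step):
    """Bin edges for a range binning.

    Assumes (last - first) is a whole multiple of step.
    """
    nbins = round((last - first) / step)
    return [first + i * step for i in range(nbins + 1)]


def find_bin(edges, x, closedleft=True):
    """Return the index of the bin containing x, or None if x is outside."""
    if closedleft:
        # Intervals [a, b): a value on an edge belongs to the bin on its right.
        i = bisect.bisect_right(edges, x) - 1
    else:
        # Intervals (a, b]: a value on an edge belongs to the bin on its left.
        i = bisect.bisect_left(edges, x) - 1
    return i if 0 <= i < len(edges) - 1 else None


edges = range_edges(0.0, 4.0, 1.0)
on_edge_left_closed = find_bin(edges, 1.0, closedleft=True)
on_edge_right_closed = find_bin(edges, 1.0, closedleft=False)
```

A value exactly on an inner edge lands in different bins under the two conventions, which is the only case where closedleft matters.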

property closedleft: bool¶

property edges: ndarray[tuple[int, ...], dtype[_ScalarType_co]]¶: Return all binedges, both for variable and range binning.

property first: float¶

classmethod from_edges(edges, binedge_attrs=None)¶

Create a new axis with variable binning described by edges.

Return type:: Axis

classmethod from_range_edges(edges, binedge_attrs=None)¶

Create a new axis from the binning described by edges, but try to convert it to an evenly-spaced range object first.

Warning

This function might return a wrong binning, especially in the case of very small magnitudes of the spacing. See the documentation of numpy.isclose() for details. Use this function only with caution, if you know the binning’s order of magnitude.

Return type:: Axis
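The warning about small magnitudes can be made concrete with the comparison documented for numpy.isclose(), `abs(a - b) <= atol + rtol * abs(b)` with defaults `rtol=1e-05` and `atol=1e-08`. The sketch below reimplements that formula for scalars; `looks_evenly_spaced` is a hypothetical check, and whether from_range_edges() tests spacings in exactly this way is an assumption.

```python
def isclose(a, b, rtol=1e-05, atol=1e-08):
    """Scalar form of the comparison documented for numpy.isclose()."""
    return abs(a - b) <= atol + rtol * abs(b)


def looks_evenly_spaced(edges):
    """Hypothetical check: are all spacings 'close' to the first spacing?"""
    steps = [hi - lo for lo, hi in zip(edges, edges[1:])]
    return all(isclose(step, steps[0]) for step in steps)


# Spacings of 1 and 2 are correctly rejected as uneven.
uneven_large = looks_evenly_spaced([0.0, 1.0, 3.0])

# Spacings of 1e-9 and 2e-9 differ by less than the default absolute
# tolerance of 1e-8, so these edges are wrongly accepted as even.
uneven_small = looks_evenly_spaced([0.0, 1e-9, 3e-9])
```

The absolute tolerance dominates once the spacing itself is of order 1e-8 or smaller, which is why the order of magnitude of the binning matters.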

get_binedgeattrs(datatype=False)¶

Return a copy of the LGDO attributes dictionary of the binedges

Parameters:: datatype (bool) – if False, remove datatype attribute from the output dictionary.
Return type:: dict

property is_range: bool¶

property last: float¶

property nbins: int¶: Return the number of bins, both for variable and range binning.

property step: float¶

add_field(name, obj)¶

Error

Not applicable: A histogram cannot be used as a struct

property binning: tuple[Axis, ...]¶

fill(data, w=None, keys=None)¶

Fill histogram by incrementing bins with data points weighted by w

Parameters:

data – an ndarray with inner dimension equal to number of axes, or a list of equal-length 1d-arrays containing data for each axis, or a Mapping to 1d-arrays containing data for each axis (requires keys), or a Pandas dataframe (optionally takes a list of keys)
w (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – weight to use for incrementing data points. If None, use 1 for all.
keys (Sequence[str]) – list of keys to use if data is a pandas DataFrame or Mapping
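What incrementing bins with weights means can be sketched for a single fixed-step axis. `fill_1d` is a hypothetical helper, not the lgdo method; left-closed bins and silently dropping out-of-range points are assumptions, the latter chosen because Histogram does not store overflow or underflow counts.

```python
def fill_1d(weights, first, step, data, w=None):
    """Add each data point to its bin of a fixed-step 1D histogram.

    weights: list of bin contents, modified in place and returned.
    w: optional per-point weights; every point counts as 1 if None.
    """
    nbins = len(weights)
    for k, x in enumerate(data):
        # Left-closed bins: bin i covers [first + i*step, first + (i+1)*step).
        i = int((x - first) // step)
        if 0 <= i < nbins:
            weights[i] += 1 if w is None else w[k]
    return weights


data = [0.5, 1.5, 1.7, 5.0]  # the last point lies outside the axis
counts = fill_1d([0, 0, 0], 0.0, 1.0, data)
weighted = fill_1d([0.0, 0.0, 0.0], 0.0, 1.0, data, w=[2.0, 0.5, 0.5, 1.0])
```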

property isdensity: bool¶

remove_field(name, delete=False)¶

Error

Not applicable: A histogram cannot be used as a struct

view_as(library)¶

View the histogram data as a third-party format data structure.

This is typically a zero-copy or nearly zero-copy operation.

Supported third-party formats are:

np: returns a tuple of binning and an np.ndarray, similar to the return value of numpy.histogramdd().
hist: returns an hist.Hist that holds a copy of this histogram’s data.

Warning

Viewing as hist will perform a copy of the stored histogram data.

Parameters:: library (str) – format of the returned data view.
Return type:: tuple[ndarray[tuple[int, …], dtype[_ScalarType_co]]] | Hist

See also

LGDO.view_as

property weights: Array¶

lgdo.types.lgdo module¶

class lgdo.types.lgdo.LGDO(*_args, **_kwargs)¶

Bases: ABC

Abstract base class representing a LEGEND Data Object (LGDO).

abstract datatype_name()¶

The name for this LGDO’s datatype attribute.

Return type:: str

abstract form_datatype()¶

Return this LGDO’s datatype attribute string.

Return type:: str

getattrs(datatype=False)¶

Return a copy of the LGDO attributes dictionary.

Parameters:: datatype (bool) – if False, remove datatype attribute from the output dictionary.
Return type:: dict

abstract view_as(library, with_units=False)¶

View the LGDO data object as a third-party format data structure.

This is typically a zero-copy or nearly zero-copy operation unless explicitly stated in the concrete LGDO documentation. The view can be turned into a copy explicitly by the user with the appropriate methods. If requested by the user, the output format supports it and the LGDO carries a units attribute, physical units are attached to the view through the pint package.

Typical supported third-party libraries are:

pd: pandas
np: numpy
ak: awkward

Note

Awkward does not support attaching units through Pint, at the moment.

but the actual supported formats may vary depending on the concrete LGDO class.

Parameters:

library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.

Return type:

pd.DataFrame | np.NDArray | ak.Array

class lgdo.types.lgdo.LGDOCollection(*_args, **_kwargs)¶

Bases: LGDO

Abstract base class representing a LEGEND Collection Object (LGDO). This defines the interface for classes used as table columns.

append(val)¶

Append val to the end of the LGDOCollection.

clear(trim=False)¶

Set the size of the LGDOCollection to zero.

abstract get_capacity()¶

Get the reserved capacity of internal memory buffers, in rows.

Return type:: int

abstract insert(i, val)¶

Insert val into the LGDOCollection at position i.

abstract replace(i, val)¶

Replace the item at position i with val in the LGDOCollection.

abstract reserve_capacity(capacity)¶

Reserve capacity (in rows) for later use. Internal memory buffers will have enough entries to store this many rows.

abstract resize(new_size, trim=False)¶

Set the size (number of rows) of the LGDOCollection to new_size. If trim is True, the capacity is also set to match the new size.

abstract trim_capacity()¶

Set capacity to only what is required to store the current contents of the LGDOCollection.

lgdo.types.scalar module¶

Implements a LEGEND Data Object representing a scalar and corresponding utilities.

class lgdo.types.scalar.Scalar(*_args, **_kwargs)¶

Bases: LGDO

Holds just a scalar value and some attributes (datatype, units, …).

Parameters:

value (int | float | str) – the value for this scalar.
attrs (dict[str, Any] | None) – a set of user attributes to be carried along with this LGDO.

datatype_name()¶

The name for this LGDO’s datatype attribute.

Return type:: str

form_datatype()¶

Return this LGDO’s datatype attribute string.

Return type:: str

view_as(with_units=False)¶

Dummy function, returns the scalar value itself.

See also

LGDO.view_as

lgdo.types.struct module¶

Implements a LEGEND Data Object representing a struct and corresponding utilities.

class lgdo.types.struct.Struct(*_args, **_kwargs)¶

Bases: LGDO, dict

A dictionary of LGDO’s with an optional set of attributes.

After instantiation, add fields using add_field() to keep the datatype updated, or call update_datatype() after adding.

Parameters:

obj_dict (Mapping[str, LGDO] | None) – instantiate this Struct using the supplied named LGDO’s. Note: no copy is performed, the objects are used directly.
attrs (Mapping[str, Any] | None) – a set of user attributes to be carried along with this LGDO.

add_field(name, obj)¶

Add a field to the struct.

datatype_name()¶

The name for this LGDO’s datatype attribute.

Return type:: str

form_datatype()¶

Return this LGDO’s datatype attribute string.

Return type:: str

remove_field(name, delete=False)¶

Remove a field from the struct.

Parameters:

name (str | int) – name of the field to be removed.
delete (bool) – if True, delete the field object using the del statement.

update_datatype()¶

view_as()¶

View the Struct data as a third-party format data structure.

Error

Not implemented. Since Struct’s fields can have different lengths, converting to a NumPy, Pandas or Awkward is generally not possible. Call LGDO.view_as() on the fields instead.

See also

LGDO.view_as

lgdo.types.struct._get_struct_fields(expr)¶

Return type:: list[str]

lgdo.types.struct._is_struct_datatype(dt_name, expr)¶

lgdo.types.struct._sort_datatype_fields(expr)¶

lgdo.types.struct._struct_datatype_equal(dt_name, dt1, dt2)¶

lgdo.types.table module¶

Implements a LEGEND Data Object representing a special struct of arrays of equal length and corresponding utilities.

class lgdo.types.table.Table(*args, **kwargs)¶

Bases: Struct, LGDOCollection

A special struct of arrays or subtable columns of equal length.

Note

If you write to a table and don’t fill it up to its total size, be sure to resize it before passing to data processing functions, as they will call __len__() to access valid data, which returns the size attribute.

Parameters:

size – sets the number of rows in the table. Arrays in col_dict will be resized to match size if both are not None. If size is left as None, the number of table rows is determined from the length of the first array in col_dict. If neither is provided, a default length of 1024 is used.
col_dict – instantiate this table using the supplied mapping of column names and array-like objects. Supported input types are: mapping of strings to LGDOCollections, pd.DataFrame and ak.Array. Note 1: no copy is performed, the objects are used directly (unless ak.Array is provided). Note 2: if size is not None, all arrays will be resized to match it. Note 3: if the arrays have different lengths, all will be resized to match the length of the first array.
attrs – A set of user attributes to be carried along with this LGDO.

Notes

The loc attribute is initialized to 0.

add_column(name, obj, use_obj_size=False)¶

Alias for add_field() using table terminology ‘column’.

add_field(name, obj, use_obj_size=False)¶

Add a field (column) to the table.

Use the name “field” here to match the terminology used in Struct.

Parameters:

name (str) – the name for the field in the table.
obj (LGDOCollection) – the object to be added to the table.
use_obj_size (bool) – if True, resize the table to match the length of obj.

datatype_name()¶

The name for this LGDO’s datatype attribute.

Return type:: str

eval(expr, parameters=None, modules=None)¶

Apply column operations to the table and return a new LGDO.

Internally uses numexpr.evaluate() if dealing with columns representable as NumPy arrays or eval() if VectorOfVectors are involved. In the latter case, the VoV columns are viewed as ak.Array and the respective routines are therefore available.

Columns nested in subtables can be accessed by scoping with two underscores (__). For example:

tbl.eval("a + tbl2__b")

computes the sum of column a and column b in the subtable tbl2.

Parameters:

expr (str) – if the expression only involves non-VectorOfVectors columns, the syntax is the one supported by numexpr.evaluate() (see the numexpr documentation). Note: because of internal limitations, reduction operations must appear last in the stack. If at least one considered column is a VectorOfVectors, plain eval() is used and ak.Array transforms can be used through the ak. prefix (NumPy functions are analogously accessible through np.). See also the examples below.
parameters (Mapping[str, str] | None) – a dictionary of function parameters. Passed to numexpr.evaluate() as local_dict argument or to eval() as locals argument.
modules (Mapping[str, ModuleType] | None) – a dictionary of additional modules used by the expression. If this is not None, then eval() is used and the expression can depend on any modules from this dictionary in addition to awkward and numpy. These are passed to eval() as globals argument.

Return type:

LGDO

Examples

>>> import lgdo
>>> tbl = lgdo.Table(
...   col_dict={
...     "a": lgdo.Array([1, 2, 3]),
...     "b": lgdo.VectorOfVectors([[5], [6, 7], [8, 9, 0]]),
...   }
... )
>>> print(tbl.eval("a + b"))
[[6],
 [8 9],
 [11 12  3],
]
>>> print(tbl.eval("np.sum(a) + ak.sum(b)"))
41

flatten(_prefix='')¶

Flatten the table, if nested.

Returns a new Table (that references, not copies, the existing columns) with columns in nested tables being moved to the first level (and renamed appropriately).

Examples

>>> repr(tbl)
"Table(dict={'a': Array([1 2 3], attrs={'datatype': 'array<1>{real}'}), 'tbl': Table(dict={'b': Array([4 5 6], attrs={'datatype': 'array<1>{real}'}), 'tbl1': Table(dict={'z': Array([9 9 9], attrs={'datatype': 'array<1>{real}'})}, attrs={'datatype': 'table{z}'})}, attrs={'datatype': 'table{b,tbl1}'})}, attrs={'datatype': 'table{a,tbl}'})"
>>> tbl.flatten().keys()
dict_keys(['a', 'tbl__b', 'tbl__tbl1__z'])

Return type:: Table
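The renaming scheme used by flatten(), which matches the double-underscore scoping accepted by eval(), can be sketched with nested dictionaries. The `flatten` function below is a hypothetical stand-in operating on plain dicts and lists, not on Table objects.

```python
def flatten(table, prefix=""):
    """Return a one-level dict; nested keys are joined with two underscores."""
    flat = {}
    for name, column in table.items():
        if isinstance(column, dict):
            # Recurse into the nested table, extending the prefix.
            flat.update(flatten(column, prefix=f"{prefix}{name}__"))
        else:
            # The column object is referenced, not copied.
            flat[prefix + name] = column
    return flat


a, b, z = [1, 2, 3], [4, 5, 6], [9, 9, 9]
tbl = {"a": a, "tbl": {"b": b, "tbl1": {"z": z}}}
flat = flatten(tbl)
flat_keys = list(flat)
```

The resulting keys reproduce those of the example above, and each flattened entry is the same object as the nested column.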

get_capacity()¶

Get list of capacities for each key

Return type:: int

get_dataframe(cols=None, copy=False, prefix='')¶

Get a pandas.DataFrame from the data in the table.

Warning

This method is deprecated. Use view_as() to view the table as a Pandas dataframe.

Notes

The requested data must be array-like, with the nda attribute.

Parameters:

cols (list[str] | None) – a list of column names specifying the subset of the table’s columns to be added to the dataframe.
copy (bool) – When True, the dataframe allocates new memory and copies data into it. Otherwise, the raw nda’s from the table are used directly.
prefix (str) – The prefix to be added to the column names. Used when recursively getting the dataframe of a Table inside this Table

Return type:

DataFrame

insert(i, vals)¶

Insert vals into the table at row i. vals is a mapping from table key to the value to insert.

join(other_table, cols=None, do_warn=True)¶

Add the columns of another table to this table.

Notes

Following the join, both tables have access to other_table’s fields (but other_table doesn’t have access to this table’s fields). No memory is allocated in this process. other_table can go out of scope and this table will retain access to the joined data.

Parameters:

other_table (Table) – the table whose columns are to be joined into this table.
cols (list[str] | None) – a list of names of columns from other_table to be joined into this table.
do_warn (bool) – set to False to turn off warnings associated with mismatched loc parameter or add_column() warnings.

remove_column(name, delete=False)¶

Alias for Struct.remove_field() using table terminology ‘column’.

reserve_capacity(capacity)¶

Set size (number of rows) of internal memory buffer

resize(new_size=None, do_warn=False, trim=False)¶

Set the size (number of rows) of the table, resizing its columns to match. If trim is True, the capacity is also set to match the new size.

trim_capacity()¶

Set capacity to be minimum needed to support Array size

Return type:: int

view_as(library, with_units=False, cols=None, prefix='')¶

View the Table data as a third-party format data structure.

This is typically a zero-copy or nearly zero-copy operation.

Supported third-party formats are:

pd: returns a pandas.DataFrame
ak: returns an ak.Array (record type)

Notes

Conversion to Awkward array only works when the key is a string.

Parameters:

library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.
cols (list[str] | None) – a list of column names specifying the subset of the table’s columns to be added to the data view structure.
prefix (str) – The prefix to be added to the column names. Used when recursively getting the dataframe of a Table inside this Table.

Return type:

pd.DataFrame | np.NDArray | ak.Array

See also

LGDO.view_as

lgdo.types.table._ak_to_lgdo_or_col_dict(array)¶

lgdo.types.vectorofvectors module¶

Implements a LEGEND Data Object representing a variable-length array of variable-length arrays and corresponding utilities.

class lgdo.types.vectorofvectors.VectorOfVectors(*_args, **_kwargs)¶

Bases: LGDOCollection

An n-dimensional variable-length 1D array of variable-length 1D arrays.

If the vector is 2-dimensional, the internal representation is as two NumPy arrays, one to store the flattened data contiguously (flattened_data) and one to store the cumulative sum of lengths of each vector (cumulative_length). When the dimension is more than 2, flattened_data is a VectorOfVectors itself.
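The 2-dimensional representation can be sketched with plain lists standing in for the two NumPy arrays. `to_flat` and `get_vector` are hypothetical helpers written for this illustration only.

```python
def to_flat(vectors):
    """Split a list of lists into flattened data and cumulative lengths."""
    flattened_data, cumulative_length = [], []
    for vec in vectors:
        flattened_data.extend(vec)
        # Each entry is the end index of the corresponding vector.
        cumulative_length.append(len(flattened_data))
    return flattened_data, cumulative_length


def get_vector(flattened_data, cumulative_length, i):
    """Recover vector i from the two flat arrays."""
    start = cumulative_length[i - 1] if i > 0 else 0
    return flattened_data[start:cumulative_length[i]]


flattened_data, cumulative_length = to_flat([[1, 2, 3], [4, 5], [8, 9]])
second = get_vector(flattened_data, cumulative_length, 1)
```

Storing end indices rather than lengths means any vector can be located with two lookups, without summing over the preceding lengths.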

Examples

>>> from lgdo import VectorOfVectors
>>> data = VectorOfVectors(
...   [[[1, 2], [3, 4, 5]], [[2], [4, 8, 9, 7]], [[5, 3, 1]]],
...   attrs={"units": "m"}
... )
>>> print(data)
[[[1, 2], [3, 4, 5]],
 [[2], [4, 8, 9, 7]],
 [[5, 3, 1]]
] with attrs={'units': 'm'}
>>> data.view_as("ak")
<Array [[[1, 2], [3, 4, 5]], ..., [[5, ..., 1]]] type='3 * var * var * int64'>

Note

Many class methods are currently implemented only for 2D vectors and will raise an exception on higher dimensional data.

Parameters:

data (ArrayLike | None) – Any array-like structure accepted by the ak.Array constructor, with the exception that elements cannot be of type OptionType, UnionType or RecordType. Takes priority over flattened_data and cumulative_length. The serialization of the ak.Array is performed through ak.to_buffers(). Since the latter returns non-data-owning NumPy arrays, which would prevent later modifications like resizing, a copy is performed.
flattened_data (ArrayLike | None) – if not None, used as the internal array for self.flattened_data. Otherwise, an internal flattened_data is allocated based on cumulative_length (or shape_guess) and dtype.
cumulative_length (ArrayLike | VectorOfVectors | None) – if not None, used as the internal array for self.cumulative_length. Should be dtype numpy.uint32. If cumulative_length is None, an internal cumulative_length is allocated based on the first element of shape_guess.
shape_guess (Sequence[int, ...] | None) – a NumPy-format shape specification, required if either of flattened_data or cumulative_length are not supplied. The first element should not be a guess and sets the number of vectors to be stored. The second element is a guess or approximation of the typical length of a stored vector, used to set the initial length of flattened_data if it was not supplied.
dtype (DTypeLike | None) – sets the type of data stored in flattened_data. Required if flattened_data and data are None.
fill_val (int | float | None) – fill all of self.flattened_data with this value.
attrs (Mapping[str, Any] | None) – a set of user attributes to be carried along with this LGDO.

_set_vector_unsafe(i, vec, lens=None)¶

Insert vector vec at position i.

Assumes that j = self.cumulative_length[i-1] is the index (in self.flattened_data) of the end of the (i-1)th vector and copies vec in self.flattened_data[j:sum(lens)]. Finally updates self.cumulative_length[i] with the new flattened data array length.

Vectors stored after index i can be overridden, producing unintended behavior. This method is typically used for fast sequential fill of a pre-allocated vector of vectors.

If vec is a 1D array and lens is None, set using the full array. If vec is 2D, lens is required, and each array is filled only up to the lengths in lens.

Danger

This method can lead to undefined behavior or vector invalidation if used improperly. Use it only if you know what you are doing.

See also

append, replace, insert
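Why this method suits sequential filling, and why it is unsafe otherwise, can be sketched with pre-allocated lists. `set_vector_unsafe` below is a hypothetical simplification covering only the 1D case without lens; it is not the lgdo implementation.

```python
def set_vector_unsafe(flattened_data, cumulative_length, i, vec):
    """Write vec as vector i, trusting that vectors 0..i-1 are already set."""
    start = cumulative_length[i - 1] if i > 0 else 0
    stop = start + len(vec)
    # Overwrites whatever was stored in this range before.
    flattened_data[start:stop] = vec
    cumulative_length[i] = stop


# Sequential fill of pre-allocated buffers: no resizing is needed.
flattened_data = [0] * 8
cumulative_length = [0] * 3
for i, vec in enumerate([[1, 2, 3], [4, 5], [6]]):
    set_vector_unsafe(flattened_data, cumulative_length, i, vec)
sequential_flat = list(flattened_data)
sequential_cl = list(cumulative_length)

# Hazard: rewriting vector 0 with a longer vector spills into vector 1.
set_vector_unsafe(flattened_data, cumulative_length, 0, [9, 9, 9, 9])
start, stop = cumulative_length[0], cumulative_length[1]
vector_1_after = flattened_data[start:stop]
```

After the out-of-order write, vector 1 no longer holds its original values, which is the kind of invalidation the warning refers to.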

append(new)¶

Append a 1D vector new at the end.

Examples

>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.append([8, 9])
>>> print(vov)
[[1 2 3],
 [4 5],
 [8 9],
]

datatype_name()¶

The name for this LGDO’s datatype attribute.

Return type:: str

property dtype: dtype¶

form_datatype()¶

Return this LGDO’s datatype attribute string.

Return type:: str

get_capacity()¶

Get tuple containing capacity of each dimension. First dimension is cumulative length array. Last dimension is flattened data.

Return type:: tuple[int]

insert(i, new)¶

Insert a vector at index i.

self.flattened_data (and therefore self.cumulative_length) is resized in order to accommodate the new element.

Examples

>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.insert(1, [8, 9])
>>> print(vov)
[[1 2 3],
 [8 9],
 [4 5],
]

property ndim¶

replace(i, new)¶

Replace the vector at index i with new.

self.flattened_data (and therefore self.cumulative_length) is resized if the length of new differs from that of the vector currently at index i.

Examples

>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.replace(0, [8, 9])
>>> print(vov)
[[8 9],
 [4 5],
]
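
The effect of a replacement on the underlying buffers can be sketched with plain lists. This is an assumption-laden illustration, not the library code: replace_vector is a hypothetical helper, and it builds new lists instead of resizing existing arrays.

```python
def replace_vector(flattened_data, cumulative_length, i, new):
    """Return (flattened_data, cumulative_length) with row i replaced by `new`."""
    start = cumulative_length[i - 1] if i > 0 else 0
    stop = cumulative_length[i]
    shift = len(new) - (stop - start)   # change in total number of elements
    flat = flattened_data[:start] + list(new) + flattened_data[stop:]
    cl = cumulative_length[:i] + [c + shift for c in cumulative_length[i:]]
    return flat, cl


# Same data as the doctest above: row 0 of [[1, 2, 3], [4, 5]] becomes [8, 9].
flat, cl = replace_vector([1, 2, 3, 4, 5], [3, 5], 0, [8, 9])
print(flat)  # [8, 9, 4, 5]
print(cl)    # [2, 4]
```

When the new row has the same length as the old one, shift is zero and only the element values change.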

reserve_capacity(cap_cl, *cap_args)¶

Set capacity of internal data arrays. Expects the number of arguments to equal self.ndim. The first argument is the capacity of the cumulative length array. If self.ndim is 2, the second argument is the capacity of the flattened data; otherwise the remaining arguments are fed recursively to the remaining dimensions.

resize(new_size, trim=False)¶

Resize vector along the first axis.

self.flattened_data is resized only if new_size is smaller than the current vector length.

If new_size is larger than the current vector length, self.cumulative_length is padded with its last element. This corresponds to appending empty vectors.

If trim is True, the capacity is resized to match the new size.

Examples

>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.resize(3)
>>> print(vov)
[[1 2 3],
 [4 5],
 [],
]

>>> vov = VectorOfVectors([[1, 2], [3], [4, 5]])
>>> vov.resize(2)
>>> print(vov)
[[1 2],
 [3],
]
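
The two resize cases can be mimicked with plain lists, as in the sketch below. The helper resize_rows is hypothetical and ignores capacity and the trim option; it only shows how padding cumulative_length with its last element yields empty trailing vectors.

```python
def resize_rows(flattened_data, cumulative_length, new_size):
    """Return (flattened_data, cumulative_length) resized to new_size rows."""
    old_size = len(cumulative_length)
    if new_size >= old_size:
        # Growing: repeat the last offset, i.e. append empty vectors.
        last = cumulative_length[-1] if cumulative_length else 0
        return list(flattened_data), cumulative_length + [last] * (new_size - old_size)
    # Shrinking: drop trailing rows and the elements that belonged to them.
    cl = cumulative_length[:new_size]
    n_keep = cl[-1] if cl else 0
    return flattened_data[:n_keep], cl


grown = resize_rows([1, 2, 3, 4, 5], [3, 5], 3)
shrunk = resize_rows([1, 2, 3, 4, 5], [2, 3, 5], 2)
print(grown)   # ([1, 2, 3, 4, 5], [3, 5, 5])
print(shrunk)  # ([1, 2, 3], [2, 3])
```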

to_aoesa(max_len=None, fill_val=nan, preserve_dtype=False)¶

Convert to ArrayOfEqualSizedArrays.

Note

The dtype of the original vector is typically not strictly preserved. The output dtype will be either np.float64 or np.int64. If you want to use the same exact dtype, set preserve_dtype to True.

Parameters:

max_len (int | None) – the length of the returned array along its second dimension. Longer vectors will be truncated, shorter will be padded with fill_val. If None, the length will be equal to the length of the longest vector.
fill_val (bool | int | float) – value used to pad shorter vectors up to max_len. The dtype of the output array will be such that both fill_val and the vector values can be represented in the same data structure.
preserve_dtype (bool) – whether the output array should have exactly the same dtype as the original vector of vectors. The type of fill_val must be compatible with it.

Return type:

ArrayOfEqualSizedArrays
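
The truncation and padding rules for max_len and fill_val can be illustrated with a plain-Python sketch. It operates on list stand-ins for flattened_data and cumulative_length, the name pad_rows is hypothetical, and it does not model the dtype handling described in the note above.

```python
import math


def pad_rows(flattened_data, cumulative_length, max_len=None, fill_val=math.nan):
    """Return a rectangular list of lists, truncating or padding each row."""
    rows = []
    start = 0
    for stop in cumulative_length:
        rows.append(flattened_data[start:stop])
        start = stop
    if max_len is None:
        # Default: width of the longest row.
        max_len = max((len(row) for row in rows), default=0)
    # A negative repeat count gives an empty list, so long rows are only truncated.
    return [row[:max_len] + [fill_val] * (max_len - len(row)) for row in rows]


padded = pad_rows([1, 2, 3, 4, 5], [3, 5], fill_val=0)
truncated = pad_rows([1, 2, 3, 4, 5], [3, 5], max_len=2, fill_val=0)
print(padded)     # [[1, 2, 3], [4, 5, 0]]
print(truncated)  # [[1, 2], [4, 5]]
```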

trim_capacity()¶

Set capacity for all dimensions to the minimum needed to hold the data.

view_as(library, with_units=False, fill_val=nan, preserve_dtype=False)¶

View the vector data as a third-party format data structure.

This is typically a zero-copy or nearly zero-copy operation.

Supported third-party formats are:

pd: returns a pandas.Series (supported through the awkward-pandas package)
np: returns a numpy.ndarray, padded with fill_val to make it rectangular. This implies memory re-allocation.
ak: returns an ak.Array. self.cumulative_length is currently re-allocated for technical reasons.

Notes

Awkward array views partially involve memory re-allocation (the cumulative_lengths), while NumPy “exploded” views clearly imply a full copy.

Parameters:

library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.
fill_val (bool | int | float) – forwarded to to_aoesa(), if library is np.
preserve_dtype (bool) – forwarded to to_aoesa(), if library is np.

Return type:

pd.DataFrame | np.NDArray | ak.Array

See also

LGDO.view_as

@numba.jit lgdo.types.vectorofvectors._to_aoesa(flattened_array, cumulative_length, nda)¶

lgdo.types.vovutils module¶

VectorOfVectors utilities.

lgdo.types.vovutils._ak_is_jagged(type_)¶

Returns True if ak.Array is jagged at all axes.

This assures that ak.to_buffers() returns the expected data structures.

Return type:: bool

lgdo.types.vovutils._ak_is_valid(type_)¶

Returns True if ak.Array contains only elements we can serialize to LH5.

Return type:: bool

@numba.jit lgdo.types.vovutils._nb_build_cl(sorted_array_in, cumulative_length_out)¶

Numbified inner loop for build_cl().

Return type:: ndarray[tuple[int, …], dtype[_ScalarType_co]]

@numba.jit lgdo.types.vovutils._nb_explode(cumulative_length, array_in, array_out)¶

Numbified inner loop for explode().

Return type:: ndarray[tuple[int, …], dtype[_ScalarType_co]]

@numba.jit lgdo.types.vovutils._nb_explode_cl(cumulative_length, array_out)¶

Numbified inner loop for explode_cl().

Return type:: ndarray[tuple[int, …], dtype[_ScalarType_co]]

@numba.guvectorize lgdo.types.vovutils._nb_fill(aoa_in, len_in, nan_val, flattened_array_out)¶

Options: boundscheck=False, cache=True
Precompiled signatures: ?i??->, ?l??->, ?I??->, ?L??->, bibb->, blbb->, bIbb->, bLbb->, hihh->, hlhh->, hIhh->, hLhh->, iiii->, ilii->, iIii->, iLii->, lill->, llll->, lIll->, lLll->, BiBB->, BlBB->, BIBB->, BLBB->, HiHH->, HlHH->, HIHH->, HLHH->, IiII->, IlII->, IIII->, ILII->, LiLL->, LlLL->, LILL->, LLLL->, fiff->, flff->, fIff->, fLff->, didd->, dldd->, dIdd->, dLdd->, FiFF->, FlFF->, FIFF->, FLFF->, DiDD->, DlDD->, DIDD->, DLDD->

Vectorized function to fill flattened array from array of arrays and lengths. Values in aoa_in past lengths will not be copied.

Parameters:

aoa_in (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – array of arrays containing values to be copied
len_in (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – array of vector lengths for each row of aoa_in
nan_val (int | float) – value to use when len_in is longer than aoa_in. Should use np.nan for floating point, and 0xfff… for integer types
flattened_array_out (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – flattened array to copy values into. Must be longer than sum of lengths in len_in
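
The copy rule described above can be sketched in plain Python, assuming that each row contributes exactly len_in[i] entries to the output and that nan_val fills positions beyond the width of the row. The function fill_flattened is a hypothetical stand-in operating on lists, not the compiled routine.

```python
def fill_flattened(aoa_in, len_in, nan_val, flattened_out):
    """Copy the first len_in[i] values of each row into flattened_out.

    Returns the number of entries written.
    """
    pos = 0
    for row, n in zip(aoa_in, len_in):
        for j in range(n):
            # Values past the requested length are skipped; positions past
            # the end of the row are filled with nan_val.
            flattened_out[pos] = row[j] if j < len(row) else nan_val
            pos += 1
    return pos


out = [0] * 6
written = fill_flattened([[1, 2, 3], [4, 5, 6]], [2, 4], -1, out)
print(out)      # [1, 2, 4, 5, 6, -1]
print(written)  # 6
```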

lgdo.types.vovutils.build_cl(sorted_array_in, cumulative_length_out=None)¶

Build a cumulative length array from an array of sorted data.

Examples

>>> build_cl(np.array([3, 3, 3, 4]))
array([3., 4.])

For a sorted_array_in of indices, this is the inverse of explode_cl(), in the sense that doing build_cl(explode_cl(cumulative_length)) would recover the original cumulative_length.

Parameters:

sorted_array_in (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – array of data already sorted; each N matching contiguous entries will be converted into a new row of cumulative_length_out.
cumulative_length_out (ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None) – a pre-allocated array for the output cumulative_length. It will always have length <= len(sorted_array_in), so giving them the same length is safe if there is no better guess.

Returns:

cumulative_length_out – the output cumulative length array. If the user provides a cumulative_length_out that is too long, this return value is sliced to contain only the used portion of the allocated memory.

Return type:

ndarray[tuple[int, …], dtype[_ScalarType_co]]
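
The run-counting logic can be written out as a short plain-Python loop. This is a sketch of the idea only: build_cl_loop is a hypothetical name, and it returns a list of integers rather than filling a pre-allocated NumPy array.

```python
def build_cl_loop(sorted_values):
    """Cumulative lengths of the runs of equal, contiguous values."""
    cl = []
    for k, val in enumerate(sorted_values):
        if k > 0 and val == sorted_values[k - 1]:
            cl[-1] += 1                             # same run: extend current row
        else:
            cl.append((cl[-1] if cl else 0) + 1)    # new run: start a new row
    return cl


print(build_cl_loop([3, 3, 3, 4]))        # [3, 4]
print(build_cl_loop([0, 0, 1, 2, 2, 2]))  # [2, 3, 6]
```

The output can never have more entries than the input, which is why an output buffer of the same length as the input is always sufficient.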

lgdo.types.vovutils.explode(cumulative_length, array_in, array_out=None)¶

Explode a data array using a cumulative_length array.

This is identical to explode_cl(), except array_in gets exploded instead of cumulative_length.

Examples

>>> explode(np.array([2, 3]), np.array([3, 4]))
array([3., 3., 4.])

Parameters:

cumulative_length (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – the cumulative length array to use for exploding.
array_in (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – the data to be exploded. Must have same length as cumulative_length.
array_out (ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None) – a pre-allocated array to hold the exploded data. The length should be equal to cumulative_length[-1].

Returns:

array_out – the exploded data array.

Return type:

ndarray[tuple[int, …], dtype[_ScalarType_co]]
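
A plain-Python sketch of the same operation is given below. It assumes that "exploding" means repeating array_in[i] once for every element of row i; explode_sketch is a hypothetical name and works on lists.

```python
def explode_sketch(cumulative_length, array_in):
    """Repeat array_in[i] once per element of row i."""
    out = []
    start = 0
    for stop, val in zip(cumulative_length, array_in):
        out.extend([val] * (stop - start))   # row i has stop - start elements
        start = stop
    return out


exploded = explode_sketch([2, 3], [3, 4])
print(exploded)  # [3, 3, 4]
```

The result has cumulative_length[-1] entries, matching the length expected for array_out.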

lgdo.types.vovutils.explode_arrays(cumulative_length, arrays, arrays_out=None)¶

Explode a set of arrays using a cumulative_length array.

Parameters:

cumulative_length (Array) – the cumulative length array to use for exploding.
arrays (Sequence[ndarray[tuple[int, ...], dtype[_ScalarType_co]]]) – the data arrays to be exploded. Each array must have same length as cumulative_length.
arrays_out (Sequence[ndarray[tuple[int, ...], dtype[_ScalarType_co]]] | None) – a list of pre-allocated arrays to hold the exploded data. The length of the list should be equal to the length of arrays, and each entry in arrays_out should have length cumulative_length[-1]. If not provided, output arrays are allocated for the user.

Returns:

arrays_out – the list of exploded data arrays.

Return type:

list

lgdo.types.vovutils.explode_cl(cumulative_length, array_out=None)¶

Explode a cumulative_length array.

Examples

>>> explode_cl(np.array([2, 3]))
array([0., 0., 1.])

This is the inverse of build_cl(), in the sense that doing build_cl(explode_cl(cumulative_length)) would recover the original cumulative_length.

Parameters:

cumulative_length (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – the cumulative length array to be exploded.
array_out (ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None) – a pre-allocated array to hold the exploded cumulative length array. The length should be equal to cumulative_length[-1].

Returns:

array_out – the exploded cumulative length array.

Return type:

ndarray[tuple[int, …], dtype[_ScalarType_co]]
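
The round trip mentioned above can be checked with a small plain-Python sketch. Both functions are hypothetical list-based stand-ins for explode_cl() and build_cl(), written with itertools rather than the compiled loops.

```python
from itertools import accumulate, groupby


def explode_cl_sketch(cumulative_length):
    """Repeat each row index once per element in that row."""
    out = []
    start = 0
    for row, stop in enumerate(cumulative_length):
        out.extend([row] * (stop - start))
        start = stop
    return out


def build_cl_sketch(sorted_values):
    """Running total of the lengths of runs of equal values."""
    return list(accumulate(len(list(group)) for _, group in groupby(sorted_values)))


cl = [2, 3, 6]
exploded = explode_cl_sketch(cl)
print(exploded)                   # [0, 0, 1, 2, 2, 2]
print(build_cl_sketch(exploded))  # [2, 3, 6]
```

In this sketch an empty row leaves no trace in the exploded output, so the round trip reproduces the input only when every row has at least one element.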

lgdo.types.waveformtable module¶

Implements a LEGEND Data Object representing a special Table to store blocks of one-dimensional time-series data.

class lgdo.types.waveformtable.WaveformTable(*args, **kwargs)¶

Bases: Table

An LGDO for storing blocks of (1D) time-series data.

A WaveformTable is an LGDO Table with the 3 columns t0, dt, and values:

t0[i] is a time offset (relative to a user-defined global reference) for the sample in values[i][0]. Implemented as an LGDO Array with optional attribute units.
dt[i] is the sampling period for the waveform at values[i]. Implemented as an LGDO Array with optional attribute units.
values[i] is the i’th waveform in the table. Internally, the waveform values may be an LGDO ArrayOfEqualSizedArrays<1,1>, or an LGDO VectorOfVectors or VectorOfEncodedVectors, which support waveforms of unequal length. Can optionally be given a units attribute.

Note

On-disk and in-memory versions could be different e.g. if a compression routine is used.
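
Given the column definitions above, the time of sample j in waveform i follows as t0[i] + j * dt[i]. The sketch below spells this out for a single row in plain Python; sample_times and the numbers used are hypothetical and not part of the lgdo API.

```python
def sample_times(t0, dt, n_samples):
    """Time of each sample in one waveform: t0 + j * dt for j = 0..n_samples-1."""
    return [t0 + j * dt for j in range(n_samples)]


# Hypothetical row: first sample at 1000 ns, 16 ns sampling period, 4 samples.
times = sample_times(1000, 16, 4)
print(times)  # [1000, 1016, 1032, 1048]
```

The result is only meaningful if t0 and dt are expressed in the same units, which is what the optional units attributes on those columns record.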

Parameters:

size (int | None) – sets the number of rows in the table. If None, the size will be determined from the first among t0, dt, or values to return a valid length. If not None, t0, dt, and values will be resized as necessary to match size. If size is None and t0, dt, and values are all non-array-like, a default size of 1024 is used.
t0 (float | Array | np.ndarray) – \(t_0\) values to be used (or broadcast) to the t0 column.
t0_units (str | None) – units for the \(t_0\) values. If not None and t0 is an LGDO Array, overrides what’s in t0.
dt (float | Array | np.ndarray) – \(\delta t\) values (sampling period) to be used (or broadcast) to the dt column.
dt_units (str | None) – units for the dt values. If not None and dt is an LGDO Array, overrides what’s in dt.
values (ArrayOfEqualSizedArrays | VectorOfVectors | np.ndarray) – The waveform data to be stored in the table. If None, a block of data is prepared based on the wf_len and dtype arguments.
values_units (str | None) – units for the waveform values. If not None and values is an LGDO Array, overrides what’s in values.
wf_len (int | None) – The length of the waveforms in each entry of a table. If None (the default), unequal lengths are assumed and VectorOfVectors is used for the values column. Ignored if values is a 2D ndarray, in which case values.shape[1] is used.
dtype (np.dtype) – The numpy.dtype of the waveform data. If values is not None, this argument is ignored. If both values and dtype are None, numpy.float64 is used.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO.

property dt: Array¶

property dt_units: str¶

resize_wf_len(new_len)¶

Alias for wf_len.setter, for when we want to make it clear in the code that memory is being reallocated.

property t0: Array¶

property t0_units: str¶

property values: ArrayOfEqualSizedArrays | VectorOfVectors¶

property values_units: str¶

view_as(library, with_units=False, cols=None, prefix='')¶

View the waveform data as a third-party format data structure.

See also

LGDO.view_as

Return type:: pd.DataFrame | np.NDArray | ak.Array

property wf_len: int¶