lgdo.types package¶
LEGEND Data Objects (LGDO) types.
Submodules¶
lgdo.types.array module¶
Implements a LEGEND Data Object representing an n-dimensional array and corresponding utilities.
- class lgdo.types.array.Array(*_args, **_kwargs)¶
Bases:
LGDO
Holds a numpy.ndarray and attributes.
Array (and the other various array types) holds an nda instead of deriving from numpy.ndarray for the following reasons:
- It keeps management of the nda totally under the control of the user. The user can point it to another object’s buffer, grab the nda and toss the Array, etc.
- It allows the management code to send just the nda’s to the central routines for data manipulation. Keeping LGDOs out of that code allows for more standard, reusable, and (we expect) performant Python.
- It allows the first axis of the nda to be treated as “special” for storage in Tables.
- Parameters:
nda (np.ndarray) – A numpy.ndarray to be used for this object’s internal array. Note: the array is used directly, not copied. If not supplied, internal memory is newly allocated based on the shape and dtype arguments.
shape (tuple[int, ...]) – A NumPy-format shape specification for the shape of the internal ndarray. Required if nda is None, otherwise unused.
dtype (np.dtype) – Specifies the type of the data in the array. Required if nda is None, otherwise unused.
fill_val (float | int | None) – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO.
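The “used directly, not copied” contract for nda, and the fallback to shape, dtype and fill_val when nda is None, can be illustrated with a toy class. This is a minimal sketch, assuming a 1D shape and Python lists in place of numpy.ndarray; HypotheticalArray is not part of lgdo, and it fills unset memory with dtype()'s default value where the real class would leave it uninitialized.

```python
from typing import Any, Callable, Optional


class HypotheticalArray:
    """Toy, 1D-only stand-in for lgdo.Array (hypothetical, not the lgdo API)."""

    def __init__(
        self,
        nda: Optional[list] = None,
        shape: Optional[tuple] = None,
        dtype: Callable[..., Any] = float,
        fill_val: Optional[Any] = None,
    ) -> None:
        if nda is not None:
            # The caller's buffer is held directly: no copy is made,
            # and shape, dtype and fill_val are ignored.
            self.nda = nda
            return
        if shape is None:
            raise ValueError("shape is required when nda is None")
        # Python lists cannot be left uninitialized, so fall back to dtype()
        # (0 or 0.0) when fill_val is None.
        value = dtype() if fill_val is None else dtype(fill_val)
        self.nda = [value] * shape[0]


buf = [1.0, 2.0, 3.0]
arr = HypotheticalArray(nda=buf)
buf[0] = 99.0          # changes made through the caller's buffer...
print(arr.nda[0])      # ...are visible through the object: 99.0

zeros = HypotheticalArray(shape=(4,), dtype=int, fill_val=0)
print(zeros.nda)       # [0, 0, 0, 0]
```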
- append(value)¶
- insert(i, value)¶
- resize(new_size)¶
- view_as(library, with_units=False)¶
View the Array data as a third-party format data structure.
This is a zero-copy operation. Supported third-party formats are:
- pd: returns a pandas.Series
- np: returns the internal nda attribute (numpy.ndarray)
- ak: returns an ak.Array initialized with self.nda
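- Parameters:
library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.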
- Return type:
pd.DataFrame | np.NDArray | ak.Array
See also
lgdo.types.arrayofequalsizedarrays module¶
Implements a LEGEND Data Object representing an array of equal-sized arrays and corresponding utilities.
- class lgdo.types.arrayofequalsizedarrays.ArrayOfEqualSizedArrays(*_args, **_kwargs)¶
Bases:
Array
An array of equal-sized arrays.
Arrays of equal size within a file, but possibly different from application to application. Canonical example: an array of same-length waveforms.
- Parameters:
dims (tuple[int, ...] | None) – specifies the dimensions required for building the ArrayOfEqualSizedArrays’ datatype attribute.
nda (np.ndarray) – A numpy.ndarray to be used for this object’s internal array. Note: the array is used directly, not copied. If not supplied, internal memory is newly allocated based on the shape and dtype arguments.
shape (tuple[int, ...]) – A NumPy-format shape specification for the shape of the internal array. Required if nda is None, otherwise unused.
dtype (np.dtype) – Specifies the type of the data in the array. Required if nda is None, otherwise unused.
fill_val (int | float | None) – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO.
Notes
If shape is not “1D array of arrays of shape given by axes 1-N” (of nda) then specify the dimensionality split in the constructor.
See also
- to_vov(cumulative_length=None)¶
Convert (and optionally resize) to vectorofvectors.VectorOfVectors.
- Parameters:
cumulative_length (ndarray) – cumulative length array of the output vector of vectors. Each vector in the output is filled with values found in the ArrayOfEqualSizedArrays, starting from the first index. If None, use all of the original 2D array and make vectors of equal size.
- Return type:
VectorOfVectors
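What cumulative_length does in to_vov() can be modelled with plain lists. This is a minimal sketch, assuming that output vector i takes the first cumulative_length[i] - cumulative_length[i-1] values of row i; to_vov_sketch is a hypothetical helper, not the lgdo implementation, and it silently stops at the shorter of its two inputs.

```python
from itertools import accumulate
from typing import Optional


def to_vov_sketch(rows: list, cumulative_length: Optional[list] = None) -> list:
    """Hypothetical list-based model of ArrayOfEqualSizedArrays.to_vov()."""
    if cumulative_length is None:
        # Use every row in full: all output vectors have the same length.
        return [list(row) for row in rows]
    out = []
    start = 0
    for row, end in zip(rows, cumulative_length):
        length = end - start             # length of this output vector
        out.append(list(row[:length]))   # values taken from the first index on
        start = end
    return out


aoesa = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
cl = list(accumulate([2, 3, 1]))         # vector lengths 2, 3, 1 -> [2, 5, 6]
print(to_vov_sketch(aoesa, cl))          # [[1, 2], [4, 5, 6], [7]]
```

Passing a shorter cumulative_length (for example [1, 1]) yields fewer output vectors, which is one way to picture the “optionally resize” part.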
lgdo.types.encoded module¶
- class lgdo.types.encoded.ArrayOfEncodedEqualSizedArrays(*_args, **_kwargs)¶
Bases:
LGDO
An array of encoded arrays with equal decoded size.
Used to represent an encoded ArrayOfEqualSizedArrays. In addition to an internal VectorOfVectors self.encoded_data storing the encoded data, the size of the decoded arrays is stored in a Scalar self.decoded_size.
See also
- Parameters:
encoded_data (VectorOfVectors) – the vector of vectors holding the encoded data.
decoded_size (Scalar | int) – the length of the decoded arrays.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO. Should include information about the codec used to encode the data.
- append(value)¶
Append a 1D encoded array at the end.
See also
- insert(i, value)¶
Insert an encoded array at index i.
See also
- replace(i, value)¶
Replace the encoded array at index i with a new one.
See also
- resize(new_size)¶
Resize array along the first axis.
See also
- view_as(library, with_units=False)¶
View the encoded data as a third-party format data structure.
This is nearly a zero-copy operation.
Supported third-party formats are:
- pd: returns a pandas.DataFrame
- ak: returns an ak.Array (record type)
Note
In the view, decoded_size is expanded into an array.
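The note on decoded_size can be pictured with a record-style view built from plain Python objects. This is a hypothetical sketch: the record field names encoded_data and decoded_size are assumptions, not necessarily the column names produced by view_as().

```python
def encoded_view_sketch(encoded_data: list, decoded_size: int) -> list:
    """Hypothetical record view: one record per encoded array."""
    # The single decoded_size value stored on the object is repeated
    # ("expanded") so that every record carries its own copy.
    return [
        {"encoded_data": enc, "decoded_size": decoded_size}
        for enc in encoded_data
    ]


records = encoded_view_sketch([[3, 1], [0, 7, 2]], decoded_size=100)
print(records[1])  # {'encoded_data': [0, 7, 2], 'decoded_size': 100}
```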
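- Parameters:
library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.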
- Return type:
pd.DataFrame | np.NDArray | ak.Array
See also
- class lgdo.types.encoded.VectorOfEncodedVectors(*_args, **_kwargs)¶
Bases:
LGDO
An array of variable-length encoded arrays.
Used to represent an encoded VectorOfVectors. In addition to an internal VectorOfVectors self.encoded_data storing the encoded data, a 1D Array in self.decoded_size holds the original sizes of the encoded vectors.
See also
- Parameters:
encoded_data (VectorOfVectors) – the vector of encoded vectors.
decoded_size (Array) – an array holding the original length of each encoded vector in encoded_data.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO. Should include information about the codec used to encode the data.
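Why a decoded size is kept next to each encoded vector can be seen with a toy codec. The sketch below uses run-length encoding purely as a stand-in (it is not claimed to be one of the codecs used with lgdo) and keeps encoded_data and decoded_size as parallel Python lists rather than LGDOs; rle_encode and rle_decode are hypothetical helpers, and the decoder uses the stored size to allocate its output up front.

```python
def rle_encode(vec: list) -> list:
    """Toy run-length encoder producing [value, count, value, count, ...]."""
    out = []
    for x in vec:
        if out and out[-2] == x:
            out[-1] += 1          # same value as the current run: extend it
        else:
            out.extend([x, 1])    # start a new run
    return out


def rle_decode(enc: list, decoded_size: int) -> list:
    """Decode using the stored size to allocate the output up front."""
    out = [0] * decoded_size
    pos = 0
    for value, count in zip(enc[::2], enc[1::2]):
        out[pos:pos + count] = [value] * count
        pos += count
    return out


waveforms = [[5, 5, 5, 2], [7], [1, 1, 3, 3, 3]]
# Parallel containers mirroring the two members of VectorOfEncodedVectors:
encoded_data = [rle_encode(w) for w in waveforms]  # variable-length encoded vectors
decoded_size = [len(w) for w in waveforms]         # original length of each vector
decoded = [rle_decode(e, n) for e, n in zip(encoded_data, decoded_size)]
print(encoded_data[0], decoded_size)  # [5, 3, 2, 1] [4, 1, 5]
```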
- append(value)¶
Append a 1D encoded vector at the end.
- Parameters:
value (tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], int]) – a tuple holding the encoded array and its decoded size.
See also
- insert(i, value)¶
Insert an encoded vector at index i.
- Parameters:
See also
- replace(i, value)¶
Replace the encoded vector (and decoded size) at index i with a new one.
- Parameters:
See also
- resize(new_size)¶
Resize vector along the first axis.
See also
- view_as(library, with_units=False)¶
View the encoded data as a third-party format data structure.
This is a zero-copy or nearly zero-copy operation.
Supported third-party formats are:
- pd: returns a pandas.DataFrame
- ak: returns an ak.Array (record type)
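- Parameters:
library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.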
- Return type:
pd.DataFrame | np.NDArray | ak.Array
See also
lgdo.types.fixedsizearray module¶
Implements a LEGEND Data Object representing an n-dimensional array of fixed size and corresponding utilities.
- class lgdo.types.fixedsizearray.FixedSizeArray(*_args, **_kwargs)¶
Bases:
Array
An array of fixed-size arrays.
Arrays with guaranteed shape along axes > 0: for example, an array of 3-vectors will always have length 3 on axis 1, and it will never change from application to application. This data type is used for optimized memory handling on some platforms. We are not that sophisticated, so we are just storing this identification for LGDO validity, i.e. for now this class is just an alias for Array, but keeps track of the datatype name.
See also
- view_as(library, with_units=False)¶
View the array as a third-party format data structure.
See also
lgdo.types.histogram module¶
- class lgdo.types.histogram.Histogram(*_args, **_kwargs)¶
Bases:
Struct
A special struct to contain histogrammed data.
- Parameters:
weights (hist.Hist | NDArray | Array) – A numpy.ndarray to be used for this object’s internal array, or a hist.Hist object, whose data view is used for this object’s internal array. Note: the array/histogram view is used directly, not copied.
binning (None | Iterable[Histogram.Axis] | Iterable[NDArray] | Iterable[tuple[float, float, float]]) –
  - has to be None if a hist.Hist has been passed as weights
  - can be a list of pre-initialized Histogram.Axis
  - can be a list of tuples, each representing a range, (first, last, step)
  - can be a list of numpy arrays, as returned by numpy.histogramdd().
isdensity (bool) – If True, all bin contents represent a density (amount per volume), and not an absolute amount.
binedge_attrs (dict[str, Any] | None) – attributes that will be added to all binedges of all axes. This does not work if Histogram.Axis instances are directly passed as binning.
attrs (dict[str, Any] | None) – a set of user attributes to be carried along with this LGDO.
flow (bool) – If False, discard counts in over-/underflow bins of the passed hist.Hist instance. If True, this data will also be discarded, but a warning is emitted.
Note
Histogram does not support storing counts in overflow or underflow bins. This parameter just controls whether a warning will be emitted.
- class Axis(*_args, **_kwargs)¶
Bases:
Struct
A special struct to group axis parameters for use in a Histogram.
Depending on the parameters, an axis can either have
  - a binning described by a range object, if first, last and step are passed, or
  - a variable binning described by the edges array.
- Parameters:
edges (NDArray | Array | None) – an array of edges that describe the binning of this axis.
first (float | None) – left edge of the leftmost bin
last (float | None) – right edge of the rightmost bin
step (float | None) – step size (width of each bin)
closedleft (bool) – if True, the bin intervals are left-closed \([a,b)\); if False, intervals are right-closed \((a,b]\).
binedge_attrs (dict[str, Any] | None) – attributes that will be added to the binedges LGDO that is part of the axis struct.
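The two binning flavours of an axis, and the effect of closedleft on values that land exactly on an edge, can be sketched with the standard library. range_edges and bin_index are hypothetical helpers; the sketch assumes the range is described by (first, last, step) as above and that values outside the axis belong to no bin.

```python
import bisect
from typing import Optional


def range_edges(first: float, last: float, step: float) -> list:
    """Hypothetical: all bin edges of an evenly spaced axis."""
    n_bins = round((last - first) / step)
    # Compute each edge from `first` to avoid accumulating rounding errors.
    return [first + i * step for i in range(n_bins + 1)]


def bin_index(edges: list, x: float, closedleft: bool = True) -> Optional[int]:
    """Hypothetical: index of the bin containing x, or None outside the axis."""
    if closedleft:
        i = bisect.bisect_right(edges, x) - 1  # bins [a, b)
    else:
        i = bisect.bisect_left(edges, x) - 1   # bins (a, b]
    return i if 0 <= i < len(edges) - 1 else None


edges = range_edges(0.0, 3.0, 1.0)                # [0.0, 1.0, 2.0, 3.0]
print(bin_index(edges, 1.0))                      # 1: 1.0 opens [1, 2)
print(bin_index(edges, 1.0, closedleft=False))    # 0: 1.0 closes (0, 1]
```

A variable binning simply skips range_edges and passes an explicit, increasing edges list to the same lookup.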
- property edges: ndarray[tuple[int, ...], dtype[_ScalarType_co]]¶
Return all binedges, both for variable and range binning.
- classmethod from_edges(edges, binedge_attrs=None)¶
Create a new axis with variable binning described by edges.
- Return type:
Histogram.Axis
- classmethod from_range_edges(edges, binedge_attrs=None)¶
Create a new axis from the binning described by edges, but try to convert it to an evenly-spaced range object first.
Warning
This function might return a wrong binning, especially in the case of very small magnitudes of the spacing. See the documentation of numpy.isclose() for details. Use this function only with caution, if you know the binning’s order of magnitude.
- Return type:
Histogram.Axis
- get_binedgeattrs(datatype=False)¶
Return a copy of the LGDO attributes dictionary of the binedges.
- add_field(name, obj)¶
Error
Not applicable: A histogram cannot be used as a struct
- fill(data, w=None, keys=None)¶
Fill histogram by incrementing bins with data points weighted by w.
- Parameters:
data – an ndarray with inner dimension equal to the number of axes, or a list of equal-length 1D arrays containing data for each axis, or a Mapping to 1D arrays containing data for each axis (requires keys), or a Pandas dataframe (optionally takes a list of keys).
w (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – weight to use for incrementing data points. If None, a weight of 1 is used for all data points.
keys (Sequence[str]) – list of keys to use if data is a pandas DataFrame or Mapping.
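For a single axis, filling amounts to locating each data point’s bin and adding its weight. A minimal sketch, assuming left-closed bins and dropping points outside the axis (Histogram keeps no over-/underflow bins); fill_1d is a hypothetical helper that only covers the 1D list case, not the ndarray, Mapping or DataFrame inputs accepted by fill().

```python
import bisect
from typing import Optional, Sequence


def fill_1d(counts: list, edges: Sequence[float], data: Sequence[float],
            w: Optional[Sequence[float]] = None) -> list:
    """Hypothetical 1D fill: counts[i] collects data in [edges[i], edges[i+1])."""
    if w is None:
        w = [1] * len(data)  # unit weight for every data point
    for x, weight in zip(data, w):
        i = bisect.bisect_right(edges, x) - 1
        if 0 <= i < len(counts):
            counts[i] += weight  # points outside the axis are dropped
    return counts


edges = [0.0, 1.0, 2.0, 3.0]
print(fill_1d([0, 0, 0], edges, [0.5, 1.5, 1.7, 9.0]))              # [1, 2, 0]
print(fill_1d([0.0, 0.0, 0.0], edges, [0.5, 2.5], w=[0.25, 2.0]))   # [0.25, 0.0, 2.0]
```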
- remove_field(name, delete=False)¶
Error
Not applicable: A histogram cannot be used as a struct
- view_as(library)¶
View the histogram data as a third-party format data structure.
This is typically a zero-copy or nearly zero-copy operation.
Supported third-party formats are:
- np: returns a tuple of binning and an np.ndarray, similar to the return value of numpy.histogramdd().
- hist: returns a hist.Hist that holds a copy of this histogram’s data.
Warning
Viewing as hist will perform a copy of the stored histogram data.
- Parameters:
library (str) – format of the returned data view.
- Return type:
See also
lgdo.types.lgdo module¶
- class lgdo.types.lgdo.LGDO(*_args, **_kwargs)¶
Bases:
ABC
Abstract base class representing a LEGEND Data Object (LGDO).
- getattrs(datatype=False)¶
Return a copy of the LGDO attributes dictionary.
- abstract view_as(library, with_units=False)¶
View the LGDO data object as a third-party format data structure.
This is typically a zero-copy or nearly zero-copy operation unless explicitly stated in the concrete LGDO documentation. The view can be turned into a copy explicitly by the user with the appropriate methods. If requested by the user, the output format supports it, and the LGDO carries a units attribute, physical units are attached to the view through the pint package.
Typical supported third-party libraries are:
- pd: pandas
- np: numpy
- ak: awkward
Note
Awkward does not support attaching units through Pint, at the moment.
The actual supported formats may vary depending on the concrete LGDO class.
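The contract of the abstract view_as() (every concrete LGDO must implement it, and each decides which libraries it accepts) can be sketched with the abc module. SketchLGDO, SketchList and the "py" library key are hypothetical; real LGDO classes accept keys such as pd, np and ak.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class SketchLGDO(ABC):
    """Hypothetical miniature of the LGDO abstract base class."""

    def __init__(self, attrs: Optional[dict] = None) -> None:
        self.attrs = dict(attrs or {})

    @abstractmethod
    def view_as(self, library: str, with_units: bool = False) -> Any:
        """Each concrete class decides which libraries it supports."""


class SketchList(SketchLGDO):
    """Concrete toy LGDO wrapping a Python list."""

    def __init__(self, data: list, attrs: Optional[dict] = None) -> None:
        super().__init__(attrs)
        self.data = data

    def view_as(self, library: str, with_units: bool = False) -> Any:
        if library == "py":
            return self.data  # zero-copy: the very same list object
        raise ValueError(f"{library!r} is not supported by {type(self).__name__}")


obj = SketchList([1, 2, 3], attrs={"units": "m"})
view = obj.view_as("py")
```

Instantiating SketchLGDO directly raises TypeError, which is how the abstract base forces every subclass to provide its own view.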
lgdo.types.scalar module¶
Implements a LEGEND Data Object representing a scalar and corresponding utilities.
lgdo.types.struct module¶
Implements a LEGEND Data Object representing a struct and corresponding utilities.
- class lgdo.types.struct.Struct(*_args, **_kwargs)¶
A dictionary of LGDOs with an optional set of attributes.
After instantiation, add fields using add_field() to keep the datatype updated, or call update_datatype() after adding.
- Parameters:
- add_field(name, obj)¶
Add a field to the struct.
- remove_field(name, delete=False)¶
Remove a field from the struct.
- Parameters:
delete (bool) – if True, delete the field object using the del statement.
- update_datatype()¶
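Why fields should be added through add_field(), or followed by update_datatype(), can be shown with a dict subclass that keeps a datatype attribute in sync. This is a hypothetical sketch: SketchStruct is not lgdo code, the struct{...} format is inferred from the table{...} strings in the Table.flatten() example, and sorting the field names is an assumption.

```python
class SketchStruct(dict):
    """Hypothetical dict of fields that tracks a datatype string."""

    def __init__(self) -> None:
        super().__init__()
        self.attrs = {}
        self.update_datatype()

    def update_datatype(self) -> None:
        # Rebuild e.g. "struct{a,b}" from the current keys (sorting is an assumption).
        self.attrs["datatype"] = "struct{" + ",".join(sorted(self)) + "}"

    def add_field(self, name: str, obj) -> None:
        self[name] = obj
        self.update_datatype()  # keeps the datatype in sync automatically


st = SketchStruct()
st.add_field("b", [1, 2])
print(st.attrs["datatype"])    # struct{b}
st["a"] = [3]                  # plain dict assignment bypasses add_field()...
stale = st.attrs["datatype"]   # ...so the datatype is still "struct{b}"
st.update_datatype()
print(st.attrs["datatype"])    # struct{a,b}
```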
- view_as()¶
View the Struct data as a third-party format data structure.
Error
Not implemented. Since Struct’s fields can have different lengths, converting to a NumPy, Pandas or Awkward data structure is generally not possible. Call LGDO.view_as() on the fields instead.
See also
- lgdo.types.struct._is_struct_datatype(dt_name, expr)¶
- lgdo.types.struct._sort_datatype_fields(expr)¶
- lgdo.types.struct._struct_datatype_equal(dt_name, dt1, dt2)¶
lgdo.types.table module¶
Implements a LEGEND Data Object representing a special struct of arrays of equal length and corresponding utilities.
- class lgdo.types.table.Table(*args, **kwargs)¶
Bases:
Struct
A special struct of arrays or subtable columns of equal length.
Holds onto an internal read/write location loc that is useful in managing table I/O using functions like push_row(), is_full(), and clear().
Note
If you write to a table and don’t fill it up to its total size, be sure to resize it before passing it to data processing functions, as they will call __len__() to access valid data, which returns the size attribute.
- Parameters:
size – sets the number of rows in the table. Arrays in col_dict will be resized to match size if both are not None. If size is left as None, the number of table rows is determined from the length of the first array in col_dict. If neither is provided, a default length of 1024 is used.
col_dict – instantiate this table using the supplied mapping of column names and array-like objects. Supported input types are: mapping of strings to LGDOs, pd.DataFrame and ak.Array. Note 1: no copy is performed, the objects are used directly (unless an ak.Array is provided). Note 2: if size is not None, all arrays will be resized to match it. Note 3: if the arrays have different lengths, all will be resized to match the length of the first array.
attrs – A set of user attributes to be carried along with this LGDO.
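The role of loc in buffered I/O can be sketched with a minimal cursor class. SketchTable is hypothetical, and it assumes that push_row() advances loc by one, is_full() compares loc with the table size, and clear() resets loc to 0; the lgdo implementations may differ in detail.

```python
class SketchTable:
    """Hypothetical model of the Table read/write cursor `loc`."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.loc = 0  # index of the next row to be written

    def push_row(self) -> None:
        self.loc += 1

    def is_full(self) -> bool:
        return self.loc >= self.size

    def clear(self) -> None:
        self.loc = 0


buf = SketchTable(size=3)
rows_written = 0
while not buf.is_full():
    # ... write one row at index buf.loc here ...
    buf.push_row()
    rows_written += 1
full_before_clear = buf.is_full()
buf.clear()  # ready to refill the same buffer
```

If writing stops before is_full() returns True, loc equals the number of valid rows, which is the length to resize the table to before handing it to processing code.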
Notes
The loc attribute is initialized to 0.
- add_column(name, obj, use_obj_size=False)¶
Alias for
add_field()using table terminology ‘column’.
- add_field(name, obj, use_obj_size=False)¶
Add a field (column) to the table.
Use the name “field” here to match the terminology used in
Struct.
- clear() None. Remove all items from D.¶
- eval(expr, parameters=None, modules=None)¶
Apply column operations to the table and return a new LGDO.
Internally uses numexpr.evaluate() if dealing with columns representable as NumPy arrays, or eval() if VectorOfVectors are involved. In the latter case, the VoV columns are viewed as ak.Array and the respective routines are therefore available.
Columns nested in subtables can be accessed by scoping with two underscores (__). For example:
tbl.eval("a + tbl2__b")
computes the sum of column a and column b in the subtable tbl2.
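The double-underscore scoping can be pictured as a path lookup through nested tables. A hypothetical sketch using nested dicts in place of Table objects; resolve is not an lgdo function, and lgdo’s own name handling may differ (for instance for column names that themselves contain __).

```python
def resolve(table: dict, name: str):
    """Hypothetical lookup of a scoped column name such as 'tbl2__b'."""
    node = table
    for part in name.split("__"):
        node = node[part]  # descend one (sub)table level per "__"
    return node


tbl = {"a": [1, 2, 3], "tbl2": {"b": [10, 20, 30]}}
a, b = resolve(tbl, "a"), resolve(tbl, "tbl2__b")
print([x + y for x, y in zip(a, b)])  # [11, 22, 33], i.e. "a + tbl2__b"
```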
- Parameters:
expr (str) – if the expression only involves non-VectorOfVectors columns, the syntax is the one supported by numexpr.evaluate() (see the numexpr documentation). Note: because of internal limitations, reduction operations must appear last in the stack. If at least one considered column is a VectorOfVectors, plain eval() is used and ak.Array transforms can be used through the ak. prefix. (NumPy functions are analogously accessible through np.). See also examples below.
parameters (Mapping[str, str] | None) – a dictionary of function parameters. Passed to numexpr.evaluate() as local_dict argument or to eval() as locals argument.
modules (Mapping[str, ModuleType] | None) – a dictionary of additional modules used by the expression. If this is not None, then eval() is used and the expression can depend on any modules from this dictionary in addition to awkward and numpy. These are passed to eval() as globals argument.
- Return type:
LGDO
Examples
>>> import lgdo
>>> tbl = lgdo.Table(
...     col_dict={
...         "a": lgdo.Array([1, 2, 3]),
...         "b": lgdo.VectorOfVectors([[5], [6, 7], [8, 9, 0]]),
...     }
... )
>>> print(tbl.eval("a + b"))
[[6],
 [8 9],
 [11 12 3],
]
>>> print(tbl.eval("np.sum(a) + ak.sum(b)"))
41
- flatten(_prefix='')¶
Flatten the table, if nested.
Returns a new Table (that references, not copies, the existing columns) with columns in nested tables being moved to the first level (and renamed appropriately).
Examples
>>> repr(tbl)
"Table(dict={'a': Array([1 2 3], attrs={'datatype': 'array<1>{real}'}), 'tbl': Table(dict={'b': Array([4 5 6], attrs={'datatype': 'array<1>{real}'}), 'tbl1': Table(dict={'z': Array([9 9 9], attrs={'datatype': 'array<1>{real}'})}, attrs={'datatype': 'table{z}'})}, attrs={'datatype': 'table{b,tbl1}'})}, attrs={'datatype': 'table{a,tbl}'})"
>>> tbl.flatten().keys()
dict_keys(['a', 'tbl__b', 'tbl__tbl1__z'])
- Return type:
Table
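The renaming performed by flatten() follows the same double-underscore convention used by eval(). A hypothetical sketch with nested dicts standing in for Table objects; flatten_sketch is not the lgdo implementation, but it reproduces the key names of the example above and, like flatten(), references rather than copies the columns.

```python
def flatten_sketch(table: dict, prefix: str = "") -> dict:
    """Hypothetical: move nested columns to the top level, joining names with '__'."""
    out = {}
    for key, col in table.items():
        name = prefix + key
        if isinstance(col, dict):  # a subtable: recurse with an extended prefix
            out.update(flatten_sketch(col, prefix=name + "__"))
        else:
            out[name] = col        # the column object itself, not a copy
    return out


tbl = {"a": [1, 2, 3], "tbl": {"b": [4, 5, 6], "tbl1": {"z": [9, 9, 9]}}}
flat = flatten_sketch(tbl)
print(list(flat))  # ['a', 'tbl__b', 'tbl__tbl1__z']
```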
- get_dataframe(cols=None, copy=False, prefix='')¶
Get a pandas.DataFrame from the data in the table.
Warning
This method is deprecated. Use view_as() to view the table as a Pandas dataframe.
Notes
The requested data must be array-like, with the nda attribute.
- Parameters:
cols (list[str] | None) – a list of column names specifying the subset of the table’s columns to be added to the dataframe.
copy (bool) – When True, the dataframe allocates new memory and copies data into it. Otherwise, the raw nda’s from the table are used directly.
prefix (str) – The prefix to be added to the column names. Used when recursively getting the dataframe of a Table inside this Table.
- Return type:
pandas.DataFrame
- join(other_table, cols=None, do_warn=True)¶
Add the columns of another table to this table.
Notes
Following the join, both tables have access to other_table’s fields (but other_table doesn’t have access to this table’s fields). No memory is allocated in this process. other_table can go out of scope and this table will retain access to the joined data.
- Parameters:
other_table (Table) – the table whose columns are to be joined into this table.
cols (list[str] | None) – a list of names of columns from other_table to be joined into this table.
do_warn (bool) – set to False to turn off warnings associated with a mismatched loc parameter or add_column() warnings.
- push_row()¶
- remove_column(name, delete=False)¶
Alias for
Struct.remove_field()using table terminology ‘column’.
- resize(new_size=None, do_warn=False)¶
- view_as(library, with_units=False, cols=None, prefix='')¶
View the Table data as a third-party format data structure.
This is typically a zero-copy or nearly zero-copy operation.
Supported third-party formats are:
- pd: returns a pandas.DataFrame
- ak: returns an ak.Array (record type)
Notes
Conversion to Awkward array only works when the key is a string.
- Parameters:
library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.
cols (list[str] | None) – a list of column names specifying the subset of the table’s columns to be added to the data view structure.
prefix (str) – The prefix to be added to the column names. Used when recursively getting the dataframe of a Table inside this Table.
- Return type:
pd.DataFrame | np.NDArray | ak.Array
See also
- lgdo.types.table._ak_to_lgdo_or_col_dict(array)¶
lgdo.types.vectorofvectors module¶
Implements a LEGEND Data Object representing a variable-length array of variable-length arrays and corresponding utilities.
- class lgdo.types.vectorofvectors.VectorOfVectors(*_args, **_kwargs)¶
Bases:
LGDO
An n-dimensional variable-length 1D array of variable-length 1D arrays.
If the vector is 2-dimensional, the internal representation is as two NumPy arrays, one to store the flattened data contiguously (flattened_data) and one to store the cumulative sum of lengths of each vector (cumulative_length). When the dimension is more than 2, flattened_data is a VectorOfVectors itself.
Examples
>>> from lgdo import VectorOfVectors
>>> data = VectorOfVectors(
...     [[[1, 2], [3, 4, 5]], [[2], [4, 8, 9, 7]], [[5, 3, 1]]],
...     attrs={"units": "m"}
... )
>>> print(data)
[[[1, 2], [3, 4, 5]],
 [[2], [4, 8, 9, 7]],
 [[5, 3, 1]]
] with attrs={'units': 'm'}
>>> data.view_as("ak")
<Array [[[1, 2], [3, 4, 5]], ..., [[5, ..., 1]]] type='3 * var * var * int64'>
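For the 2D case, the flattened_data/cumulative_length layout described above can be reproduced with plain lists. A minimal sketch, assuming vector i spans from cumulative_length[i-1] (or 0 for the first vector) up to cumulative_length[i]; to_flat and get_vector are hypothetical helpers, not lgdo functions.

```python
from itertools import accumulate


def to_flat(vectors: list) -> tuple:
    """Hypothetical: split a list of lists into (flattened_data, cumulative_length)."""
    flattened_data = [x for v in vectors for x in v]
    cumulative_length = list(accumulate(len(v) for v in vectors))
    return flattened_data, cumulative_length


def get_vector(flattened_data: list, cumulative_length: list, i: int) -> list:
    """Recover vector i: it ends at cumulative_length[i], starts where i-1 ended."""
    start = cumulative_length[i - 1] if i > 0 else 0
    return flattened_data[start:cumulative_length[i]]


fd, cl = to_flat([[1, 2, 3], [4, 5], [], [6]])
print(fd)                     # [1, 2, 3, 4, 5, 6]
print(cl)                     # [3, 5, 5, 6]
print(get_vector(fd, cl, 1))  # [4, 5]
```

An empty vector shows up as a repeated value in cumulative_length (5, 5 above), which is what makes appending empty vectors cheap.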
Note
Many class methods are currently implemented only for 2D vectors and will raise an exception on higher dimensional data.
- Parameters:
data (ArrayLike | None) – Any array-like structure accepted by the ak.Array constructor, with the exception that elements cannot be of type OptionType, UnionType or RecordType. Takes priority over flattened_data and cumulative_length. The serialization of the ak.Array is performed through ak.to_buffers(). Since the latter returns non-data-owning NumPy arrays, which would prevent later modifications like resizing, a copy is performed.
flattened_data (ArrayLike | None) – if not None, used as the internal array for self.flattened_data. Otherwise, an internal flattened_data is allocated based on cumulative_length (or shape_guess) and dtype.
cumulative_length (ArrayLike | VectorOfVectors | None) – if not None, used as the internal array for self.cumulative_length. Should be dtype numpy.uint32. If cumulative_length is None, an internal cumulative_length is allocated based on the first element of shape_guess.
shape_guess (Sequence[int, ...] | None) – a NumPy-format shape specification, required if either of flattened_data or cumulative_length are not supplied. The first element should not be a guess and sets the number of vectors to be stored. The second element is a guess or approximation of the typical length of a stored vector, used to set the initial length of flattened_data if it was not supplied.
dtype (DTypeLike | None) – sets the type of data stored in flattened_data. Required if flattened_data and data are None.
fill_val (int | float | None) – fill all of self.flattened_data with this value.
attrs (Mapping[str, Any] | None) – a set of user attributes to be carried along with this LGDO.
- _set_vector_unsafe(i, vec, lens=None)¶
Insert vector vec at position i.
Assumes that j = self.cumulative_length[i-1] is the index (in self.flattened_data) of the end of the (i-1)th vector and copies vec in self.flattened_data[j:sum(lens)]. Finally updates self.cumulative_length[i] with the new flattened data array length.
Vectors stored after index i can be overwritten, producing unintended behavior. This method is typically used for fast sequential fill of a pre-allocated vector of vectors.
If vec is a 1D array and lens is None, the full array is used. If vec is 2D, lens is required, and each array is filled only up to the corresponding length in lens.
Danger
This method can lead to undefined behavior or vector invalidation if used improperly. Use it only if you know what you are doing.
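A sequential fill of a pre-allocated vector of vectors, in the spirit of _set_vector_unsafe(), can be sketched as follows. This is a hypothetical, 1D-only version (no lens argument) that writes vec at flattened_data[j:j + len(vec)]; like the original, it performs no bounds or ordering checks, so the caller must pre-allocate enough room and fill vectors in order.

```python
def set_vector_unsafe(flattened_data: list, cumulative_length: list,
                      i: int, vec: list) -> None:
    """Hypothetical 1D-only model of _set_vector_unsafe(): no checks at all."""
    j = cumulative_length[i - 1] if i > 0 else 0  # end of the (i-1)th vector
    # Overwrites whatever is stored there, including data of later vectors.
    # The caller must have pre-allocated at least j + len(vec) slots.
    flattened_data[j:j + len(vec)] = vec
    cumulative_length[i] = j + len(vec)


# Pre-allocate room for 3 vectors and 8 values, then fill sequentially.
fd = [0] * 8
cl = [0] * 3
for i, v in enumerate([[1, 2], [3, 4, 5], [6]]):
    set_vector_unsafe(fd, cl, i, v)
print(cl)           # [2, 5, 6]
print(fd[:cl[-1]])  # [1, 2, 3, 4, 5, 6]
```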
- append(new)¶
Append a 1D vector new at the end.
Examples
>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.append([8, 9])
>>> print(vov)
[[1 2 3],
 [4 5],
 [8 9],
]
- insert(i, new)¶
Insert a vector at index i.
self.flattened_data (and therefore self.cumulative_length) is resized in order to accommodate the new element.
Examples
>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.insert(1, [8, 9])
>>> print(vov)
[[1 2 3],
 [8 9],
 [4 5],
]
Warning
This method involves a significant amount of memory re-allocation and is expected to perform poorly on large vectors.
- replace(i, new)¶
Replace the vector at index i with new.
self.flattened_data (and therefore self.cumulative_length) is resized, if the length of new is different from the vector currently at index i.
Examples
>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.replace(0, [8, 9])
>>> print(vov)
[[8 9],
 [4 5],
]
Warning
This method involves a significant amount of memory re-allocation and is expected to perform poorly on large vectors.
- resize(new_size)¶
Resize vector along the first axis.
self.flattened_data is resized only if new_size is smaller than the current vector length.
If new_size is larger than the current vector length, self.cumulative_length is padded with its last element. This corresponds to appending empty vectors.
Examples
>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.resize(3)
>>> print(vov)
[[1 2 3],
 [4 5],
 [],
]
>>> vov = VectorOfVectors([[1, 2], [3], [4, 5]])
>>> vov.resize(2)
>>> print(vov)
[[1 2],
 [3],
]
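The bookkeeping behind resize() can be modelled on plain lists: growing pads cumulative_length with its last value (empty vectors), shrinking truncates both lists. resize_sketch is a hypothetical helper that reproduces the two examples above; unlike the real method, it returns new lists instead of resizing in place.

```python
def resize_sketch(flattened_data: list, cumulative_length: list, new_size: int) -> tuple:
    """Hypothetical list-based model of VectorOfVectors.resize() (returns new lists)."""
    old_size = len(cumulative_length)
    if new_size > old_size:
        # Growing: pad with the last cumulative length, i.e. append empty vectors.
        last = cumulative_length[-1] if cumulative_length else 0
        return flattened_data, cumulative_length + [last] * (new_size - old_size)
    # Shrinking: keep the first new_size vectors and only the data they use.
    cumulative_length = cumulative_length[:new_size]
    end = cumulative_length[-1] if cumulative_length else 0
    return flattened_data[:end], cumulative_length


print(resize_sketch([1, 2, 3, 4, 5], [3, 5], 3))     # ([1, 2, 3, 4, 5], [3, 5, 5])
print(resize_sketch([1, 2, 3, 4, 5], [2, 3, 5], 2))  # ([1, 2, 3], [2, 3])
```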
- to_aoesa(max_len=None, fill_val=nan, preserve_dtype=False)¶
Convert to ArrayOfEqualSizedArrays.
Note
The dtype of the original vector is typically not strictly preserved. The output dtype will be either np.float64 or np.int64. If you want to use the same exact dtype, set preserve_dtype to True.
- Parameters:
max_len (int | None) – the length of the returned array along its second dimension. Longer vectors will be truncated, shorter will be padded with fill_val. If None, the length will be equal to the length of the longest vector.
fill_val (bool | int | float) – value used to pad shorter vectors up to max_len. The dtype of the output array will be such that both fill_val and the vector values can be represented in the same data structure.
preserve_dtype (bool) – whether the output array should have exactly the same dtype as the original vector of vectors. The type of fill_val must be compatible with it.
- Return type:
ArrayOfEqualSizedArrays
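The truncate-or-pad rule governed by max_len and fill_val can be sketched with lists. to_aoesa_sketch is a hypothetical helper and ignores the dtype promotion described in the note above.

```python
import math
from typing import Optional


def to_aoesa_sketch(vectors: list, max_len: Optional[int] = None,
                    fill_val=math.nan) -> list:
    """Hypothetical list-based model of VectorOfVectors.to_aoesa()."""
    if max_len is None:
        # Default: as long as the longest vector.
        max_len = max((len(v) for v in vectors), default=0)
    out = []
    for v in vectors:
        row = list(v[:max_len])                   # truncate longer vectors
        row += [fill_val] * (max_len - len(row))  # pad shorter ones
        out.append(row)
    return out


vov = [[1, 2, 3], [4], []]
print(to_aoesa_sketch(vov, fill_val=0))              # [[1, 2, 3], [4, 0, 0], [0, 0, 0]]
print(to_aoesa_sketch(vov, max_len=2, fill_val=-1))  # [[1, 2], [4, -1], [-1, -1]]
```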
- view_as(library, with_units=False, fill_val=nan, preserve_dtype=False)¶
View the vector data as a third-party format data structure.
This is typically a zero-copy or nearly zero-copy operation.
Supported third-party formats are:
- pd: returns a pandas.Series (supported through the awkward-pandas package)
- np: returns a numpy.ndarray, padded with fill_val to make it rectangular. This implies memory re-allocation.
- ak: returns an ak.Array. self.cumulative_length is currently re-allocated for technical reasons.
Notes
Awkward array views partially involve memory re-allocation (the cumulative_lengths), while NumPy “exploded” views clearly imply a full copy.
- Parameters:
library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.
fill_val (bool | int | float) – forwarded to to_aoesa(), if library is np.
preserve_dtype (bool) – forwarded to to_aoesa(), if library is np.
- Return type:
pd.DataFrame | np.NDArray | ak.Array
See also
- @numba.jit lgdo.types.vectorofvectors._to_aoesa(flattened_array, cumulative_length, nda)¶
lgdo.types.vovutils module¶
VectorOfVectors utilities.
- lgdo.types.vovutils._ak_is_jagged(type_)¶
Returns True if ak.Array is jagged at all axes.
This assures that ak.to_buffers() returns the expected data structures.
- Return type:
bool
- lgdo.types.vovutils._ak_is_valid(type_)¶
Returns True if ak.Array contains only elements we can serialize to LH5.
- Return type:
bool
- @numba.jit lgdo.types.vovutils._nb_build_cl(sorted_array_in, cumulative_length_out)¶
Numbified inner loop for build_cl().
- @numba.jit lgdo.types.vovutils._nb_explode(cumulative_length, array_in, array_out)¶
Numbified inner loop for
explode().
- @numba.jit lgdo.types.vovutils._nb_explode_cl(cumulative_length, array_out)¶
Numbified inner loop for explode_cl().
- @numba.guvectorize lgdo.types.vovutils._nb_fill(aoa_in, len_in, flattened_array_out)¶
Options:
boundscheck=False, cache=True
Precompiled signatures:
?i?->,?l?->,?I?->,?L?->,bib->,blb->,bIb->,bLb->,hih->,hlh->,hIh->,hLh->,iii->,ili->,iIi->,iLi->,lil->,lll->,lIl->,lLl->,BiB->,BlB->,BIB->,BLB->,HiH->,HlH->,HIH->,HLH->,IiI->,IlI->,III->,ILI->,LiL->,LlL->,LIL->,LLL->,fif->,flf->,fIf->,fLf->,did->,dld->,dId->,dLd->,FiF->,FlF->,FIF->,FLF->,DiD->,DlD->,DID->,DLD->
Vectorized function to fill flattened array from array of arrays and lengths. Values in aoa_in past lengths will not be copied.
- Parameters:
aoa_in (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – array of arrays containing values to be copied
len_in (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – array of vector lengths for each row of aoa_in
flattened_array_out (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – flattened array to copy values into. Must be at least as long as the sum of lengths in len_in.
- lgdo.types.vovutils.build_cl(sorted_array_in, cumulative_length_out=None)¶
Build a cumulative length array from an array of sorted data.
Examples
>>> build_cl(np.array([3, 3, 3, 4]))
array([3., 4.])
For a sorted_array_in of indices, this is the inverse of explode_cl(), in the sense that doing build_cl(explode_cl(cumulative_length)) would recover the original cumulative_length.
- Parameters:
sorted_array_in (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – array of data already sorted; each N matching contiguous entries will be converted into a new row of cumulative_length_out.
cumulative_length_out (ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None) – a pre-allocated array for the output cumulative_length. It will never be longer than sorted_array_in, so giving them the same length is safe if there is not a better guess.
- Returns:
cumulative_length_out – the output cumulative length array. If the user provides a cumulative_length_out that is too long, this return value is sliced to contain only the used portion of the allocated memory.
- Return type:
ndarray[tuple[int, ...], dtype[_ScalarType_co]]
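The run-counting performed by build_cl() can be sketched in pure Python. build_cl_sketch is a hypothetical model, not the numba implementation: each run of equal, contiguous values becomes one entry of the cumulative length, and it returns Python ints where the example above shows floats.

```python
def build_cl_sketch(sorted_array_in: list) -> list:
    """Hypothetical pure-Python model of build_cl()."""
    cumulative_length = []
    for pos, value in enumerate(sorted_array_in):
        if pos > 0 and value == sorted_array_in[pos - 1]:
            cumulative_length[-1] += 1           # same run: current vector grows
        else:
            prev = cumulative_length[-1] if cumulative_length else 0
            cumulative_length.append(prev + 1)   # new run: start a new vector
    return cumulative_length


print(build_cl_sketch([3, 3, 3, 4]))  # [3, 4]
print(build_cl_sketch([0, 0, 1]))     # [2, 3]
```

Since only runs are counted, this sketch cannot produce entries for empty vectors.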
- lgdo.types.vovutils.explode(cumulative_length, array_in, array_out=None)¶
Explode a data array using a cumulative_length array.
This is identical to explode_cl(), except array_in gets exploded instead of cumulative_length.
Examples
>>> explode(np.array([2, 3]), np.array([3, 4]))
array([3., 3., 4.])
- Parameters:
cumulative_length (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – the cumulative length array to use for exploding.
array_in (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – the data to be exploded. Must have same length as cumulative_length.
array_out (ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None) – a pre-allocated array to hold the exploded data. The length should be equal to
cumulative_length[-1].
- Returns:
array_out – the exploded data array.
- Return type:
ndarray[tuple[int, ...], dtype[_ScalarType_co]]
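Exploding repeats each per-vector value once for every element of its vector, which is how a per-vector quantity can be attached to each flattened element. explode_sketch is a hypothetical pure-Python model of explode(); it returns a list rather than a float ndarray.

```python
def explode_sketch(cumulative_length: list, array_in: list) -> list:
    """Hypothetical pure-Python model of explode()."""
    out = []
    start = 0
    for end, value in zip(cumulative_length, array_in):
        out.extend([value] * (end - start))  # one copy per element of vector i
        start = end
    return out


print(explode_sketch([2, 3], [3, 4]))              # [3, 3, 4]
print(explode_sketch([1, 1, 3], ["a", "b", "c"]))  # ['a', 'c', 'c'] ("b" labels an empty vector)
```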
- lgdo.types.vovutils.explode_arrays(cumulative_length, arrays, arrays_out=None)¶
Explode a set of arrays using a cumulative_length array.
- Parameters:
cumulative_length (Array) – the cumulative length array to use for exploding.
arrays (Sequence[ndarray[tuple[int, ...], dtype[_ScalarType_co]]]) – the data arrays to be exploded. Each array must have same length as cumulative_length.
arrays_out (Sequence[ndarray[tuple[int, ...], dtype[_ScalarType_co]]] | None) – a list of pre-allocated arrays to hold the exploded data. The length of the list should be equal to the length of arrays, and each entry in arrays_out should have length
cumulative_length[-1]. If not provided, output arrays are allocated for the user.
- Returns:
arrays_out – the list of exploded data arrays.
- Return type:
list[ndarray[tuple[int, ...], dtype[_ScalarType_co]]]
- lgdo.types.vovutils.explode_cl(cumulative_length, array_out=None)¶
Explode a cumulative_length array.
Examples
>>> explode_cl(np.array([2, 3]))
array([0., 0., 1.])
This is the inverse of build_cl(), in the sense that doing build_cl(explode_cl(cumulative_length)) would recover the original cumulative_length.
- Parameters:
cumulative_length (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – the cumulative length array to be exploded.
array_out (ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None) – a pre-allocated array to hold the exploded cumulative length array. The length should be equal to
cumulative_length[-1].
- Returns:
array_out – the exploded cumulative length array.
- Return type:
ndarray[tuple[int, ...], dtype[_ScalarType_co]]
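Exploding a cumulative_length array yields, for every flattened element, the index of the vector it belongs to. explode_cl_sketch is a hypothetical pure-Python model; it is equivalent to exploding the array of vector indices 0, 1, ..., n-1.

```python
def explode_cl_sketch(cumulative_length: list) -> list:
    """Hypothetical pure-Python model of explode_cl()."""
    out = []
    start = 0
    for i, end in enumerate(cumulative_length):
        out.extend([i] * (end - start))  # vector index, once per element
        start = end
    return out


print(explode_cl_sketch([2, 3]))     # [0, 0, 1]
print(explode_cl_sketch([1, 1, 3]))  # [0, 2, 2]
```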
lgdo.types.waveformtable module¶
Implements a LEGEND Data Object representing a special
Table to store blocks of one-dimensional time-series
data.
- class lgdo.types.waveformtable.WaveformTable(*args, **kwargs)¶
Bases:
Table
An LGDO for storing blocks of (1D) time-series data.
A WaveformTable is an LGDO Table with the 3 columns t0, dt, and values:
- t0[i] is a time offset (relative to a user-defined global reference) for the sample in values[i][0]. Implemented as an LGDO Array with optional attribute units.
- dt[i] is the sampling period for the waveform at values[i]. Implemented as an LGDO Array with optional attribute units.
- values[i] is the i’th waveform in the table. Internally, the waveform values may be either an LGDO ArrayOfEqualSizedArrays<1,1>, an LGDO VectorOfVectors or a VectorOfEncodedVectors that supports waveforms of unequal length. Can optionally be given a units attribute.
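Since t0[i] is the time of values[i][0] and dt[i] is the sampling period, the time of sample k of waveform i is t0[i] + k * dt[i]. A minimal sketch, assuming uniform sampling and t0 and dt expressed in the same units; sample_times is a hypothetical helper and the numbers below are illustrative only.

```python
def sample_times(t0: float, dt: float, n_samples: int) -> list:
    """Time of each sample of one waveform, assuming t0 and dt share units."""
    # Sample k of values[i] is taken k sampling periods after values[i][0].
    return [t0 + k * dt for k in range(n_samples)]


# Hypothetical row i of a WaveformTable: t0[i] = 100, dt[i] = 16, 4 samples.
print(sample_times(100, 16, 4))  # [100, 116, 132, 148]
```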
Note
On-disk and in-memory versions could be different e.g. if a compression routine is used.
- Parameters:
size (int | None) – sets the number of rows in the table. If None, the size will be determined from the first among t0, dt, or values to return a valid length. If not None, t0, dt, and values will be resized as necessary to match size. If size is None and t0, dt, and values are all non-array-like, a default size of 1024 is used.
t0 (float | Array | np.ndarray) – \(t_0\) values to be used (or broadcast) to the t0 column.
t0_units (str | None) – units for the \(t_0\) values. If not None and t0 is an LGDO Array, overrides what’s in t0.
dt (float | Array | np.ndarray) – \(\delta t\) values (sampling period) to be used (or broadcast) to the dt column.
dt_units (str | None) – units for the dt values. If not None and dt is an LGDO Array, overrides what’s in dt.
values (ArrayOfEqualSizedArrays | VectorOfVectors | np.ndarray) – The waveform data to be stored in the table. If None, a block of data is prepared based on the wf_len and dtype arguments.
values_units (str | None) – units for the waveform values. If not None and values is an LGDO Array, overrides what’s in values.
wf_len (int | None) – The length of the waveforms in each entry of a table. If None (the default), unequal lengths are assumed and VectorOfVectors is used for the values column. Ignored if values is a 2D ndarray, in which case values.shape[1] is used.
dtype (np.dtype) – The NumPy numpy.dtype of the waveform data. If values is not None, this argument is ignored. If both values and dtype are None, numpy.float64 is used.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO.
- resize_wf_len(new_len)¶
Alias for wf_len.setter, for when we want to make it clear in the code that memory is being reallocated.
- property values: ArrayOfEqualSizedArrays | VectorOfVectors¶