lgdo.types package¶
LEGEND Data Objects (LGDO) types.
Submodules¶
lgdo.types.array module¶
Implements a LEGEND Data Object representing an n-dimensional array and corresponding utilities.
- class lgdo.types.array.Array(*_args, **_kwargs)¶
Bases: LGDOCollection
Holds a numpy.ndarray and attributes.
Array (and the other various array types) holds an nda instead of deriving from numpy.ndarray for the following reasons:
It keeps management of the nda totally under the control of the user. The user can point it to another object’s buffer, grab the nda and toss the Array, etc.
It allows the management code to send just the nda’s to the central routines for data manipulation. Keeping LGDO’s out of that code allows for more standard, reusable, and (we expect) performant Python.
It allows the first axis of the nda to be treated as “special” for storage in Tables.
- Parameters:
nda (np.ndarray) – A numpy.ndarray to be used for this object’s internal array. Note: the array is used directly, not copied. If not supplied, internal memory is newly allocated based on the shape and dtype arguments.
shape (tuple[int, ...]) – A NumPy-format shape specification for the shape of the internal ndarray. Required if nda is None, otherwise unused.
dtype (np.dtype) – Specifies the type of the data in the array. Required if nda is None, otherwise unused.
fill_val (float | int | None) – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO.
- append(value)¶
Append value(s) to end of array (with copy), resizing array
- property dtype¶
- insert(i, value)¶
Insert value(s) into row i (with copy). If a collection of values is provided, insert all of them. The array will be resized and subsequent values will be shifted.
- property nda¶
- replace(i, value)¶
Replace value at row i
- reserve_capacity(capacity)¶
Set size (number of rows) of internal memory buffer
- resize(new_size, trim=False)¶
Set size of Array in rows. Only change capacity if it must be increased to accommodate new rows; in this case double capacity. If trim is True, capacity will be set to match size. If new_size is an int, do not change size of inner dimensions.
If new_size is a collection, internal memory will be re-allocated, so this should be done only rarely!
- Parameters:
new_size (int | Collection[int]) – number of rows after resize. If provided as a collection, also resize inner dimensions
trim – after resizing, change capacity to match size
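The capacity policy described for resize() can be illustrated on a plain Python list. This is only a sketch under stated assumptions: RowBuffer and its attributes are hypothetical names, not part of lgdo; only the integer form of new_size is covered; and capacity is assumed to keep doubling until the requested size fits.

```python
class RowBuffer:
    """Hypothetical stand-in for a resizable row store (not lgdo's Array)."""

    def __init__(self, capacity=4):
        self.buffer = [None] * capacity  # allocated rows (the capacity)
        self.size = 0                    # rows currently considered valid

    @property
    def capacity(self):
        return len(self.buffer)

    def resize(self, new_size, trim=False):
        cap = max(self.capacity, 1)
        # Grow only when the requested size no longer fits.
        # Assumption: keep doubling until it does.
        while cap < new_size:
            cap *= 2
        if trim:
            cap = new_size  # capacity follows size exactly
        if cap > self.capacity:
            self.buffer.extend([None] * (cap - self.capacity))
        elif cap < self.capacity and trim:
            del self.buffer[cap:]
        self.size = new_size


buf = RowBuffer(capacity=4)
buf.resize(5)             # does not fit in 4 rows: capacity doubles to 8
grown = (buf.size, buf.capacity)
buf.resize(2)             # shrinking the size leaves capacity untouched
shrunk = (buf.size, buf.capacity)
buf.resize(2, trim=True)  # trim makes capacity match size
trimmed = (buf.size, buf.capacity)
```

Keeping spare capacity around means repeated appends rarely trigger a re-allocation, which is why trimming is left as an explicit request.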
- property shape¶
- trim_capacity()¶
Set capacity to be minimum needed to support Array size
- view_as(library, with_units=False)¶
View the Array data as a third-party format data structure.
This is a zero-copy operation. Supported third-party formats are:
pd: returns a pandas.Series
np: returns the internal nda attribute (numpy.ndarray)
ak: returns an ak.Array initialized with self.nda
- Parameters:
- Return type:
pd.DataFrame | np.NDArray | ak.Array
See also
lgdo.types.arrayofequalsizedarrays module¶
Implements a LEGEND Data Object representing an array of equal-sized arrays and corresponding utilities.
- class lgdo.types.arrayofequalsizedarrays.ArrayOfEqualSizedArrays(*_args, **_kwargs)¶
Bases: Array
An array of equal-sized arrays.
The arrays are of equal size within a file, but the size could differ from application to application. Canonical example: array of same-length waveforms.
- Parameters:
dims (tuple[int, ...] | None) – specifies the dimensions required for building the ArrayOfEqualSizedArrays’ datatype attribute.
nda (np.ndarray) – A numpy.ndarray to be used for this object’s internal array. Note: the array is used directly, not copied. If not supplied, internal memory is newly allocated based on the shape and dtype arguments.
shape (tuple[int, ...]) – A NumPy-format shape specification for the shape of the internal array. Required if nda is None, otherwise unused.
dtype (np.dtype) – Specifies the type of the data in the array. Required if nda is None, otherwise unused.
fill_val (int | float | None) – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO.
Notes
If shape is not “1D array of arrays of shape given by axes 1-N” (of nda) then specify the dimensionality split in the constructor.
See also
- to_vov(cumulative_length=None)¶
Convert (and possibly resize) to vectorofvectors.VectorOfVectors.
- Parameters:
cumulative_length (ndarray) – cumulative length array of the output vector of vectors. Each vector in the output is filled with values found in the ArrayOfEqualSizedArrays, starting from the first index. If None, use all of the original 2D array and make vectors of equal size.
- Return type:
VectorOfVectors
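One way to read the role of cumulative_length in this conversion is sketched below on plain lists. The helper rows_to_vov is hypothetical and not part of lgdo. It assumes that each output vector takes the leading values of the corresponding row, with lengths given by the differences between consecutive cumulative lengths.

```python
def rows_to_vov(rows, cumulative_length=None):
    """Hypothetical helper: split equal-sized rows into variable-length vectors."""
    if cumulative_length is None:
        # Keep every row whole, so all output vectors have the same size.
        width = len(rows[0]) if rows else 0
        cumulative_length = [width * (i + 1) for i in range(len(rows))]
    flattened, start = [], 0
    for row, stop in zip(rows, cumulative_length):
        # Each output vector takes the leading (stop - start) values of its row.
        flattened.extend(row[: stop - start])
        start = stop
    return flattened, list(cumulative_length)


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
full = rows_to_vov(rows)
ragged = rows_to_vov(rows, cumulative_length=[2, 3, 6])
```

With cumulative_length=[2, 3, 6] the three output vectors have lengths 2, 1 and 3.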
lgdo.types.encoded module¶
- class lgdo.types.encoded.ArrayOfEncodedEqualSizedArrays(*_args, **_kwargs)¶
Bases: LGDOCollection
An array of encoded arrays with equal decoded size.
Used to represent an encoded ArrayOfEqualSizedArrays. In addition to an internal VectorOfVectors self.encoded_data storing the encoded data, the size of the decoded arrays is stored in a Scalar self.encoded_size.
See also
- Parameters:
encoded_data (VectorOfVectors) – the vector of vectors holding the encoded data.
decoded_size (Scalar | int) – the length of the decoded arrays.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO. Should include information about the codec used to encode the data.
- append(value)¶
Append a 1D encoded array at the end.
See also
- insert(i, value)¶
Insert an encoded array at index i.
See also
- replace(i, value)¶
Replace the encoded array at index i with a new one.
See also
- reserve_capacity(*capacity)¶
Reserve capacity (in rows) for later use. Internal memory buffers will have enough entries to store this many rows.
- resize(new_size, trim=False)¶
Resize array along the first axis.
See also
- trim_capacity()¶
set capacity to only what is required to store current contents of LGDOCollection
- view_as(library, with_units=False)¶
View the encoded data as a third-party format data structure.
This is nearly a zero-copy operation.
Supported third-party formats are:
pd: returns a pandas.DataFrame
ak: returns an ak.Array (record type)
Note
In the view, decoded_size is expanded into an array.
- Parameters:
- Return type:
pd.DataFrame | np.NDArray | ak.Array
See also
- class lgdo.types.encoded.VectorOfEncodedVectors(*_args, **_kwargs)¶
Bases: LGDOCollection
An array of variable-length encoded arrays.
Used to represent an encoded VectorOfVectors. In addition to an internal VectorOfVectors self.encoded_data storing the encoded data, a 1D Array in self.encoded_size holds the original sizes of the encoded vectors.
See also
- Parameters:
encoded_data (VectorOfVectors) – the vector of encoded vectors.
decoded_size (Array) – an array holding the original length of each encoded vector in encoded_data.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO. Should include information about the codec used to encode the data.
- insert(i, value)¶
Insert an encoded vector at index i.
- Parameters:
See also
- replace(i, value)¶
Replace the encoded vector (and decoded size) at index i with a new one.
- Parameters:
See also
- reserve_capacity(*capacity)¶
Reserve capacity (in rows) for later use. Internal memory buffers will have enough entries to store this many rows.
- resize(new_size)¶
Resize vector along the first axis.
See also
- trim_capacity()¶
set capacity to only what is required to store current contents of LGDOCollection
- view_as(library, with_units=False)¶
View the encoded data as a third-party format data structure.
This is a zero-copy or nearly zero-copy operation.
Supported third-party formats are:
pd: returns a pandas.DataFrame
ak: returns an ak.Array (record type)
- Parameters:
- Return type:
pd.DataFrame | np.NDArray | ak.Array
See also
lgdo.types.fixedsizearray module¶
Implements a LEGEND Data Object representing an n-dimensional array of fixed size and corresponding utilities.
- class lgdo.types.fixedsizearray.FixedSizeArray(*_args, **_kwargs)¶
Bases: Array
An array of fixed-size arrays.
Arrays with guaranteed shape along axes > 0: for example, an array of vectors will always have length 3 on axis 1, and it will never change from application to application. This data type is used for optimized memory handling on some platforms. We are not that sophisticated, so we just store this identification for LGDO validity; i.e. for now this class is just an alias for Array, but keeps track of the datatype name.
See also
- view_as(library, with_units=False)¶
View the array as a third-party format data structure.
See also
lgdo.types.histogram module¶
- class lgdo.types.histogram.Histogram(*args, **kwargs)¶
Bases: Struct
A special struct to contain histogrammed data.
- Parameters:
weights (hist.Hist | NDArray | Array) – A numpy.ndarray to be used for this object’s internal array, or a hist.Hist object, whose data view is used for this object’s internal array. Note: the array/histogram view is used directly, not copied.
binning (None | Iterable[Histogram.Axis] | Iterable[NDArray] | Iterable[tuple[float, float, float]]) –
has to be None if a hist.Hist has been passed as weights
can be a list of pre-initialized Histogram.Axis
can be a list of tuples, each representing a range, (first, last, step)
can be a list of numpy arrays, as returned by numpy.histogramdd().
isdensity (bool) – If True, all bin contents represent a density (amount per volume), and not an absolute amount.
binedge_attrs (dict[str, Any] | None) – attributes that will be added to the binedges of all axes. This does not work if Histogram.Axis instances are directly passed as binning.
attrs (dict[str, Any] | None) – a set of user attributes to be carried along with this LGDO.
flow (bool) – If False, discard counts in over-/underflow bins of the passed hist.Hist instance. If True, this data will also be discarded, but a warning is emitted.
Note
Histogram does not support storing counts in overflow or underflow bins. This parameter just controls whether a warning will be emitted.
- class Axis(*args, **kwargs)¶
Bases: Struct
A special struct to group axis parameters for use in a Histogram.
Depending on the parameters, an axis can have either
a binning described by a range object, if first, last and step are passed, or
a variable binning described by the edges array.
- Parameters:
edges (NDArray | Array | None) – an array of edges that describe the binning of this axis.
first (float | None) – left edge of the leftmost bin
last (float | None) – right edge of the rightmost bin
step (float | None) – step size (width of each bin)
closedleft (bool) – if True, the bin intervals are left-closed \([a,b)\); if False, intervals are right-closed \((a,b]\).
binedge_attrs (dict[str, Any] | None) – attributes that will be added to the binedges LGDO that is part of the axis struct.
- property edges: ndarray[tuple[Any, ...], dtype[_ScalarT]]¶
Return all binedges, both for variable and range binning.
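For range binning, the full list of edges follows from the three documented numbers. A minimal sketch, assuming first and last are the outermost edges as described in the parameter list; range_edges is a hypothetical helper, not an lgdo function.

```python
def range_edges(first, last, step):
    """Hypothetical helper: expand (first, last, step) into all bin edges."""
    n_bins = round((last - first) / step)
    # Compute each edge from its index to avoid accumulating float error.
    return [first + i * step for i in range(n_bins + 1)]


edges = range_edges(0.0, 2.0, 0.5)
n_bins = len(edges) - 1
```

An axis with n bins always has n + 1 edges, whichever of the two binning descriptions is stored.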
- classmethod from_edges(edges, binedge_attrs=None)¶
Create a new axis with variable binning described by edges.
- Return type:
- classmethod from_range_edges(edges, binedge_attrs=None)¶
Create a new axis from the binning described by edges, but try to convert it to an evenly spaced range object first.
Warning
This function might return a wrong binning, especially in the case of very small magnitudes of the spacing. See the documentation of numpy.isclose() for details. Use this function with caution, and only if you know the binning’s order of magnitude.
- Return type:
- get_binedgeattrs(datatype=False)¶
Return a copy of the LGDO attributes dictionary of the binedges
- add_field(name, obj)¶
Error
Not applicable: A histogram cannot be used as a struct
- fill(data, w=None, keys=None)¶
Fill histogram by incrementing bins with data points weighted by w
- Parameters:
data – an ndarray with inner dimension equal to the number of axes, or a list of equal-length 1D arrays containing data for each axis, or a Mapping to 1D arrays containing data for each axis (requires keys), or a Pandas DataFrame (optionally takes a list of keys)
w (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – weight to use for incrementing data points. If None, use 1 for all
keys (Sequence[str]) – list of keys to use if data is a pandas DataFrame or Mapping
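The idea of weighted filling, together with the closedleft convention of an axis, can be sketched for a single axis using only the standard library. fill_1d is a hypothetical helper, not lgdo code. Silently dropping points that fall outside the axis is an assumption made here because the histogram has no overflow or underflow bins; the library may handle such points differently.

```python
from bisect import bisect_left, bisect_right


def fill_1d(weights, edges, data, w=None, closedleft=True):
    """Hypothetical 1D fill: add each point's weight to the bin containing it."""
    if w is None:
        w = [1] * len(data)  # unit weight for every data point
    for x, wi in zip(data, w):
        # Left-closed bins [a, b) versus right-closed bins (a, b].
        idx = (bisect_right(edges, x) if closedleft else bisect_left(edges, x)) - 1
        if 0 <= idx < len(weights):  # assumption: points outside the axis are dropped
            weights[idx] += wi
    return weights


edges = [0.0, 1.0, 2.0, 3.0]
left = fill_1d([0, 0, 0], edges, [0.5, 1.0, 2.5, 7.0])
right = fill_1d([0, 0, 0], edges, [0.5, 1.0, 2.5, 7.0], closedleft=False)
weighted = fill_1d([0.0, 0.0, 0.0], edges, [0.5, 1.5], w=[2.0, 0.5])
```

The point 1.0 sits exactly on an edge, so it lands in the second bin for left-closed intervals and in the first bin for right-closed ones.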
- remove_field(name, delete=False)¶
Error
Not applicable: A histogram cannot be used as a struct
- view_as(library)¶
View the histogram data as a third-party format data structure.
This is typically a zero-copy or nearly zero-copy operation.
Supported third-party formats are:
np: returns a tuple of binning and an np.ndarray, similar to the return value of numpy.histogramdd().
hist: returns a hist.Hist that holds a copy of this histogram’s data.
Warning
Viewing as hist will perform a copy of the stored histogram data.
- Parameters:
library (str) – format of the returned data view.
- Return type:
See also
lgdo.types.lgdo module¶
- class lgdo.types.lgdo.LGDO(*_args, **_kwargs)¶
Bases: ABC
Abstract base class representing a LEGEND Data Object (LGDO).
- getattrs(datatype=False)¶
Return a copy of the LGDO attributes dictionary.
- abstract view_as(library, with_units=False)¶
View the LGDO data object as a third-party format data structure.
This is typically a zero-copy or nearly zero-copy operation unless explicitly stated in the concrete LGDO documentation. The view can be turned into a copy explicitly by the user with the appropriate methods. If requested by the user, the output format supports it and the LGDO carries a units attribute, physical units are attached to the view through the pint package.
Typical supported third-party libraries are pd (Pandas), np (NumPy) and ak (Awkward Array), but the actual supported formats may vary depending on the concrete LGDO class.
Note
Awkward does not support attaching units through Pint, at the moment.
- class lgdo.types.lgdo.LGDOCollection(*_args, **_kwargs)¶
Bases: LGDO
Abstract base class representing a LEGEND Collection Object (LGDO). This defines the interface for classes used as table columns.
- append(val)¶
append val to end of LGDOCollection
- clear(trim=False)¶
set size of LGDOCollection to zero
- abstract insert(i, val)¶
insert val into LGDOCollection at position i
- abstract replace(i, val)¶
replace item at position i with val in LGDOCollection
- abstract reserve_capacity(capacity)¶
Reserve capacity (in rows) for later use. Internal memory buffers will have enough entries to store this many rows.
- abstract resize(new_size, trim=False)¶
Resize the LGDOCollection to new_size rows. If trim is True, capacity is also set to match the new size.
- abstract trim_capacity()¶
set capacity to only what is required to store current contents of LGDOCollection
lgdo.types.scalar module¶
Implements a LEGEND Data Object representing a scalar and corresponding utilities.
lgdo.types.struct module¶
Implements a LEGEND Data Object representing a struct and corresponding utilities.
- class lgdo.types.struct.Struct(*args, **kwargs)¶
Bases: LGDO, MutableMapping
A dictionary of LGDO’s with an optional set of attributes.
After instantiation, add fields using add_field() to keep the datatype updated, or call update_datatype() after adding.
- Parameters:
obj_dict – instantiate this Struct using the supplied named LGDO’s. Note: no copy is performed, the objects are used directly.
attrs – a set of user attributes to be carried along with this LGDO.
- Return type:
- add_field(name, obj)¶
Add a field to the table or set an existing field.
- Parameters:
name (str | int) – key to use for field. Key can be nested (e.g. name1.name2 or name1/name2); this will navigate through the tree, creating new fields as needed
obj (LGDO | Mapping[str, LGDO]) – object to add. Can be any LGDO object, or a mapping from names to LGDO objects that will be converted to an LGDO Struct
- items() → a set-like object providing a view on D's items¶
- keys() → a set-like object providing a view on D's keys¶
- remove_field(name, delete=False)¶
Remove a field from the table.
- Parameters:
delete (bool) – if True, delete the field object with the del statement.
- update(other=(), /, **kwargs)¶
Add new fields to the table or set existing ones. For nested Structs, only update at the lowest level of nesting; unlike for nested dicts, nested fields not included in other will not be removed.
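The difference from a plain dict update can be shown with ordinary nested dictionaries. This is only a sketch of the documented behaviour, not the lgdo implementation; update_nested is a hypothetical helper.

```python
def update_nested(target, other):
    """Hypothetical sketch: merge `other` into `target` at the lowest nesting level."""
    for name, obj in other.items():
        if isinstance(obj, dict) and isinstance(target.get(name), dict):
            # Recurse instead of replacing the whole nested mapping.
            update_nested(target[name], obj)
        else:
            target[name] = obj
    return target


nested = {"a": 1, "sub": {"b": 2, "c": 3}}
update_nested(nested, {"sub": {"b": 20}, "d": 4})

# A plain dict update replaces the nested mapping wholesale, losing "c".
plain = {"a": 1, "sub": {"b": 2, "c": 3}}
plain.update({"sub": {"b": 20}, "d": 4})
```

In the merged result the nested field c survives, while the plain dict update drops it.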
- update_datatype()¶
- values() → an object providing a view on D's values¶
- view_as()¶
View the Struct data as a third-party format data structure.
Error
Not implemented. Since Struct’s fields can have different lengths, converting to a NumPy, Pandas or Awkward data structure is generally not possible. Call LGDO.view_as() on the fields instead.
See also
- lgdo.types.struct._is_struct_datatype(dt_name, expr)¶
- lgdo.types.struct._sort_datatype_fields(expr)¶
- lgdo.types.struct._struct_datatype_equal(dt_name, dt1, dt2)¶
lgdo.types.table module¶
Implements a LEGEND Data Object representing a special struct of arrays of equal length and corresponding utilities.
- class lgdo.types.table.Table(*args, **kwargs)¶
Bases: Struct, LGDOCollection
A special struct of arrays or subtable columns of equal length.
Note
If you write to a table and don’t fill it up to its total size, be sure to resize it before passing to data processing functions, as they will call __len__() to access valid data, which returns the size attribute.
- Parameters:
size – sets the number of rows in the table. Arrays in col_dict will be resized to match size if both are not None. If size is left as None, the number of table rows is determined from the length of the first array in col_dict. If neither is provided, a default length of 1024 is used.
col_dict – instantiate this table using the supplied mapping of column names and array-like objects. Supported input types are: mapping of strings to LGDOCollections, pd.DataFrame and ak.Array. Note 1: no copy is performed, the objects are used directly (unless ak.Array is provided). Note 2: if size is not None, all arrays will be resized to match it. Note 3: if the arrays have different lengths, all will be resized to match the length of the first array.
attrs – A set of user attributes to be carried along with this LGDO.
Notes
The loc attribute is initialized to 0.
- add_column(name, obj, use_obj_size=False)¶
Alias for add_field() using table terminology ‘column’.
- add_field(name, obj, use_obj_size=False)¶
Add a field (column) to the table.
Use the name “field” here to match the terminology used in Struct.
- Parameters:
name (str) – key to use for field. Key can be nested (e.g. name1.name2 or name1/name2); this will navigate through the tree, creating new fields as needed
obj (LGDOCollection | Mapping[str, LGDOCollection]) – object to add. Can be any LGDOCollection, or a mapping from names to LGDOCollections that will be converted to an LGDO Table. Size of obj should match size of this Table
use_obj_size (bool) – if True, resize the table to match the length of obj.
- eval(expr, parameters=None, modules=None, with_units=False, library=None)¶
Apply column operations to the table and return a new LGDO.
Internally uses numexpr.evaluate() if dealing with columns representable as NumPy arrays or eval() if VectorOfVectors are involved. In the latter case, the VoV columns are viewed as ak.Array and the respective routines are therefore available.
Columns nested in subtables can be accessed by scoping with two underscores (__). For example:
tbl.eval("a + tbl2__b")
computes the sum of column a and column b in the subtable tbl2.
- Parameters:
expr (str) – if the expression only involves non-VectorOfVectors columns, the syntax is the one supported by numexpr.evaluate() (see the numexpr documentation). Note: because of internal limitations, reduction operations must appear last in the stack. If at least one considered column is a VectorOfVectors, plain eval() is used and ak.Array transforms can be used through the ak. prefix (NumPy functions are analogously accessible through np.). See also the examples below.
parameters (Mapping[str, str] | None) – a dictionary of function parameters. Passed to numexpr.evaluate() as local_dict argument or to eval() as locals argument.
modules (Mapping[str, ModuleType] | None) – a dictionary of additional modules used by the expression. If this is not None then eval() is used and the expression can depend on any modules from this dictionary in addition to awkward and numpy. These are passed to eval() as globals argument.
with_units (bool) – attach units to the columns as in LGDO.view_as().
library (str | None) – library to convert the columns to with LGDO.view_as(), supported libraries are np, ak or lgdo (pass in directly the unconverted LGDO objects).
- Return type:
Examples
>>> import lgdo
>>> tbl = lgdo.Table(
...     col_dict={
...         "a": lgdo.Array([1, 2, 3]),
...         "b": lgdo.VectorOfVectors([[5], [6, 7], [8, 9, 0]]),
...     }
... )
>>> print(tbl.eval("a + b"))
[[6],
 [8 9],
 [11 12 3],
]
>>> print(tbl.eval("np.sum(a) + ak.sum(b)"))
41
- flatten(_prefix='')¶
Flatten the table, if nested.
Returns a new Table (that references, not copies, the existing columns) with columns in nested tables being moved to the first level (and renamed appropriately).
Examples
>>> repr(tbl)
"Table(dict={'a': Array([1 2 3], attrs={'datatype': 'array<1>{real}'}), 'tbl': Table(dict={'b': Array([4 5 6], attrs={'datatype': 'array<1>{real}'}), 'tbl1': Table(dict={'z': Array([9 9 9], attrs={'datatype': 'array<1>{real}'})}, attrs={'datatype': 'table{z}'})}, attrs={'datatype': 'table{b,tbl1}'})}, attrs={'datatype': 'table{a,tbl}'})"
>>> tbl.flatten().keys()
dict_keys(['a', 'tbl__b', 'tbl__tbl1__z'])
- Return type:
Table
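The renaming scheme used when flattening can be sketched with nested dictionaries standing in for nested tables. flatten_columns is a hypothetical helper, not lgdo code. It joins names with a double underscore, as in the keys shown in the example, and keeps references to the original column objects.

```python
def flatten_columns(table, prefix=""):
    """Hypothetical sketch: hoist nested columns to the top level."""
    flat = {}
    for name, col in table.items():
        if isinstance(col, dict):
            # Nested table: recurse, joining names with a double underscore.
            flat.update(flatten_columns(col, prefix=f"{prefix}{name}__"))
        else:
            flat[f"{prefix}{name}"] = col  # same object, not a copy
    return flat


z = [9, 9, 9]
tbl = {"a": [1, 2, 3], "tbl": {"b": [4, 5, 6], "tbl1": {"z": z}}}
flat = flatten_columns(tbl)
```

Because the flattened mapping holds the same objects, a change made through flat is visible in the nested structure as well.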
- get_dataframe(cols=None, copy=False, prefix='')¶
Get a pandas.DataFrame from the data in the table.
Warning
This method is deprecated. Use view_as() to view the table as a Pandas dataframe.
Notes
The requested data must be array-like, with the nda attribute.
- Parameters:
cols (list[str] | None) – a list of column names specifying the subset of the table’s columns to be added to the dataframe.
copy (bool) – When True, the dataframe allocates new memory and copies data into it. Otherwise, the raw nda’s from the table are used directly.
prefix (str) – The prefix to be added to the column names. Used when recursively getting the dataframe of a Table inside this Table
- Return type:
pandas.DataFrame
- insert(i, vals)¶
Insert new row(s) into table
- join(other_table, cols=None, keep_mine=False, prefix='', suffix='', do_warn=True)¶
Add the columns of another table to this table.
Notes
Following the join, both tables have access to other_table’s fields (but other_table doesn’t have access to this table’s fields). No memory is allocated in this process. other_table can go out of scope and this table will retain access to the joined data.
- Parameters:
other_table (Table) – the table whose columns are to be joined into this table.
cols (list[str] | None) – a list of names of columns from other_table to be joined into this table.
keep_mine (bool) – if there is a column name conflict, keep this table’s column if True, or the joined table’s column if False (default).
prefix (str) – prepend to joined column names; can be used to avoid name conflicts.
suffix (str) – append to joined column names; can be used to avoid name conflicts.
do_warn (bool) – set to False to turn off warnings associated with mismatched loc parameter or add_column() warnings.
- remove_column(name, delete=False)¶
Alias for Struct.remove_field() using table terminology ‘column’.
- replace(i, vals)¶
replace item at position i with val in LGDOCollection
- reserve_capacity(capacity)¶
Set size (number of rows) of internal memory buffer
- resize(new_size=None, do_warn=False, trim=False)¶
Resize all columns of the table
- Parameters:
new_size (int | None) – new size of table. If None use size of first field found
do_warn (bool) – emit a warning if contents for any field must be resized. This is intended for use with new_size = None
trim (bool) – call trim_capacity() after resizing to conserve memory
- view_as(library, with_units=False, cols=None, prefix='')¶
View the Table data as a third-party format data structure.
This is typically a zero-copy or nearly zero-copy operation.
Supported third-party formats are:
pd: returns a pandas.DataFrame
ak: returns an ak.Array (record type)
Notes
Conversion to Awkward array only works when the key is a string.
- Parameters:
library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.
cols (list[str] | None) – a list of column names specifying the subset of the table’s columns to be added to the data view structure.
prefix (str) – The prefix to be added to the column names. Used when recursively getting the dataframe of a Table inside this Table.
- Return type:
pd.DataFrame | np.NDArray | ak.Array
See also
- lgdo.types.table._ak_to_lgdo_or_col_dict(array)¶
lgdo.types.vectorofvectors module¶
Implements a LEGEND Data Object representing a variable-length array of variable-length arrays and corresponding utilities.
- class lgdo.types.vectorofvectors.VectorOfVectors(*_args, **_kwargs)¶
Bases: LGDOCollection
An n-dimensional variable-length 1D array of variable-length 1D arrays.
If the vector is 2-dimensional, the internal representation is as two NumPy arrays, one to store the flattened data contiguously (flattened_data) and one to store the cumulative sum of lengths of each vector (cumulative_length). When the dimension is more than 2, flattened_data is a VectorOfVectors itself.
Examples
>>> from lgdo import VectorOfVectors
>>> data = VectorOfVectors(
...     [[[1, 2], [3, 4, 5]], [[2], [4, 8, 9, 7]], [[5, 3, 1]]],
...     attrs={"units": "m"}
... )
>>> print(data)
[[[1, 2], [3, 4, 5]],
 [[2], [4, 8, 9, 7]],
 [[5, 3, 1]]
]
with attrs={'units': 'm'}
>>> data.view_as("ak")
<Array [[[1, 2], [3, 4, 5]], ..., [[5, ..., 1]]] type='3 * var * var * int64'>
Note
Many class methods are currently implemented only for 2D vectors and will raise an exception on higher dimensional data.
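The 2-dimensional layout described above can be reproduced with two plain Python lists. The helpers to_flat and get_vector are hypothetical and only illustrate the roles of flattened_data and cumulative_length; they are not lgdo functions.

```python
def to_flat(vectors):
    """Hypothetical helper: build the two flat arrays for a 2D vector of vectors."""
    flattened_data, cumulative_length = [], []
    for vec in vectors:
        flattened_data.extend(vec)
        cumulative_length.append(len(flattened_data))  # running total of lengths
    return flattened_data, cumulative_length


def get_vector(flattened_data, cumulative_length, i):
    """Recover vector i by slicing between consecutive cumulative lengths."""
    start = cumulative_length[i - 1] if i > 0 else 0
    return flattened_data[start:cumulative_length[i]]


flat, cl = to_flat([[1, 2, 3], [4, 5], [8, 9]])
second = get_vector(flat, cl, 1)
```

Storing cumulative lengths rather than individual lengths means any vector can be located with two lookups and no summation.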
- Parameters:
data (ArrayLike | None) – Any array-like structure accepted by the ak.Array constructor, with the exception that elements cannot be of type OptionType, UnionType or RecordType. Takes priority over flattened_data and cumulative_length. The serialization of the ak.Array is performed through ak.to_buffers(). Since the latter returns non-data-owning NumPy arrays, which would prevent later modifications like resizing, a copy is performed.
flattened_data (ArrayLike | None) – if not None, used as the internal array for self.flattened_data. Otherwise, an internal flattened_data is allocated based on cumulative_length (or shape_guess) and dtype.
cumulative_length (ArrayLike | VectorOfVectors | None) – if not None, used as the internal array for self.cumulative_length. Should be dtype numpy.uint32. If cumulative_length is None, an internal cumulative_length is allocated based on the first element of shape_guess.
shape_guess (Sequence[int, ...] | None) – a NumPy-format shape specification, required if either of flattened_data or cumulative_length are not supplied. The first element should not be a guess and sets the number of vectors to be stored. The second element is a guess or approximation of the typical length of a stored vector, used to set the initial length of flattened_data if it was not supplied.
dtype (DTypeLike | None) – sets the type of data stored in flattened_data. Required if flattened_data and array are None.
fill_val (int | float | None) – fill all of self.flattened_data with this value.
attrs (Mapping[str, Any] | None) – a set of user attributes to be carried along with this LGDO.
- static _ak_is_jagged(type_)¶
Returns True if ak.Array is jagged at all axes.
This assures that ak.to_buffers() returns the expected data structures.
- Return type:
bool
- static _ak_is_valid(type_)¶
Returns True if ak.Array contains only elements we can serialize to LH5.
- Return type:
bool
- _set_vector_unsafe(i, vec, lens=None)¶
Insert vector vec at position i.
Assumes that j = self.cumulative_length[i-1] is the index (in self.flattened_data) of the end of the (i-1)th vector and copies vec in self.flattened_data[j:sum(lens)]. Finally updates self.cumulative_length[i] with the new flattened data array length.
Vectors stored after index i can be overwritten, producing unintended behavior. This method is typically used for fast sequential fill of a pre-allocated vector of vectors.
If vec is a 1D array and lens is None, set using the full array. If vec is 2D, require lens to be included, and fill each array only up to the lengths in lens.
Danger
This method can lead to undefined behavior or vector invalidation if used improperly. Use it only if you know what you are doing.
- append(new)¶
Append a 1D vector new at the end.
Examples
>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.append([8, 9])
>>> print(vov)
[[1 2 3],
 [4 5],
 [8 9],
]
- get_capacity()¶
Get tuple containing capacity of each dimension. First dimension is cumulative length array. Last dimension is flattened data.
- insert(i, new)¶
Insert a vector at index i.
self.flattened_data (and therefore self.cumulative_length) is resized in order to accommodate the new element.
Examples
>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.insert(1, [8, 9])
>>> print(vov)
[[1 2 3],
 [8 9],
 [4 5],
]
- property ndim¶
- replace(i, new)¶
Replace the vector at index i with new.
self.flattened_data (and therefore self.cumulative_length) is resized, if the length of new is different from the vector currently at index i.
Examples
>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.replace(0, [8, 9])
>>> print(vov)
[[8 9],
 [4 5],
]
- reserve_capacity(cap_cl, *cap_args)¶
Set capacity of internal data arrays. Expect number of args to equal self.n_dim. First arg is capacity of cumulative length array. If self.n_dim is 2, second argument is capacity of flattened data, otherwise arguments are fed recursively to remaining dimensions.
- resize(new_size, trim=False)¶
Resize vector along the first axis.
self.flattened_data is resized only if new_size is smaller than the current vector length.
If new_size is larger than the current vector length, self.cumulative_length is padded with its last element. This corresponds to appending empty vectors.
If trim is True, resize capacity to match new size.
Examples
>>> vov = VectorOfVectors([[1, 2, 3], [4, 5]])
>>> vov.resize(3)
>>> print(vov)
[[1 2 3],
 [4 5],
 [],
]
>>> vov = VectorOfVectors([[1, 2], [3], [4, 5]])
>>> vov.resize(2)
>>> print(vov)
[[1 2],
 [3],
]
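The effect of resizing on the two internal arrays can be traced on plain lists. resize_vov is a hypothetical helper that mirrors the description of resize() for the 2D case only. It returns new lists instead of modifying buffers in place and ignores capacity handling.

```python
def resize_vov(flattened_data, cumulative_length, new_size):
    """Hypothetical sketch of resizing along the first axis, on plain lists."""
    old_size = len(cumulative_length)
    if new_size < old_size:
        # Dropping vectors also drops their values from the flattened data.
        cumulative_length = cumulative_length[:new_size]
        end = cumulative_length[-1] if cumulative_length else 0
        flattened_data = flattened_data[:end]
    else:
        # Repeating the last cumulative length appends empty vectors.
        last = cumulative_length[-1] if cumulative_length else 0
        cumulative_length = cumulative_length + [last] * (new_size - old_size)
    return flattened_data, cumulative_length


grown = resize_vov([1, 2, 3, 4, 5], [3, 5], 3)
shrunk = resize_vov([1, 2, 3, 4, 5], [2, 3, 5], 2)
```

The two calls correspond to the two doctest cases: growing adds an empty third vector, shrinking removes the values of the dropped vector.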
- to_aoesa(max_len=None, fill_val=nan, preserve_dtype=False)¶
Convert to ArrayOfEqualSizedArrays.
Note
The dtype of the original vector is typically not strictly preserved. The output dtype will be either np.float64 or np.int64. If you want to use the same exact dtype, set preserve_dtype to True.
- Parameters:
max_len (int | None) – the length of the returned array along its second dimension. Longer vectors will be truncated, shorter will be padded with fill_val. If None, the length will be equal to the length of the longest vector.
fill_val (bool | int | float) – value used to pad shorter vectors up to max_len. The dtype of the output array will be such that both fill_val and the vector values can be represented in the same data structure.
preserve_dtype (bool) – whether the output array should have exactly the same dtype as the original vector of vectors. The type of fill_val must be a compatible one.
- Return type:
ArrayOfEqualSizedArrays
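The truncate-or-pad rule for max_len and fill_val can be sketched on lists of lists. pad_vectors is a hypothetical helper, not the lgdo implementation, and it does not model the dtype handling described in the note.

```python
import math


def pad_vectors(vectors, max_len=None, fill_val=math.nan):
    """Hypothetical sketch: make variable-length vectors rectangular."""
    if max_len is None:
        # Default to the length of the longest vector.
        max_len = max((len(v) for v in vectors), default=0)
    # Truncate long vectors, pad short ones with fill_val.
    return [list(v[:max_len]) + [fill_val] * (max_len - len(v)) for v in vectors]


padded = pad_vectors([[1, 2, 3], [4, 5]], fill_val=0)
clipped = pad_vectors([[1, 2, 3], [4, 5]], max_len=2)
```

With the default NaN fill value, integer input would have to be promoted to a floating-point type in a real array, which is why the output dtype is not always preserved.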
- trim_capacity()¶
Set capacity for all dimensions to minimum needed to hold data
- view_as(library, with_units=False, fill_val=nan, preserve_dtype=False)¶
View the vector data as a third-party format data structure.
This is typically a zero-copy or nearly zero-copy operation.
Supported third-party formats are:
pd: returns a pandas.Series (supported through the awkward-pandas package)
np: returns a numpy.ndarray, padded with fill_val to make it rectangular. This implies memory re-allocation.
ak: returns an ak.Array. self.cumulative_length is currently re-allocated for technical reasons.
Notes
Awkward array views partially involve memory re-allocation (the cumulative_lengths), while NumPy “exploded” views clearly imply a full copy.
- Parameters:
library (str) – format of the returned data view.
with_units (bool) – forward physical units to the output data.
fill_val (bool | int | float) – forwarded to to_aoesa(), if library is np.
preserve_dtype (bool) – forwarded to to_aoesa(), if library is np.
- Return type:
pd.DataFrame | np.NDArray | ak.Array
See also
- @numba.jit lgdo.types.vectorofvectors._to_aoesa(flattened_array, cumulative_length, nda)¶
lgdo.types.vovutils module¶
VectorOfVectors utilities.
Note: importing this module takes a long time, so it should be lazily
imported inside of a function call rather than with a full module
- @numba.jit lgdo.types.vovutils._nb_build_cl(sorted_array_in, cumulative_length_out)¶
Numbified inner loop for build_cl().
- @numba.jit lgdo.types.vovutils._nb_explode(cumulative_length, array_in, array_out)¶
Numbified inner loop for explode().
- @numba.jit lgdo.types.vovutils._nb_explode_cl(cumulative_length, array_out)¶
Numbified inner loop for explode_cl().
- @numba.guvectorize lgdo.types.vovutils._nb_fill(aoa_in, len_in, nan_val, flattened_array_out)¶
Options: boundscheck=False, cache=True
Precompiled signatures:
?i??->,?l??->,?I??->,?L??->,bibb->,blbb->,bIbb->,bLbb->,hihh->,hlhh->,hIhh->,hLhh->,iiii->,ilii->,iIii->,iLii->,lill->,llll->,lIll->,lLll->,BiBB->,BlBB->,BIBB->,BLBB->,HiHH->,HlHH->,HIHH->,HLHH->,IiII->,IlII->,IIII->,ILII->,LiLL->,LlLL->,LILL->,LLLL->,fiff->,flff->,fIff->,fLff->,didd->,dldd->,dIdd->,dLdd->,FiFF->,FlFF->,FIFF->,FLFF->,DiDD->,DlDD->,DIDD->,DLDD->
Vectorized function to fill flattened array from array of arrays and lengths. Values in aoa_in past lengths will not be copied.
- Parameters:
aoa_in (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – array of arrays containing values to be copied
len_in (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – array of vector lengths for each row of aoa_in
nan_val (int | float) – value to use when len_in is longer than aoa_in. Should use np.nan for floating point, and 0xfff… for integer types
flattened_array_out (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – flattened array to copy values into. Must be longer than sum of lengths in len_in
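The fill operation described above can be written as a plain Python loop over rows. This is a hedged sketch, not the Numba kernel itself: `fill_flattened` is a hypothetical name, and the padding with nan_val when a requested length exceeds the row width is inferred from the parameter description.

```python
import numpy as np


def fill_flattened(aoa_in, len_in, nan_val, flattened_array_out):
    """Hypothetical pure-NumPy analogue of the fill described above."""
    start = 0
    for row, n in zip(aoa_in, len_in):
        n = int(n)
        n_copy = min(n, len(row))
        # Copy only the first n_copy values; anything past the length is skipped.
        flattened_array_out[start:start + n_copy] = row[:n_copy]
        # Assumption: if the length exceeds the row width, pad with nan_val.
        flattened_array_out[start + n_copy:start + n] = nan_val
        start += n
    return flattened_array_out


aoa = np.array([[1.0, 2.0], [3.0, 4.0]])
lens = np.array([3, 1])
flat = fill_flattened(aoa, lens, np.nan, np.empty(int(lens.sum())))
```

Here the first row asks for three values but only holds two, so one NaN is written; the second row contributes only its first value.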
- lgdo.types.vovutils.build_cl(sorted_array_in, cumulative_length_out=None)¶
Build a cumulative length array from an array of sorted data.
Examples
>>> build_cl(np.array([3, 3, 3, 4]))
array([3., 4.])
For a sorted_array_in of indices, this is the inverse of explode_cl(), in the sense that doing build_cl(explode_cl(cumulative_length)) would recover the original cumulative_length.
- Parameters:
sorted_array_in (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – array of data already sorted; each N matching contiguous entries will be converted into a new row of cumulative_length_out.
cumulative_length_out (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None) – a pre-allocated array for the output cumulative_length. It will always have length <= sorted_array_in, so giving them the same length is safe if there is not a better guess.
- Returns:
cumulative_length_out – the output cumulative length array. If the user provides a cumulative_length_out that is too long, this return value is sliced to contain only the used portion of the allocated memory.
- Return type:
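The run-length idea behind build_cl() can be reproduced with a few NumPy calls. The function below is a hypothetical sketch (`build_cl_sketch` is not part of lgdo) and returns integers rather than the floats shown in the example above.

```python
import numpy as np


def build_cl_sketch(sorted_array_in):
    """Hypothetical NumPy analogue of build_cl() for a 1D sorted array."""
    a = np.asarray(sorted_array_in)
    if len(a) == 0:
        return np.empty(0, dtype=np.int64)
    # Index just past the end of every run of equal values...
    run_ends = np.flatnonzero(a[1:] != a[:-1]) + 1
    # ...plus the end of the final run, which is the total length.
    return np.append(run_ends, len(a))


cl = build_cl_sketch([3, 3, 3, 4])
```

Each run of N matching entries becomes one row of the output, and the value stored for that row is the running total of entries seen so far.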
- lgdo.types.vovutils.explode(cumulative_length, array_in, array_out=None)¶
Explode a data array using a cumulative_length array.
This is identical to explode_cl(), except array_in gets exploded instead of cumulative_length.
Examples
>>> explode(np.array([2, 3]), np.array([3, 4]))
array([3., 3., 4.])
- Parameters:
cumulative_length (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – the cumulative length array to use for exploding.
array_in (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – the data to be exploded. Must have same length as cumulative_length.
array_out (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None) – a pre-allocated array to hold the exploded data. The length should be equal to cumulative_length[-1].
- Returns:
array_out – the exploded data array.
- Return type:
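Exploding amounts to repeating each entry of array_in once per element of the corresponding vector. A minimal NumPy sketch, assuming 1D inputs (`explode_sketch` is a hypothetical name, and the result keeps the input's integer dtype instead of the floats shown above):

```python
import numpy as np


def explode_sketch(cumulative_length, array_in):
    """Hypothetical NumPy analogue of explode() for 1D inputs."""
    cl = np.asarray(cumulative_length)
    # Per-vector lengths are the differences of the cumulative lengths.
    lengths = np.diff(cl, prepend=0)
    return np.repeat(np.asarray(array_in), lengths)


exploded = explode_sketch([2, 3], [3, 4])
```

The output has cumulative_length[-1] entries, which is why a pre-allocated array_out needs that length.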
- lgdo.types.vovutils.explode_arrays(cumulative_length, arrays, arrays_out=None)¶
Explode a set of arrays using a cumulative_length array.
- Parameters:
cumulative_length (Array) – the cumulative length array to use for exploding.
arrays (Sequence[ndarray[tuple[Any, ...], dtype[_ScalarT]]]) – the data arrays to be exploded. Each array must have same length as cumulative_length.
arrays_out (Sequence[ndarray[tuple[Any, ...], dtype[_ScalarT]]] | None) – a list of pre-allocated arrays to hold the exploded data. The length of the list should be equal to the length of arrays, and each entry in arrays_out should have length cumulative_length[-1]. If not provided, output arrays are allocated for the user.
- Returns:
arrays_out – the list of exploded data arrays.
- Return type:
- lgdo.types.vovutils.explode_cl(cumulative_length, array_out=None)¶
Explode a cumulative_length array.
Examples
>>> explode_cl(np.array([2, 3]))
array([0., 0., 1.])
This is the inverse of build_cl(), in the sense that doing build_cl(explode_cl(cumulative_length)) would recover the original cumulative_length.
- Parameters:
- Returns:
array_out – the exploded cumulative length array.
- Return type:
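The inverse relationship between explode_cl() and build_cl() can be checked with two small NumPy stand-ins. Both functions are hypothetical sketches written for this illustration, not the library's Numba kernels.

```python
import numpy as np


def explode_cl_sketch(cumulative_length):
    """Hypothetical analogue of explode_cl(): one row index per flattened element."""
    cl = np.asarray(cumulative_length)
    lengths = np.diff(cl, prepend=0)
    return np.repeat(np.arange(len(cl)), lengths)


def build_cl_sketch(sorted_array_in):
    """Hypothetical analogue of build_cl(): cumulative end of each run of equal values."""
    a = np.asarray(sorted_array_in)
    run_ends = np.flatnonzero(a[1:] != a[:-1]) + 1
    return np.append(run_ends, len(a))


cl = np.array([2, 3, 7])
indices = explode_cl_sketch(cl)       # row index of every flattened element
recovered = build_cl_sketch(indices)  # round trip back to cumulative lengths
```

In this sketch an empty vector (a repeated entry in cumulative_length) leaves no index behind, so it would not survive the round trip.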
lgdo.types.waveformtable module¶
Implements a LEGEND Data Object representing a special Table to store blocks of one-dimensional time-series data.
- class lgdo.types.waveformtable.WaveformTable(*args, **kwargs)¶
Bases: Table
An LGDO for storing blocks of (1D) time-series data.
A WaveformTable is an LGDO Table with the 3 columns t0, dt, and values:
t0[i] is a time offset (relative to a user-defined global reference) for the sample in values[i][0]. Implemented as an LGDO Array with optional attribute units.
dt[i] is the sampling period for the waveform at values[i]. Implemented as an LGDO Array with optional attribute units.
values[i] is the i’th waveform in the table. Internally, the waveform values may be either an LGDO ArrayOfEqualSizedArrays<1,1>, or an LGDO VectorOfVectors or VectorOfEncodedVectors that supports waveforms of unequal length. Can optionally be given a units attribute.
Note
On-disk and in-memory versions could be different e.g. if a compression routine is used.
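Given the column definitions above, the time of sample j in waveform i is t0[i] + j * dt[i]. The sketch below uses plain NumPy arrays as stand-ins for the three columns; the numbers are made up for illustration and equal-length waveforms are assumed.

```python
import numpy as np

# Stand-ins for the t0, dt and values columns (illustrative values only).
t0 = np.array([0.0, 100.0])   # time offset of the first sample of each waveform
dt = np.array([16.0, 16.0])   # sampling period of each waveform
values = np.zeros((2, 4))     # two waveforms of four samples each

# times[i, j] = t0[i] + j * dt[i]
sample_index = np.arange(values.shape[1])
times = t0[:, None] + dt[:, None] * sample_index[None, :]
```

Broadcasting builds the full time axis for every row at once, so times has the same shape as values.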
- Parameters:
size (int | None) – sets the number of rows in the table. If None, the size will be determined from the first among t0, dt, or values to return a valid length. If not None, t0, dt, and values will be resized as necessary to match size. If size is None and t0, dt, and values are all non-array-like, a default size of 1024 is used.
t0 (float | Array | np.ndarray) – \(t_0\) values to be used (or broadcast) to the t0 column.
t0_units (str | None) – units for the \(t_0\) values. If not None and t0 is an LGDO Array, overrides what’s in t0.
dt (float | Array | np.ndarray) – \(\delta t\) values (sampling period) to be used (or broadcast) to the dt column.
dt_units (str | None) – units for the dt values. If not None and dt is an LGDO Array, overrides what’s in dt.
values (ArrayOfEqualSizedArrays | VectorOfVectors | np.ndarray) – The waveform data to be stored in the table. If None, a block of data is prepared based on the wf_len and dtype arguments.
values_units (str | None) – units for the waveform values. If not None and values is an LGDO Array, overrides what’s in values.
wf_len (int | None) – The length of the waveforms in each entry of a table. If None (the default), unequal lengths are assumed and VectorOfVectors is used for the values column. Ignored if values is a 2D ndarray, in which case values.shape[1] is used.
dtype (np.dtype) – The NumPy numpy.dtype of the waveform data. If values is not None, this argument is ignored. If both values and dtype are None, numpy.float64 is used.
attrs (dict[str, Any] | None) – A set of user attributes to be carried along with this LGDO.
- resize_wf_len(new_len)¶
Alias for wf_len.setter, for when we want to make it clear in the code that memory is being reallocated.
- property values: ArrayOfEqualSizedArrays | VectorOfVectors¶