lgdo.compression package¶

Data compression utilities.

This subpackage collects all LEGEND custom data compression (encoding) and decompression (decoding) algorithms.

Available lossless waveform compression algorithms:

RadwareSigcompress, a Python port of the C algorithm radware-sigcompress by D. Radford.
ULEB128ZigZagDiff, a variable-length base-128 encoding of waveform differences.

All waveform compression algorithms inherit from the WaveformCodec abstract class.

encode() and decode() provide a high-level interface for encoding/decoding LGDOs.

>>> from lgdo import WaveformTable, compression
>>> from lgdo.compression import RadwareSigcompress
>>> wftbl = WaveformTable(...)
>>> enc_wft = compression.encode(wftbl, RadwareSigcompress(codec_shift=-32768))
>>> compression.decode(enc_wft) # == wftbl

Submodules¶

lgdo.compression.base module¶

class lgdo.compression.base.WaveformCodec¶

Bases: object

Base class identifying a waveform compression algorithm.

The self.codec property returns a string identifier suitable for labeling encoded data on disk. This identifier is constant for all class instances.

Note

This is an abstract type. The user must provide a concrete subclass.

asdict()¶: Return the dataclass fields as dictionary.

property codec¶

The waveform codec string identifier.

Will be attached as an attribute to the encoded Waveform values.
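As an illustration of the contract described above (a codec identifier that is constant across instances, plus dataclass fields that can be exported as a dictionary), here is a minimal sketch. ToyCodec, its shift parameter and the "toy_codec" identifier are hypothetical and not part of lgdo; the real base class may be implemented differently.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ToyCodec:
    """Hypothetical codec used for illustration only (not an lgdo class)."""

    # Tunable parameter: different instances may carry different values.
    shift: int = 0
    # Identifier shared by every instance. It is excluded from __init__ so
    # that it cannot be overridden when the object is constructed.
    codec: str = field(default="toy_codec", init=False)


a = ToyCodec(shift=-32768)
b = ToyCodec()

# Dictionary of the dataclass fields, e.g. for labeling encoded data.
attrs = asdict(a)
```

In this sketch both instances report the same codec string, while asdict() also carries the per-instance parameters.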

lgdo.compression.generic module¶

lgdo.compression.generic._is_codec(ident, codec)¶

Return type:: bool

lgdo.compression.generic.decode(obj, out_buf=None)¶

Decode encoded LGDOs.

Defines decoding behaviors for each implemented waveform encoding algorithm. Expects to find the codec (and its parameters) the arrays were encoded with among the LGDO attributes.

Parameters:

obj (lgdo.VectorOfEncodedVectors | lgdo.ArrayOfEncodedEqualSizedArrays) – LGDO array type.
out_buf (lgdo.ArrayOfEqualSizedArrays) – pre-allocated LGDO for the decoded signals. See documentation of wrapped encoders for limitations.

Return type:

lgdo.VectorOfVectors | lgdo.ArrayOfEqualSizedArrays

lgdo.compression.generic.encode(obj, codec=None)¶

Encode LGDOs with codec.

Defines behaviors for each implemented waveform encoding algorithm.

Parameters:

obj (lgdo.VectorOfVectors | lgdo.ArrayOfEqualSizedArrays) – LGDO array type.
codec (WaveformCodec | str) – algorithm to be used for encoding.

Return type:

lgdo.VectorOfEncodedVectors | lgdo.ArrayOfEncodedEqualSizedArrays

lgdo.compression.radware module¶

class lgdo.compression.radware.RadwareSigcompress(codec_shift=0)¶

Bases: WaveformCodec

radware-sigcompress array codec.

Examples

>>> from lgdo.compression import RadwareSigcompress
>>> codec = RadwareSigcompress(codec_shift=-32768)

codec_shift: int = 0¶

Offset added to the input waveform before encoding.

The radware-sigcompress algorithm is limited to encoding of 16-bit integer values. In certain cases (notably, with unsigned 16-bit integer values), shifting incompatible data by a fixed amount circumvents the issue.
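The effect of the shift can be checked with plain integer arithmetic. The sketch below uses hypothetical unsigned 16-bit sample values and does not call lgdo; it only shows that adding -32768 maps the unsigned 16-bit range onto the signed 16-bit range, and that subtracting the same amount restores the input.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def fits_int16(values):
    """Return True if every value is representable as a signed 16-bit integer."""
    return all(INT16_MIN <= v <= INT16_MAX for v in values)


# Hypothetical samples spanning the unsigned 16-bit range [0, 65535].
waveform = [0, 1200, 40000, 65535]
codec_shift = -32768

# Offset added before encoding.
shifted = [v + codec_shift for v in waveform]
# Offset removed after decoding.
restored = [v - codec_shift for v in shifted]
```

Here waveform itself does not fit in 16-bit signed integers (40000 and 65535 are too large), whereas shifted does.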

@numba.jit lgdo.compression.radware._get_high_u16(x)¶

Return type:: uint16

@numba.jit lgdo.compression.radware._get_hton_u16(a, i)¶

Read unsigned 16-bit integer values from an array of unsigned 8-bit integers.

The first two most significant bytes of the values must be stored contiguously in a with big-endian order.

Return type:: uint16

@numba.jit lgdo.compression.radware._get_low_u16(x)¶

Return type:: uint16

@numba.guvectorize lgdo.compression.radware._radware_sigcompress_decode(sig_in, sig_out, shift, siglen, _mask=array([0, 1, 3, ..., 16383, 32767, 65535], shape=(17,), dtype=uint16))¶

Options: boundscheck=False, nopython=True, cache=True
Precompiled signatures: BHiIH->, BIiIH->, BLiIH->, BhiIH->, BiiIH->, BliIH->

Decompress a digital signal.

After decoding, the signal values are shifted by -shift to restore the original waveform. The dtype of sig_out must be large enough to contain it.

Almost literal translation of decompress_signal() from the radware-sigcompress v1.0 C-code by David Radford [1]. See _radware_sigcompress_encode() for a list of changes to the original algorithm.

Parameters:

sig_in (ndarray[tuple[int, ...], dtype[uint8]]) – array holding the input, compressed signal. In the original code, an array of 16-bit unsigned integers was expected.
sig_out (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – pre-allocated array for the decompressed signal. In the original code, an array of 16-bit integers was expected.
shift (int32) – the value by which the original signal(s) were shifted before compression. The value is subtracted from samples in sig_out right after decoding.

Returns:

length – length of output, decompressed signal.

Return type:

None

@numba.guvectorize lgdo.compression.radware._radware_sigcompress_encode(sig_in, sig_out, shift, siglen, _mask=array([0, 1, 3, ..., 16383, 32767, 65535], shape=(17,), dtype=uint16))¶

Options: boundscheck=False, nopython=True, cache=True
Precompiled signatures: HBiIH->, IBiIH->, LBiIH->, hBiIH->, iBiIH->, lBiIH->

Compress a digital signal.

Shifts the signal values by +shift and internally interprets the result as numpy.int16. For lossless compression, the shifted signal must therefore be representable as numpy.int16.

Note

The algorithm also computes the first derivative of the input signal, which cannot always be represented as a 16-bit integer. In such cases, overflows occur, but they seem to be innocuous.
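One plausible reading of why such overflows can be harmless is that 16-bit arithmetic is modular: a wrapped difference, added back with the same wrap-around, lands on the original sample. The following sketch illustrates this with plain Python integers. It is not taken from the lgdo implementation, and wrap_int16 is a hypothetical helper emulating two's complement wrap-around.

```python
def wrap_int16(x):
    """Reduce x to the signed 16-bit range, as two's complement arithmetic would."""
    return (x + 32768) % 65536 - 32768


# Samples that fit in int16 but whose differences do not.
signal = [-30000, 30000, -30000]

# Exact first differences: outside the int16 range.
true_diffs = [b - a for a, b in zip(signal, signal[1:])]
# What a 16-bit variable would actually hold.
stored_diffs = [wrap_int16(d) for d in true_diffs]

# Rebuild the signal by accumulating the wrapped differences, again in
# 16-bit arithmetic.
restored = [signal[0]]
for d in stored_diffs:
    restored.append(wrap_int16(restored[-1] + d))
```

The stored differences are wrong as plain integers, yet the accumulated signal matches the input as long as the samples themselves fit in 16 bits.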

Almost literal translation of compress_signal() from the radware-sigcompress v1.0 C-code by David Radford [1]. Summary of changes:

Shift the input signal by shift before encoding.
Store encoded, numpy.uint16 signal as an array of bytes (numpy.ubyte), in big-endian ordering.
Declare mask globally to avoid extra memory allocation.
Enable hardware-vectorization with Numba (numba.guvectorize()).
Add a couple of missing array boundary checks.

Parameters:

sig_in (ndarray[tuple[int, ...], dtype[_ScalarType_co]]) – array of integers holding the input signal. In the original C code, an array of 16-bit integers was expected.
sig_out (ndarray[tuple[int, ...], dtype[uint8]]) – pre-allocated array for the unsigned 8-bit encoded signal. In the original C code, an array of unsigned 16-bit integers was expected.
shift (int32) – value to be added to sig_in before compression.
siglen (uint32) – array that will hold the lengths of the compressed signals.

Returns:

length – number of bytes in the encoded signal

Return type:

None

@numba.jit lgdo.compression.radware._set_high_u16(x, y)¶

Return type:: uint32

@numba.jit lgdo.compression.radware._set_hton_u16(a, i, x)¶

Store an unsigned 16-bit integer value in an array of unsigned 8-bit integers.

The first two most significant bytes from x are stored contiguously in a with big-endian order.

Return type:: int
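The big-endian byte layout used by _set_hton_u16() and _get_hton_u16() can be sketched in pure Python. The helper names below are hypothetical, and the sketch assumes that the index i counts 16-bit words (so that word i occupies bytes 2*i and 2*i + 1) and that x fits in 16 bits; the actual lgdo helpers may differ in these details.

```python
def store_u16_be(buf, i, x):
    """Store the 16-bit value x as the i-th big-endian word of buf."""
    buf[2 * i] = (x >> 8) & 0xFF  # most significant byte first
    buf[2 * i + 1] = x & 0xFF  # least significant byte second
    return x


def load_u16_be(buf, i):
    """Read the i-th big-endian 16-bit word of buf."""
    return (buf[2 * i] << 8) | buf[2 * i + 1]


buf = bytearray(4)  # room for two 16-bit words
store_u16_be(buf, 0, 0x1234)
store_u16_be(buf, 1, 0xBEEF)

first = load_u16_be(buf, 0)
second = load_u16_be(buf, 1)
```

The standard library agrees with this layout: int.from_bytes(buf[2:4], "big") yields the same value as load_u16_be(buf, 1).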

@numba.jit lgdo.compression.radware._set_low_u16(x, y)¶

Return type:: uint32

lgdo.compression.radware.decode(sig_in, sig_out=None, shift=0)¶

Decompress digital signal(s) with radware-sigcompress.

Wraps _radware_sigcompress_decode() and adds support for decoding LGDOs. Resizes the decoded signals to their actual length.

Note

If sig_in is a NumPy array, no resizing (along the last dimension) of sig_out to its actual length is performed, not even of the internally allocated one. If a pre-allocated ArrayOfEqualSizedArrays is provided, it won’t be resized either. The internally allocated ArrayOfEqualSizedArrays sig_out, instead, always has the correct size.

Because of the current (hardware vectorized) implementation, providing a pre-allocated VectorOfVectors as sig_out is not possible.

Parameters:

sig_in (NDArray[ubyte] | lgdo.VectorOfEncodedVectors | lgdo.ArrayOfEncodedEqualSizedArrays) – array(s) holding the input, compressed signal(s). Output of encode().
sig_out (NDArray | lgdo.ArrayOfEqualSizedArrays) – pre-allocated array(s) for the decompressed signal(s). If not provided, will allocate a 32-bit integer array(s) structure.
shift (int32) – the value by which the original signal(s) were shifted before compression. The value is subtracted from samples in sig_out right after decoding.

Returns:

sig_out, nbytes | LGDO – given pre-allocated structure or new structure of 32-bit integers, plus the number of bytes (length) of the decoded signal.

Return type:

(NDArray, NDArray[uint32]) | lgdo.VectorOfVectors | lgdo.ArrayOfEqualSizedArrays

See also

_radware_sigcompress_decode

lgdo.compression.radware.encode(sig_in, sig_out=None, shift=0)¶

Compress digital signal(s) with radware-sigcompress.

Wraps _radware_sigcompress_encode() and adds support for encoding LGDO arrays. Resizes the encoded array to its actual length.

Note

If sig_in is a NumPy array, no resizing of sig_out is performed, not even of the internally allocated one.

Because of the current (hardware vectorized) implementation, providing a pre-allocated VectorOfEncodedVectors or ArrayOfEncodedEqualSizedArrays as sig_out is not possible.

Note

The compression algorithm internally interprets the input waveform values as 16-bit integers. Make sure that your signal can be safely cast to such a numeric type. If not, you may want to apply a shift to the waveform.

Parameters:

sig_in (NDArray | lgdo.VectorOfVectors | lgdo.ArrayOfEqualSizedArrays) – array(s) holding the input signal(s).
sig_out (NDArray[ubyte]) – pre-allocated unsigned 8-bit integer array(s) for the compressed signal(s). If not provided, a new one will be allocated.
shift (int32) – value to be added to sig_in before compression.

Returns:

sig_out, nbytes | LGDO – given pre-allocated sig_out structure or new structure of unsigned 8-bit integers, plus the number of bytes (length) of the encoded signal. If sig_in is an LGDO, only a newly allocated VectorOfEncodedVectors or ArrayOfEncodedEqualSizedArrays is returned.

Return type:

(NDArray[ubyte], NDArray[uint32]) | lgdo.VectorOfEncodedVectors | lgdo.ArrayOfEncodedEqualSizedArrays

See also

_radware_sigcompress_encode

lgdo.compression.utils module¶

lgdo.compression.utils.str2wfcodec(expr)¶

Eval strings containing WaveformCodec declarations.

Simple tool to avoid using eval(). Used to read WaveformCodec declarations configured in JSON files.

Return type:: WaveformCodec
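A common way to turn such a string into an object without eval() is to parse it with the ast module and accept only a call to a known class with literal keyword arguments. The sketch below shows that technique; it is not the lgdo implementation, and ToyCodec, KNOWN_CODECS and parse_codec are hypothetical names.

```python
import ast
from dataclasses import dataclass


@dataclass
class ToyCodec:
    """Hypothetical codec class standing in for a real WaveformCodec."""

    shift: int = 0


# Whitelist of class names that may appear in a declaration.
KNOWN_CODECS = {"ToyCodec": ToyCodec}


def parse_codec(expr):
    """Build a codec from a string like 'ToyCodec(shift=-32768)'."""
    node = ast.parse(expr, mode="eval").body
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise ValueError(f"not a codec declaration: {expr!r}")
    if node.func.id not in KNOWN_CODECS:
        raise ValueError(f"unknown codec: {node.func.id!r}")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("only explicit keyword arguments are supported")
    # literal_eval accepts only Python literals, so arbitrary expressions
    # (function calls, attribute access, ...) are rejected with ValueError.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return KNOWN_CODECS[node.func.id](**kwargs)


codec = parse_codec("ToyCodec(shift=-32768)")
```

A declaration such as "ToyCodec(shift=__import__('os'))" is refused in this sketch, since the argument is not a literal.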

lgdo.compression.varlen module¶

Variable-length code compression algorithms.

class lgdo.compression.varlen.ULEB128ZigZagDiff(codec='uleb128_zigzag_diff')¶

Bases: WaveformCodec

ZigZag [2] encoding followed by Unsigned Little Endian Base 128 (ULEB128) [3] encoding of array differences.

codec: str = 'uleb128_zigzag_diff'¶

lgdo.compression.varlen.decode(sig_in, sig_out=None)¶

Decompress digital signal(s) with a variable-length encoding of its derivative.

Wraps uleb128_zigzag_diff_array_decode() and adds support for decoding LGDOs.

Note

Because of the current (hardware vectorized) implementation, providing a pre-allocated VectorOfVectors as sig_out is not possible.

Parameters:

sig_in ((NDArray[ubyte], NDArray[uint32]) | lgdo.VectorOfEncodedVectors | lgdo.ArrayOfEncodedEqualSizedArrays) – array(s) holding the input, compressed signal(s). Output of encode().
sig_out (NDArray | lgdo.ArrayOfEqualSizedArrays) – pre-allocated array(s) for the decompressed signal(s). If not provided, will allocate a 32-bit integer array(s) structure.

Returns:

sig_out, nbytes | LGDO – given pre-allocated structure or new structure of 32-bit integers, plus the number of bytes (length) of the decoded signal.

Return type:

(NDArray, NDArray[uint32]) | lgdo.VectorOfVectors | lgdo.ArrayOfEqualSizedArrays

See also

uleb128_zigzag_diff_array_decode

lgdo.compression.varlen.encode(sig_in, sig_out=None)¶

Compress digital signal(s) with a variable-length encoding of its derivative.

Wraps uleb128_zigzag_diff_array_encode() and adds support for encoding LGDOs.

Note

If sig_in is a NumPy array, no resizing of sig_out is performed, not even of the internally allocated one.

Because of the current (hardware vectorized) implementation, providing a pre-allocated VectorOfEncodedVectors or ArrayOfEncodedEqualSizedArrays as sig_out is not possible.

Parameters:

sig_in (NDArray | lgdo.VectorOfVectors | lgdo.ArrayOfEqualSizedArrays) – array(s) holding the input signal(s).
sig_out (NDArray[ubyte]) – pre-allocated unsigned 8-bit integer array(s) for the compressed signal(s). If not provided, a new one will be allocated.

Returns:

Return type:

(NDArray[ubyte], NDArray[uint32]) | lgdo.VectorOfEncodedVectors | lgdo.ArrayOfEncodedEqualSizedArrays

See also

uleb128_zigzag_diff_array_encode

@numba.jit lgdo.compression.varlen.uleb128_decode(encx)¶

Decode a variable-length integer into an unsigned integer.

Implements the Unsigned Little Endian Base-128 decoding [3]. Only encoded non-negative numbers are expected, as no two’s complement is applied.

Parameters:: encx (NDArray[ubyte]) – the encoded varint as a NumPy array of bytes.
Returns:: x, nread – the decoded value and the number of bytes read from the input array.
Return type:: (int, int)

@numba.jit lgdo.compression.varlen.uleb128_encode(x, encx)¶

Compute a variable-length representation of an unsigned integer.

Implements the Unsigned Little Endian Base-128 encoding [3]. Only non-negative numbers are expected, as no two’s complement is applied.

Parameters:

x (int) – the number to be encoded.
encx (ndarray[tuple[int, ...], dtype[uint8]]) – the encoded varint as a NumPy array of bytes.

Returns:

nbytes – size of varint in bytes

Return type:

int
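For reference, ULEB128 splits an unsigned integer into 7-bit groups, least significant group first, and sets the high bit of every byte except the last. The pure-Python sketch below illustrates the format only: the function names are hypothetical and, unlike uleb128_encode() above, the encoder returns a new bytes object instead of filling a pre-allocated array.

```python
def uleb128_encode_sketch(x):
    """Return the ULEB128 representation of a non-negative integer."""
    if x < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = bytearray()
    while True:
        byte = x & 0x7F  # take the 7 least significant bits
        x >>= 7
        if x:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)  # last byte: continuation bit cleared
            return bytes(out)


def uleb128_decode_sketch(data):
    """Return (value, number of bytes read) for the varint at the start of data."""
    x = 0
    for nread, byte in enumerate(data, start=1):
        x |= (byte & 0x7F) << (7 * (nread - 1))
        if not byte & 0x80:
            return x, nread
    raise ValueError("truncated varint")


encoded = uleb128_encode_sketch(624485)
# Trailing bytes after the varint are ignored by the decoder.
value, nread = uleb128_decode_sketch(encoded + b"\xff")
```

Values below 128 occupy a single byte, which is what makes the encoding compact for small waveform differences.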

@numba.guvectorize lgdo.compression.varlen.uleb128_zigzag_diff_array_decode(sig_in, nbytes, sig_out, siglen)¶

Options: boundscheck=False, nopython=True, cache=True
Precompiled signatures: BIHI->, BIII->, BILI->, BIhI->, BIiI->, BIlI->

Decode an array of variable-length integers.

The algorithm inverts uleb128_zigzag_diff_array_encode() by decoding the variable-length binary data in sig_in with uleb128_decode(), then reconstructing the original signal derivative with zigzag_decode() and finally computing its cumulative sum (i.e. the original signal).

Parameters:

sig_in (ndarray[tuple[int, ...], dtype[uint8]]) – the array of bytes encoding the variable-length integers.
nbytes (int) – the number of bytes to read from sig_in (stored in the first index of this array).
sig_out (ndarray[tuple[int, ...], dtype[int]]) – pre-allocated array for the output decoded signal.
siglen (int) – the length of the decoded signal (stored in the first index of this array).

See also

uleb128_zigzag_diff_array_encode

@numba.guvectorize lgdo.compression.varlen.uleb128_zigzag_diff_array_encode(sig_in, sig_out, nbytes)¶

Options: boundscheck=False, nopython=True, cache=True
Precompiled signatures: HBI->, IBI->, LBI->, hBI->, iBI->, lBI->

Encode an array of integer numbers.

The algorithm computes the derivative (prepending 0 first) of sig_in, maps it to positive numbers by applying zigzag_encode() and finally computes its variable-length binary representation with uleb128_encode().

The encoded data is stored in sig_out as an array of bytes. The number of bytes written is stored in nbytes. The actual encoded data can therefore be found in sig_out[:nbytes].

Parameters:

sig_in (ndarray[tuple[int, ...], dtype[int]]) – the input array of integers.
sig_out (ndarray[tuple[int, ...], dtype[uint8]]) – pre-allocated bytes array for the output encoded data.
nbytes (int) – pre-allocated output array holding the number of bytes written (stored in the first index).

See also

uleb128_zigzag_diff_array_decode
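The whole pipeline (differences with a prepended 0, ZigZag mapping, ULEB128 bytes written into a pre-allocated buffer) and its inverse can be sketched in pure Python as follows. This is an illustration of the scheme described above, not the Numba implementation; diff_encode, diff_decode, zigzag and unzigzag are hypothetical names, and the buffer size is simply chosen generously for this example.

```python
def zigzag(x):
    """Map signed integers to non-negative ones: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return 2 * x if x >= 0 else -2 * x - 1


def unzigzag(z):
    """Inverse of zigzag()."""
    return (z >> 1) ^ -(z & 1)


def diff_encode(sig, out):
    """Encode sig into the pre-allocated byte buffer out, return bytes written."""
    nbytes = 0
    prev = 0  # equivalent to prepending 0 before taking differences
    for s in sig:
        z = zigzag(s - prev)
        prev = s
        while True:  # ULEB128: 7 bits per byte, low groups first
            byte = z & 0x7F
            z >>= 7
            out[nbytes] = byte | 0x80 if z else byte
            nbytes += 1
            if not z:
                break
    return nbytes


def diff_decode(data, nbytes):
    """Decode the first nbytes of data back into the list of samples."""
    sig, value, pos = [], 0, 0
    while pos < nbytes:
        z, shift = 0, 0
        while True:
            byte = data[pos]
            pos += 1
            z |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        value += unzigzag(z)  # cumulative sum restores the sample
        sig.append(value)
    return sig


signal = [100, 102, 101, 300, -50]
buf = bytearray(3 * len(signal))  # generous size for these small values
nbytes = diff_encode(signal, buf)
restored = diff_decode(buf, nbytes)
```

Only buf[:nbytes] holds encoded data; the rest of the buffer is unused padding, which mirrors the sig_out[:nbytes] convention described above.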

@numba.vectorize lgdo.compression.varlen.zigzag_decode(x)¶

Precompiled signatures: L->l, I->i, H->h

ZigZag-decode [2] signed integer numbers.

Return type:: int | ndarray[tuple[int, …], dtype[int]]

@numba.vectorize lgdo.compression.varlen.zigzag_encode(x)¶

Precompiled signatures: l->L, i->I, h->H

ZigZag-encode [2] signed integer numbers.

Return type:: int | ndarray[tuple[int, …], dtype[int]]
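ZigZag interleaves negative and non-negative integers so that values of small magnitude map to small unsigned numbers. A minimal pure-Python sketch of the mapping follows; the function names are hypothetical and the code works on unbounded Python integers, whereas the vectorized functions above operate on fixed-width NumPy integer types.

```python
def zigzag_encode_sketch(x):
    """Map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ..."""
    return 2 * x if x >= 0 else -2 * x - 1


def zigzag_decode_sketch(z):
    """Inverse mapping: even inputs are non-negative, odd inputs negative."""
    return (z >> 1) ^ -(z & 1)


values = [0, -1, 1, -2, 2]
encoded = [zigzag_encode_sketch(v) for v in values]
decoded = [zigzag_decode_sketch(z) for z in encoded]
```

Because the sign ends up in the least significant bit, a small negative difference such as -1 needs a single ULEB128 byte.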