Time series downsampling in Rust
Extremely fast time series downsampling 📈 for visualization, written in Rust.
CPython has the infamous Global Interpreter Lock (GIL), which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for CPU-bound tasks and often forces developers to accept the overhead of multiprocessing. Rust, a compiled language, has no GIL, so CPU-bound tasks can be parallelized (with Rayon) with little to no overhead.
- x: f32, f64, i16, i32, i64, u16, u32, u64, datetime64, timedelta64
- y: f16, f32, f64, i8, i16, i32, i64, u8, u16, u32, u64, datetime64, timedelta64, bool

f16 argminmax is 200-300x faster than numpy. f16 is *not* hardware supported (i.e., no instructions for f16) by most modern CPUs!! A mapping from f16 to i16 is sufficient. This mapping makes it possible to use the hardware-supported scalar and SIMD i16 instructions without producing any memory overhead 🎉

pip install tsdownsample
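As an aside on the f16 handling noted above, the idea of an order-preserving f16-to-i16 mapping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `f16_to_ordered_i16` is hypothetical, NaNs are not handled, and the Rust code in the library may well do this differently.

```python
import struct


def f16_to_ordered_i16(value: float) -> int:
    """Map a half-precision float to an int16 with the same ordering.

    Hypothetical helper for illustration; NaNs are not handled.
    """
    # Pack as IEEE 754 binary16 ("e"), then reinterpret the same 2 bytes
    # as a signed 16-bit integer ("h"): no extra memory is needed.
    (i,) = struct.unpack("<h", struct.pack("<e", value))
    # Non-negative floats already sort correctly as integers. Negative
    # floats sort in reverse, so flip their 15 magnitude bits and keep
    # the sign bit.
    return i ^ 0x7FFF if i < 0 else i


values = [-65504.0, -1.5, -0.0, 0.0, 0.5, 1.0, 65504.0]
keys = [f16_to_ordered_i16(v) for v in values]
# The integer keys are ordered like the floats, so an i16 argmin/argmax
# lands on the same positions as an f16 argmin/argmax would.
assert keys == sorted(keys)
```

Because the keys compare like the original values, the min and max can be searched with plain integer comparisons, which is the property the paragraph above relies on.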
from tsdownsample import MinMaxLTTBDownsampler
import numpy as np
# Create a time series
y = np.random.randn(10_000_000)
x = np.arange(len(y))
# Downsample to 1000 points (assuming constant sampling rate)
s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)
# Select downsampled data
downsampled_y = y[s_ds]
# Downsample to 1000 points using the (possibly irregularly spaced) x-data
s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)
# Select downsampled data
downsampled_x = x[s_ds]
downsampled_y = y[s_ds]
Each downsampling algorithm is implemented as a class that implements a downsample method.
The signature of the downsample method:
downsample([x], y, n_out, **kwargs) -> ndarray[uint64]
Arguments:
- x is optional
- x and y are both positional arguments
- n_out is a mandatory keyword argument that defines the number of output values*
- **kwargs are optional keyword arguments (see table below):
  - parallel: whether to use multi-threading (default: False)**

Returns: an ndarray[uint64] of indices that can be used to index the original data.
*When there are gaps in the time series, fewer than n_out indices may be returned.
**parallel is not supported for LTTBDownsampler.
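The calling convention above (optional positional x, positional y, keyword-only n_out) can be mimicked in plain Python. The sketch below uses a hypothetical `MinMaxSketch` class that is not part of tsdownsample and says nothing about how the Rust code works. It assumes bins of equal width along x, which is one way empty bins, and therefore fewer than n_out indices, can arise when the series has gaps. It accepts `parallel` but ignores it.

```python
from bisect import bisect_left


class MinMaxSketch:
    """Hypothetical pure-Python stand-in, not the tsdownsample implementation."""

    def downsample(self, *args, n_out, **kwargs):
        # Accept downsample(y, n_out=...) or downsample(x, y, n_out=...).
        if len(args) == 1:
            (y,) = args
            x = range(len(y))  # constant sampling rate assumed
        elif len(args) == 2:
            x, y = args
        else:
            raise TypeError("expected downsample([x], y, n_out=...)")
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        if n_out < 2 or n_out % 2:
            raise ValueError("n_out must be an even number >= 2")
        if len(y) <= n_out:
            return list(range(len(y)))

        n_bins = n_out // 2  # each bin contributes a min and a max
        width = (x[-1] - x[0]) / n_bins  # assumption: equal-width bins in x
        indices, start = [], 0
        for b in range(1, n_bins + 1):
            # x is sorted, so the right edge of a bin is found by bisection;
            # the last bin takes all remaining points.
            stop = len(x) if b == n_bins else bisect_left(x, x[0] + b * width, start)
            if stop > start:  # empty bins (gaps in x) contribute nothing
                lo = min(range(start, stop), key=lambda i: y[i])
                hi = max(range(start, stop), key=lambda i: y[i])
                indices.extend(sorted({lo, hi}))
            start = stop
        return indices


y = [0, 5, 1, 4, 2, 3, 9, 7]
x_gap = [0, 1, 2, 3, 100, 101, 102, 103]  # large gap between 3 and 100
print(MinMaxSketch().downsample(y, n_out=4))         # -> [0, 1, 4, 6]
print(MinMaxSketch().downsample(x_gap, y, n_out=6))  # -> [0, 1, 4, 6]
```

In the second call the middle bin contains no samples, so only four of the six requested indices come back.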
The following downsampling algorithms (classes) are implemented:
| Downsampler | Description | **kwargs |
|---|---|---|
| MinMaxDownsampler | selects the min and max value in each bin | parallel |
| M4Downsampler | selects the min, max, first and last value in each bin | parallel |
| LTTBDownsampler | performs the Largest Triangle Three Buckets algorithm | |
| MinMaxLTTBDownsampler | (new two-step algorithm 🎉) first selects n_out * minmax_ratio min and max values, then further reduces these to n_out values using the Largest Triangle Three Buckets algorithm | parallel, minmax_ratio* |
*Default value for minmax_ratio is 30, which is empirically proven to be a good default. (More details in our upcoming paper)
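To make the two-step idea concrete, here is a minimal pure-Python sketch of a MinMax preselection followed by Largest Triangle Three Buckets. The function names are hypothetical, the bins are equal-count bins over the sample index, and keeping the first and last point before the LTTB step is an assumption of this sketch. It is meant to show the structure of the approach, not to reproduce the library's Rust implementation or its exact output.

```python
def minmax_indices(y, n_points):
    """Indices of the min and max of y in each of n_points // 2 index bins."""
    n, n_bins = len(y), max(1, n_points // 2)
    selected = set()
    for b in range(n_bins):
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        if stop > start:
            selected.add(min(range(start, stop), key=lambda i: y[i]))
            selected.add(max(range(start, stop), key=lambda i: y[i]))
    return sorted(selected)


def lttb_indices(x, y, n_out):
    """Largest Triangle Three Buckets: returns n_out indices into x and y."""
    n = len(y)
    if n_out < 3:
        raise ValueError("n_out must be at least 3")
    if n_out >= n:
        return list(range(n))

    def edge(k):
        # Start index of bucket k; the first and last points are kept apart.
        return min(k * (n - 2) // (n_out - 2) + 1, n)

    out, prev = [0], 0
    for k in range(n_out - 2):
        start, stop, next_stop = edge(k), edge(k + 1), edge(k + 2)
        # Average point of the next bucket (the last point for the final bucket).
        m = next_stop - stop
        avg_x = sum(x[j] for j in range(stop, next_stop)) / m
        avg_y = sum(y[j] for j in range(stop, next_stop)) / m

        def area(j):
            # Twice the area of the triangle (prev, j, next-bucket average).
            return abs((x[prev] - avg_x) * (y[j] - y[prev])
                       - (x[prev] - x[j]) * (avg_y - y[prev]))

        prev = max(range(start, stop), key=area)
        out.append(prev)
    out.append(n - 1)
    return out


def minmaxlttb_indices(x, y, n_out, minmax_ratio=30):
    """Two-step sketch: MinMax preselection, then LTTB on the survivors."""
    n = len(y)
    if n <= n_out * minmax_ratio:
        return lttb_indices(x, y, n_out)  # too little data to preselect
    # Step 1: keep about n_out * minmax_ratio extrema of the interior points,
    # plus the first and last point (an assumption of this sketch).
    inner = minmax_indices(y[1:-1], n_out * minmax_ratio)
    pre = [0] + [i + 1 for i in inner] + [n - 1]
    # Step 2: run LTTB on the preselected points and map back to y's indices.
    sub = lttb_indices([x[i] for i in pre], [y[i] for i in pre], n_out)
    return [pre[k] for k in sub]
```

The appeal of this structure is that the comparatively expensive triangle-area step only sees roughly n_out * minmax_ratio points instead of the full series, while the cheap min/max pass is the only part that touches every sample.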
Assumes:
- x-data is (non-strictly) monotonically increasing (i.e., sorted)
- NaNs in the data

👤 Jeroen Van Der Donckt