Guidelines
Install from PyPI (recommended):
pip install tdaad
Or install from source:
git clone https://github.com/IRT-SystemX/tdaad.git
cd tdaad
pip install .
Requirements:
Python ≥ 3.7
See requirements.txt for the full dependency list
Quickstart
Here's a minimal example using TopologicalAnomalyDetector:
import numpy as np
from tdaad.anomaly_detectors import TopologicalAnomalyDetector
# Example multivariate time series with shape (n_samples, n_features)
X = np.random.randn(1000, 3)
# Initialize and fit the detector
detector = TopologicalAnomalyDetector(window_size=100, n_centers_by_dim=3)
detector.fit(X)
# Compute anomaly scores
scores = detector.score_samples(X)
You can also pass a pandas.DataFrame instead of a NumPy array; column names will be preserved in the output.
For more advanced usage (e.g. custom embeddings, parameter tuning), see the examples folder or the API documentation.
Usage Notes
TDAAD is designed for multivariate time series (2D inputs); univariate data is not supported.
The core detection method relies on sliding-window embeddings and persistent homology to identify structural changes in the signal.
The key parameters that impact results and runtime are:
window_size controls the time resolution: larger windows capture slower anomalies, while smaller ones detect more localized changes.
n_centers_by_dim controls the number of reference shapes used per homology dimension (e.g. connected components in H0, loops in H1, …). Increasing it improves sensitivity but adds computation time.
tda_max_dim sets the maximum topological feature dimension computed (0 = connected components, 1 = loops, 2 = voids, …). Higher values increase runtime and memory usage.
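To make the role of window_size concrete, here is a minimal NumPy sketch of cutting a multivariate series into sliding windows. It is only an illustration and not TDAAD's internal implementation: the helper name sliding_windows and its step parameter are assumptions introduced for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_windows(X, window_size, step=1):
    """Return windows of shape (n_windows, window_size, n_features).

    Hypothetical helper for illustration; `step` is an assumed parameter.
    """
    X = np.asarray(X)
    # sliding_window_view puts the window axis last:
    # (n_samples - window_size + 1, n_features, window_size)
    views = sliding_window_view(X, window_shape=window_size, axis=0)
    # Keep every `step`-th window and reorder axes to (window, time, feature).
    return views[::step].transpose(0, 2, 1)


X = np.random.randn(1000, 3)
windows = sliding_windows(X, window_size=100, step=50)
print(windows.shape)  # (19, 100, 3)
```

With 1000 samples, a window of 100 and a step of 50, the series yields (1000 - 100) // 50 + 1 = 19 windows. A larger window_size means fewer, longer windows, each covering a slower time scale.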
Inputs can be numpy.ndarray or pandas.DataFrame. Column names are preserved in the output when using DataFrames.
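Because only 2D multivariate inputs are supported, it can help to check the shape of your data before fitting. The sketch below is a hypothetical pre-check, not part of the TDAAD API: the function name check_multivariate and its error messages are assumptions made for illustration.

```python
import numpy as np


def check_multivariate(X):
    """Return X as a 2D float array, or raise if it is not multivariate.

    Hypothetical helper; accepts anything np.asarray can convert,
    including a pandas.DataFrame.
    """
    arr = np.asarray(X, dtype=float)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2D input (n_samples, n_features), got {arr.ndim}D")
    if arr.shape[1] < 2:
        raise ValueError("univariate data is not supported: need at least 2 features")
    return arr


X = check_multivariate(np.random.randn(1000, 3))  # passes: 3 features
```

A 1D array, or a 2D array with a single column, raises ValueError before any expensive computation starts.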
You can typically handle around 100 sensors and a few hundred time steps per window on a modern machine.
Basic Complexity of Persistent Homology in TDAAD
Total complexity scales with $O(N \times (w \times p)^{d+2})$, where $w$ is the time resolution (window_size, the number of time steps per window), $p$ is the number of variables (features/sensors), $d$ is the maximum homology dimension (tda_max_dim), and $N$ is the total number of sliding windows.
Note that increasing the maximum homology dimension $d$ raises the exponent, so the cost grows exponentially with $d$. The number of centers n_centers_by_dim used after the PH computation does not significantly affect the overall complexity.
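As a rough illustration of this bound, the sketch below evaluates $N \times (w \times p)^{d+2}$ for two values of $d$. The function relative_cost is a hypothetical helper, and the result is a unitless quantity that ignores constant factors, so it is only useful for comparing configurations, not for predicting actual runtime.

```python
def relative_cost(n_windows: int, window_size: int, n_features: int, max_dim: int) -> int:
    """Evaluate N * (w * p) ** (d + 2), ignoring constant factors."""
    return n_windows * (window_size * n_features) ** (max_dim + 2)


# Same data and windowing, with tda_max_dim set to 1 and then 2.
base = relative_cost(n_windows=10, window_size=100, n_features=3, max_dim=1)
deeper = relative_cost(n_windows=10, window_size=100, n_features=3, max_dim=2)

# Each extra homology dimension multiplies the bound by w * p.
growth = deeper // base
print(growth)  # 300
```

Under this bound, going from $d = 1$ to $d = 2$ with $w = 100$ and $p = 3$ multiplies the worst-case cost by $w \times p = 300$, whereas doubling the number of windows only doubles it.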