tdaad package

Submodules

tdaad.anomaly_detectors module

Topological Anomaly Detectors.

class tdaad.anomaly_detectors.TopologicalAnomalyDetector(window_size: int = 100, step: int = 5, tda_max_dim: int = 1, n_centers_by_dim: int = 5, support_fraction: float | None = None, contamination: float = 0.1, random_state: int | RandomState | None = 42)[source]

Bases: EllipticEnvelope, TransformerMixin

Anomaly detection for multivariate time series using topological embeddings and robust covariance estimation.

This detector extracts topological features from sliding windows of time series data and scores anomalies with a robust Mahalanobis distance (via the EllipticEnvelope estimator that this class extends).

Read more in the User Guide.

Parameters:
  • window_size (int, default=100) – Sliding window size for extracting time series subsequences.

  • step (int, default=5) – Step size between windows.

  • tda_max_dim (int, default=1) – Maximum homology dimension used for topological feature extraction.

  • n_centers_by_dim (int, default=5) – Number of k-means centers per topological dimension (for vectorization).

  • support_fraction (float or None, default=None) – Proportion of data to use for robust covariance estimation. If None, computed automatically.

  • contamination (float, default=0.1) – Expected proportion of anomalies in the data, used to compute the decision threshold.

  • random_state (int, RandomState instance, or None, default=42) – Controls randomness of the topological embedding and robust estimator.

topological_embedding_

TopologicalEmbedding transformer, fitted during fit.

Type:

object

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from tdaad.anomaly_detectors import TopologicalAnomalyDetector
>>> n_timestamps = 1000
>>> n_sensors = 20
>>> timestamps = pd.to_datetime('2024-01-01', utc=True) + pd.Timedelta(1, 'h') * np.arange(n_timestamps)
>>> X = pd.DataFrame(np.random.random(size=(n_timestamps, n_sensors)), index=timestamps)
>>> X.iloc[n_timestamps//2:,:10] = -X.iloc[n_timestamps//2:,10:20]
>>> detector = TopologicalAnomalyDetector(n_centers_by_dim=2, tda_max_dim=1).fit(X)
>>> anomaly_scores = detector.score_samples(X)
>>> decision = detector.decision_function(X)
>>> anomalies = detector.predict(X)

decision_function(X)[source]

Return the distance to the decision boundary.

fit(X, y=None)[source]

Fit the TopologicalAnomalyDetector model.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_timestamps, n_sensors)) – Multivariate time series to fit on, where n_timestamps is the number of timestamps and n_sensors is the number of sensors.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

self – Returns the instance itself.

Return type:

object

predict(X)[source]

Predict inliers (1) and outliers (-1) using the learned threshold.

score_samples(X, y=None)[source]

Compute anomaly scores from topological features.

transform(X)[source]

Alias for score_samples. Returns anomaly scores.
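
How score_samples, decision_function and predict relate to each other can be illustrated with a minimal NumPy sketch. It assumes the conventions of scikit-learn's EllipticEnvelope (scores are negated squared Mahalanobis distances, the threshold is the contamination percentile of the training scores). For brevity it uses a plain covariance estimate rather than the robust one, and random numbers stand in for the topological features, so it is not the detector's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the topological features: one row per window (assumption).
features = rng.normal(size=(200, 3))
features[:10] += 6.0  # shift a few rows so they look abnormal

contamination = 0.1

# Plain (non-robust) mean and covariance, used here only for illustration.
mean = features.mean(axis=0)
precision = np.linalg.inv(np.cov(features, rowvar=False))
centered = features - mean

# Negated squared Mahalanobis distance: lower means more abnormal.
score_samples = -np.einsum("ij,jk,ik->i", centered, precision, centered)

# Threshold chosen so that roughly `contamination` of the scores fall below it.
offset = np.percentile(score_samples, 100.0 * contamination)

# Signed distance to the decision boundary, then inlier (1) / outlier (-1) labels.
decision = score_samples - offset
labels = np.where(decision >= 0, 1, -1)
```

With contamination=0.1, about 10% of the rows end up with a negative decision value and are labelled -1.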

tdaad.anomaly_detectors.score_flat_fast_remapping(scores, window_size, stride, padding_length=0)[source]

Remap window-level anomaly scores to a flat sequence of per-time-step scores.

Parameters:
  • scores (array-like of shape (n_windows,)) – Anomaly scores for each window. Can be a pandas Series or NumPy array.

  • window_size (int) – Size of the sliding window.

  • stride (int) – Step size between windows.

  • padding_length (int, optional (default=0)) – Extra length to pad the output array (typically at the end of a signal).

Returns:

remapped_scores – Flattened anomaly scores with per-timestep resolution. NaN values (from positions not covered by any window) are replaced with 0.

Return type:

np.ndarray of shape (n_timestamps + padding_length,)
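
The remapping described above can be sketched as follows. This is a hypothetical re-implementation, not the library's code: it assumes that each timestamp receives the mean score of the windows covering it and that window i starts at position i * stride. The name remap_window_scores is illustrative only.

```python
import numpy as np

def remap_window_scores(scores, window_size, stride, padding_length=0):
    """Spread window-level scores over the timestamps each window covers."""
    scores = np.asarray(scores, dtype=float)
    n_windows = len(scores)
    # Length of the signal spanned by the windows (assumption: no partial windows).
    n_timestamps = (n_windows - 1) * stride + window_size

    total = np.zeros(n_timestamps + padding_length)
    count = np.zeros(n_timestamps + padding_length)
    for i, score in enumerate(scores):
        start = i * stride
        total[start:start + window_size] += score
        count[start:start + window_size] += 1

    # Positions covered by no window give 0/0 = NaN, which is then replaced by 0.
    with np.errstate(invalid="ignore", divide="ignore"):
        remapped = total / count
    return np.nan_to_num(remapped, nan=0.0)

remapped = remap_window_scores([1.0, 3.0], window_size=4, stride=2, padding_length=1)
```

In this example the two windows overlap on timestamps 2 and 3, which receive the average of both scores, while the padded final position is covered by no window and is set to 0.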

tdaad.topological_embedding module

Topological Embedding Transformers.

class tdaad.topological_embedding.SlidingWindowTransformer(window_size=40, step=5)[source]

Bases: BaseEstimator, TransformerMixin

Slice a 2D numpy array into overlapping windows.

Output: list of 2D numpy arrays, one per window.
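
A minimal sketch of this slicing, assuming that windows start every step rows and that a trailing window shorter than window_size is dropped (the transformer's actual boundary handling may differ). The function name sliding_windows is illustrative.

```python
import numpy as np

def sliding_windows(X, window_size=40, step=5):
    """Return a list of overlapping (window_size, n_features) slices of X."""
    X = np.asarray(X)
    # Last valid start is the one whose window still fits entirely in X.
    starts = range(0, X.shape[0] - window_size + 1, step)
    return [X[start:start + window_size] for start in starts]

X = np.arange(100 * 3).reshape(100, 3)  # 100 timestamps, 3 sensors
windows = sliding_windows(X, window_size=40, step=5)
```

Under these assumptions, n_timestamps rows yield (n_timestamps - window_size) // step + 1 windows, so a smaller step produces more, more strongly overlapping windows.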

fit(X, y=None)[source]

transform(X)[source]

class tdaad.topological_embedding.TopologicalEmbedding(window_size: int = 40, step: int = 5, tda_max_dim: int = 2, n_centers_by_dim: int = 5, filter_nan: bool = True, output: str = 'pandas')[source]

Bases: BaseEstimator, TransformerMixin

Topological embedding for multivariate time series using sliding windows, persistent homology (Rips), and ATOL vectorization.

Pipeline:

Sliding windows -> similarity -> RipsPersistence -> ColumnTransformer(Atol)

Parameters:
  • window_size (int) – Number of rows per sliding window.

  • step (int) – Step size between windows.

  • tda_max_dim (int) – Maximum homology dimension for RipsPersistence.

  • n_centers_by_dim (int) – Number of centroids per homology dimension in ATOL.

  • filter_nan (bool) – Whether to filter NaNs in similarity matrices.

  • output (str, default="pandas") – "pandas" returns a DataFrame with proper index and column names. "numpy" returns a numpy array.

fit(X, y=None)[source]

Fit the full pipeline to the data.

Parameters:
  • X (np.ndarray, shape (n_samples, n_features)) – Input multivariate time series.

  • y (Ignored) – Not used, present for API consistency by convention.

transform(X)[source]

Transform the input data and return a pandas DataFrame with row index = window start position and columns named feature_0, feature_1, …

tdaad.topological_embedding.numpy_data_to_similarity(X, filter_nan=True)[source]

Transform the NumPy matrix X into the similarity matrix \(1-\mathbf{Corr}(X)\).
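
As an illustration, the quantity \(1-\mathbf{Corr}(X)\) can be computed with NumPy as in the sketch below. Two points are assumptions rather than documented behaviour: correlations are taken between the columns (sensors) of X, and filter_nan is interpreted as dropping sensors whose correlations are all NaN, as happens for a constant signal. The function name one_minus_correlation is hypothetical.

```python
import numpy as np

def one_minus_correlation(X, filter_nan=True):
    """Return 1 - Corr(X), with correlations taken between the columns of X."""
    # A constant column has zero variance, so its correlations are NaN (0/0).
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(X, rowvar=False)
    D = 1.0 - corr
    if filter_nan:
        # Assumed filtering rule: drop sensors whose entries are all NaN.
        keep = ~np.isnan(D).all(axis=0)
        D = D[np.ix_(keep, keep)]
    return D

rng = np.random.default_rng(0)
window = rng.random((40, 4))  # 40 timestamps, 4 sensors
window[:, 3] = 1.0            # constant sensor, correlation undefined
D = one_minus_correlation(window)
```

The result is symmetric with a zero diagonal: perfectly correlated sensors are at distance 0, and perfectly anti-correlated sensors are at distance 2.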

Module contents

Topological Data Analysis module for Anomaly Detection in Time Series

tdaad is a Python module integrating TDA tools from gudhi into learning algorithms designed to detect anomalies in Multiple/Multivariate Time Series.