tdaad package

Subpackages

Submodules

tdaad.anomaly_detectors module

Topological Anomaly Detectors.

class tdaad.anomaly_detectors.TopologicalAnomalyDetector(window_size: int = 100, step: int = 5, tda_max_dim: int = 1, n_centers_by_dim: int = 5, support_fraction: float | None = None, contamination: float = 0.1, random_state: int | RandomState | None = 42)[source]

Bases: EllipticEnvelope, TransformerMixin

Anomaly detection for multivariate time series using topological embeddings and robust covariance estimation.

This detector extracts topological features from sliding windows of time series data and uses a robust Mahalanobis distance (via EllipticEnvelope) to score anomalies.
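As a rough illustration of how window_size and step interact, the sketch below enumerates the start index of each sliding window. It assumes that only full windows are kept and that the first window starts at index 0; the helper window_starts is hypothetical and not part of tdaad, whose internal windowing may differ.

```python
def window_starts(n_timestamps: int, window_size: int = 100, step: int = 5) -> list[int]:
    """Start indices of full sliding windows (assumption: partial trailing windows are dropped)."""
    if n_timestamps < window_size:
        return []
    return list(range(0, n_timestamps - window_size + 1, step))


starts = window_starts(1000, window_size=100, step=5)
print(len(starts))            # 181
print(starts[0], starts[-1])  # 0 900
```

Under these assumptions, a series of 1000 timestamps yields 181 windows, so window-level scores are much shorter than the input series.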

Read more in the User Guide.

Parameters:
  • window_size (int, default=100) – Sliding window size for extracting time series subsequences.

  • step (int, default=5) – Step size between windows.

  • tda_max_dim (int, default=1) – Maximum homology dimension used for topological feature extraction.

  • n_centers_by_dim (int, default=5) – Number of k-means centers per topological dimension (for vectorization).

  • support_fraction (float or None, default=None) – Proportion of data to use for robust covariance estimation. If None, computed automatically.

  • contamination (float, default=0.1) – Proportion of anomalies in the data, used to compute decision threshold.

  • random_state (int, RandomState instance, or None, default=42) – Controls randomness of the topological embedding and robust estimator.
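The role of contamination can be sketched with plain NumPy. scikit-learn's EllipticEnvelope, which this class extends, sets its offset to the contamination percentile of the training scores; the example below mimics that rule on made-up scores. The scores array is synthetic, and it is an assumption that TopologicalAnomalyDetector applies the inherited rule unchanged.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic scores, higher = more normal (scikit-learn convention); not tdaad output.
scores = -rng.chisquare(df=3, size=200)

contamination = 0.1
# Threshold at the contamination percentile of the training scores.
offset = np.percentile(scores, 100.0 * contamination)

# Scores below the threshold are flagged as anomalous.
flagged = scores < offset
print(flagged.mean())  # roughly equal to contamination
```

With this rule, about a contamination fraction of the training scores falls below the threshold by construction, so the parameter acts as a prior on the anomaly rate rather than something learned from the data.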

topological_embedding_

TopologicalEmbedding transformer, fitted during fit.

Type:

object

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from tdaad.anomaly_detectors import TopologicalAnomalyDetector
>>> n_timestamps = 1000
>>> n_sensors = 20
>>> timestamps = pd.to_datetime('2024-01-01', utc=True) + pd.Timedelta(1, 'h') * np.arange(n_timestamps)
>>> X = pd.DataFrame(np.random.random(size=(n_timestamps, n_sensors)), index=timestamps)
>>> X.iloc[n_timestamps//2:,:10] = -X.iloc[n_timestamps//2:,10:20]
>>> detector = TopologicalAnomalyDetector(n_centers_by_dim=2, tda_max_dim=1).fit(X)
>>> anomaly_scores = detector.score_samples(X)
>>> decision = detector.decision_function(X)
>>> anomalies = detector.predict(X)

decision_function(X)[source]

Return the distance to the decision boundary.

fit(X, y=None)[source]

Fit the TopologicalAnomalyDetector model.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_timestamps, n_sensors)) – Multiple time series to transform, where n_timestamps is the number of timestamps in the series X, and n_sensors is the number of sensors.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

self – Returns the instance itself.

Return type:

object

predict(X)[source]

Predict inliers (1) and outliers (-1) using the learned threshold.
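The link between decision_function and predict can be sketched as follows, assuming the scikit-learn outlier-detection convention in which non-negative decision values are inliers. The helper labels_from_decision and the decision values are illustrative only and are not part of tdaad.

```python
import numpy as np


def labels_from_decision(decision: np.ndarray) -> np.ndarray:
    """Map signed distances to labels: 1 = inlier, -1 = outlier."""
    return np.where(decision >= 0, 1, -1)


# Made-up decision values; positive means inside the learned boundary.
decision = np.array([0.8, -0.2, 0.0, -1.5])
labels = labels_from_decision(decision)
print(labels.tolist())  # [1, -1, 1, -1]
```

A value exactly on the boundary (0.0) counts as an inlier under this convention.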

required_properties: Sequence[str] = ['multiple_time_series']

score_samples(X, y=None)[source]

Compute anomaly scores from topological features.

transform(X)[source]

Alias for score_samples. Returns anomaly scores.

tdaad.topological_embedding module

Topological Embedding Transformers.

class tdaad.topological_embedding.TopologicalEmbedding(window_size: int = 40, step: int = 5, tda_max_dim: int = 2, n_centers_by_dim: int = 5)[source]

Bases: BaseEstimator, TransformerMixin

Topological embedding for multiple time series.

Slices time series into smaller time series windows, forms an affinity matrix on each window and applies a Rips procedure to produce persistence diagrams for each affinity matrix. Then uses Atol [ref:Atol] on each dimension through the gudhi.representation.Archipelago representation to produce topological vectorization.
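The first two stages described above (windowing, then one matrix per window) can be sketched with NumPy. The affinity used here, 1 - |Pearson correlation| between sensors, is an assumption made for illustration; the page does not state which affinity tdaad computes, and the helper window_affinities is hypothetical.

```python
import numpy as np


def window_affinities(X: np.ndarray, window_size: int, step: int) -> list[np.ndarray]:
    """Return one (n_sensors, n_sensors) dissimilarity matrix per full sliding window."""
    mats = []
    for start in range(0, X.shape[0] - window_size + 1, step):
        window = X[start:start + window_size]     # shape (window_size, n_sensors)
        corr = np.corrcoef(window, rowvar=False)  # sensor-by-sensor correlation
        mats.append(1.0 - np.abs(corr))           # assumed dissimilarity, zero diagonal
    return mats


rng = np.random.default_rng(0)
X = rng.random((100, 5))  # 100 timestamps, 5 sensors
mats = window_affinities(X, window_size=40, step=5)
print(len(mats), mats[0].shape)  # 13 (5, 5)
```

Each matrix would then be passed to a Rips construction to obtain a persistence diagram, and the diagrams vectorized per homology dimension; those steps depend on gudhi and are left out of this sketch.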

Read more in the User Guide.

Parameters:
  • window_size (int, default=40) – Size of the sliding window used to extract subsequences as input to named_pipeline.

  • step (int, default=5) – Step size between consecutive windows.

  • n_centers_by_dim (int, default=5) – The number of centroids to generate per dimension for vectorizing topological features. The resulting embedding will have total dimension <= tda_max_dim * n_centers_by_dim. The resulting embedding dimension might be smaller because of the KMeans algorithm in the Archipelago step.

  • tda_max_dim (int, default=2) – The maximum dimension of the topological feature extraction.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from tdaad.topological_embedding import TopologicalEmbedding
>>> n_timestamps = 100
>>> n_sensors = 5
>>> timestamps = pd.to_datetime('2024-01-01', utc=True) + pd.Timedelta(1, 'h') * np.arange(n_timestamps)
>>> X = pd.DataFrame(np.random.random(size=(n_timestamps, n_sensors)), index=timestamps)
>>> TopologicalEmbedding(n_centers_by_dim=2, tda_max_dim=1).fit_transform(X)

fit(X, y=None)[source]

Fit the internal pipeline to the data.

Parameters:
  • X (pandas.DataFrame) – Input feature matrix.

  • y (array-like, optional) – Target values (not used here, but accepted for compatibility with sklearn).

Returns:

self – Fitted transformer.

Return type:

object

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Returns:

X_transformed

Return type:

array-like

transform(X)[source]

Apply transformations to the input data using the fitted pipeline.

Parameters:

X (pandas.DataFrame) – Input data to transform.

Returns:

X_transformed – Transformed data.

Return type:

array-like or DataFrame

tdaad.topological_embedding.local_atol_fit(self, X, y=None, sample_weight=None)[source]

Local modification of the Atol fit method that prevents the FutureWarning triggered by np.concatenate(X) when X is a pd.Series.

Module contents

Topological Data Analysis module for Anomaly Detection in Time Series

tdaad is a Python module integrating TDA tools from gudhi into learning algorithms designed to detect anomalies in Multiple/Multivariate Time Series.