tdaad package

Subpackages

tdaad.utils package

Submodules

tdaad.anomaly_detectors module

Topological Anomaly Detectors.

class tdaad.anomaly_detectors.TopologicalAnomalyDetector(window_size: int = 100, step: int = 5, tda_max_dim: int = 2, n_centers_by_dim: int = 5, support_fraction: float = None, contamination: float = 0.1, random_state: int = 42)[source]

Bases: EllipticEnvelope, TransformerMixin

Object for detecting anomalies based on Topological Embedding and sklearn.covariance.EllipticEnvelope.

This object analyzes multiple time series data through the following operations:

- run a sliding window algorithm and represent each time series window with topological features, see Topological Embedding,
- use a MinCovDet algorithm to robustly estimate the data mean and covariance in the embedding space,
- and use these to derive an embedding Mahalanobis distance and associated outlier detection procedure, see Elliptic Envelope.
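The sliding-window step of the operations listed above can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: the helper name sliding_windows and its edge handling (only full-length windows are kept) are assumptions made for this sketch.

```python
import numpy as np


def sliding_windows(X, window_size, step):
    """Yield (start, window) pairs over the rows of X.

    Assumption: windows start at 0, step, 2*step, ... and any incomplete
    trailing window is dropped. The library's own edge handling may differ.
    """
    n_timestamps = X.shape[0]
    for start in range(0, n_timestamps - window_size + 1, step):
        yield start, X[start:start + window_size]


rng = np.random.default_rng(0)
X = rng.random((1000, 20))  # synthetic data of shape (n_timestamps, n_sensors)

windows = list(sliding_windows(X, window_size=100, step=5))
n_windows = len(windows)
first_start, first_window = windows[0]
last_start, last_window = windows[-1]
```

Each window keeps all sensors and covers window_size consecutive timestamps; in the detector, each such window is then summarized by topological features.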

After fitting, it can produce an anomaly score from a time series, describing normal and abnormal time segments (the lower the score, the more abnormal the segment). The predict method (inherited from EllipticEnvelope) turns that score into binary normal / anomaly labels.
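As a rough sketch of how a score becomes a label, the snippet below follows the scikit-learn outlier-detection convention, in which the decision value is the score minus an offset and predict returns 1 for normal samples and -1 for anomalies. The function name, the scores and the offset value are all made up for illustration.

```python
def labels_from_scores(scores, offset):
    """Map anomaly scores to sklearn-style labels: 1 = normal, -1 = anomaly."""
    # Decision value: positive (or zero) means "at least as normal as the threshold".
    decision = [score - offset for score in scores]
    return [1 if value >= 0 else -1 for value in decision]


scores = [-1.2, -0.8, -9.5, -1.0, -15.3]  # made-up scores (lower = more abnormal)
offset = -5.0                             # hypothetical threshold
labels = labels_from_scores(scores, offset)
```

With these made-up numbers, the third and fifth segments fall below the threshold and are labeled as anomalies.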

Read more in the User Guide.

Parameters:

window_size (int, default=100) – Size of the sliding window algorithm to extract subsequences as input to named_pipeline.
step (int, default=5) – Size of the sliding window steps between each window.
tda_max_dim (int, default=2) – The maximum dimension of the topological feature extraction.
n_centers_by_dim (int, default=5) – The number of centroids to generate by dimension for vectorizing topological features. The resulting embedding will have total dimension <= tda_max_dim * n_centers_by_dim. The resulting embedding dimension might be smaller because of the KMeans algorithm in the Archipelago step.
support_fraction (float, default=None) – The proportion of points to be included in the support of the raw MCD estimate. If None, the minimum value of support_fraction will be used within the algorithm: [n_sample + n_features + 1] / 2. Range is (0, 1).
contamination (float, default=0.1) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Range is (0, 0.5]. Only matters for computing the decision function.
random_state (int, RandomState instance or None, default=42) – Determines the pseudo random number generator for shuffling the data. Pass an int for reproducible results across multiple function calls.
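To illustrate how contamination can shape the decision function, the sketch below sets a threshold at the contamination-quantile of the training scores, which is, to the best of our understanding, how scikit-learn's EllipticEnvelope derives its offset. The training scores here are synthetic stand-ins, not output of this package.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for training scores (negative distances, lower = more abnormal).
train_scores = -rng.chisquare(df=3, size=1000)

contamination = 0.1
# Threshold chosen so that roughly `contamination` of the training scores fall below it.
offset = np.percentile(train_scores, 100.0 * contamination)

flagged = train_scores < offset
flagged_fraction = flagged.mean()
```

Raising contamination moves the threshold up and flags more of the training windows; it does not change the scores themselves.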

topological_embedding_

TopologicalEmbedding transformer object that is fitted when fit is called.

Type: object

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from tdaad.anomaly_detectors import TopologicalAnomalyDetector
>>> n_timestamps = 1000
>>> n_sensors = 20
>>> timestamps = pd.to_datetime('2024-01-01', utc=True) + pd.Timedelta(1, 'h') * np.arange(n_timestamps)
>>> X = pd.DataFrame(np.random.random(size=(n_timestamps, n_sensors)), index=timestamps)
>>> # Alter the second half of the series so that it differs from the first half.
>>> X.iloc[n_timestamps//2:,:10] = -X.iloc[n_timestamps//2:,10:20]
>>> detector = TopologicalAnomalyDetector(n_centers_by_dim=2, tda_max_dim=1).fit(X)
>>> anomaly_scores = detector.score_samples(X)  # the lower, the more abnormal
>>> decision = detector.decision_function(X)
>>> anomalies = detector.predict(X)

fit(X, y=None)[source]

Fit the TopologicalAnomalyDetector model.

Parameters:

X ({array-like, sparse matrix} of shape (n_timestamps, n_sensors)) – Multiple time series to transform, where n_timestamps is the number of timestamps in the series X, and n_sensors is the number of sensors.
y (Ignored) – Not used, present for API consistency by convention.

Returns:

self – Returns the instance itself.

Return type:

object

required_properties: Sequence[str] = ['multiple_time_series']

score_samples(X, y=None)[source]

Compute the negative Mahalanobis distances associated with the TopologicalEmbedding representation of X.

Parameters:

X ({array-like, sparse matrix} of shape (n_timestamps, n_sensors)) – Multiple time series to transform, where n_timestamps is the number of timestamps in the series X, and n_sensors is the number of sensors.
y (Ignored) – Not used, present for API consistency by convention.

Returns:

negative_mahal_distances – Opposite of the Mahalanobis distances.

Return type:

ndarray of shape (n_samples,)
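A minimal NumPy sketch of a negative Mahalanobis score is given below. It is not this method's implementation: the embedding vectors are random stand-ins, the location and covariance are the plain empirical estimates rather than the robust MinCovDet ones, and the squared form of the distance is used because scikit-learn's covariance estimators report squared Mahalanobis distances (treat that detail as an assumption about this class).

```python
import numpy as np

rng = np.random.default_rng(0)
embedding = rng.normal(size=(200, 2))  # stand-in for topological embedding vectors

# Plain empirical estimates; the detector uses a robust (MCD) estimate instead.
location = embedding.mean(axis=0)
covariance = np.cov(embedding, rowvar=False)
precision = np.linalg.inv(covariance)


def negative_squared_mahalanobis(points):
    """Return -(x - mu)^T Sigma^{-1} (x - mu) for each row x of points."""
    centered = points - location
    squared = np.einsum("ij,jk,ik->i", centered, precision, centered)
    return -squared


scores = negative_squared_mahalanobis(embedding)
far_point_score = negative_squared_mahalanobis(np.array([[10.0, 10.0]]))[0]
```

Points far from the bulk of the data get strongly negative scores, which matches the "lower is more abnormal" convention.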

transform(X, y=None)

Compute the negative Mahalanobis distances associated with the TopologicalEmbedding representation of X.

Parameters:

X ({array-like, sparse matrix} of shape (n_timestamps, n_sensors)) – Multiple time series to transform, where n_timestamps is the number of timestamps in the series X, and n_sensors is the number of sensors.
y (Ignored) – Not used, present for API consistency by convention.

Returns:

negative_mahal_distances – Opposite of the Mahalanobis distances.

Return type:

ndarray of shape (n_samples,)

tdaad.persistencediagram_transformer module

Persistence Diagram Transformers.

class tdaad.persistencediagram_transformer.PersistenceDiagramTransformer(tda_max_dim=2)[source]

Bases: LocalPipeline

Persistence Diagram Transformer for point cloud.

For a given point cloud, form a similarity matrix and apply a RipsPersistence procedure to produce topological descriptors in the form of persistence diagrams.

Read more in the User Guide.

Parameters:

tda_max_dim (int, default=2) – The maximum dimension of the topological feature extraction.

Example

>>> import numpy as np
>>> import pandas as pd
>>> from tdaad.persistencediagram_transformer import PersistenceDiagramTransformer
>>> n_timestamps = 100
>>> n_sensors = 5
>>> timestamps = pd.to_datetime('2024-01-01', utc=True) + pd.Timedelta(1, 'h') * np.arange(n_timestamps)
>>> X = pd.DataFrame(np.random.random(size=(n_timestamps, n_sensors)), index=timestamps)
>>> PersistenceDiagramTransformer().fit_transform(X)

fit_transform(X, y=None, **fit_params)[source]

Transforms data X into a list of persistence diagrams arranged in order of homology dimension.

Parameters:

X ({array-like, sparse matrix} of shape (n_timestamps, n_sensors)) – Multiple time series to transform, where n_timestamps is the number of timestamps in the series X, and n_sensors is the number of sensors.
y (Ignored) – Not used, present for API consistency by convention.
**fit_params (Ignored) – Not used, present for API consistency.

Note: this method could be removed; it is kept so that its return value can be documented explicitly.

Returns:

by_dim_arrays – List of persistence diagrams [pd_0, pd_1, …] arranged in order of homology dimension. A persistence diagram pd_i is an ndarray of shape (n_i, 2), where n_i is the number of homological features in dimension i found in the similarity matrix of the data.
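The returned structure can be explored with a few lines of NumPy. The sketch below uses hand-written diagrams with made-up values rather than real output, and assumes the usual persistence-diagram convention that the two columns are (birth, death); it does not handle features with infinite death values.

```python
import numpy as np

# Made-up diagrams standing in for [pd_0, pd_1]; each has shape (n_i, 2).
pd_0 = np.array([[0.0, 0.2], [0.0, 0.5], [0.0, 0.9]])
pd_1 = np.array([[0.4, 0.6], [0.5, 1.1]])
by_dim_arrays = [pd_0, pd_1]


def lifetimes(diagram):
    """Persistence (death - birth) of each feature, assuming (birth, death) columns."""
    return diagram[:, 1] - diagram[:, 0]


# For each homology dimension: number of features and longest lifetime.
summary = {
    dim: (len(diagram), float(lifetimes(diagram).max()))
    for dim, diagram in enumerate(by_dim_arrays)
}
```

Long-lived features are usually read as the more significant topological structure, while short-lived ones are often treated as noise.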

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → PersistenceDiagramTransformer

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

tdaad.persistencediagram_transformer.wrap_in_list(X)[source]

Wrapper needed because RipsPersistence.transform expects a list.

tdaad.topological_embedding module

Topological Embedding Transformers.

class tdaad.topological_embedding.TopologicalEmbedding(window_size: int = 40, step: int = 5, tda_max_dim: int = 2, n_centers_by_dim: int = 5)[source]

Bases: LocalPipeline

Topological embedding for multiple time series.

Slices time series into smaller time series windows, forms an affinity matrix on each window and applies a Rips procedure to produce persistence diagrams for each affinity matrix. Then uses Atol [ref:Atol] on each dimension through the gudhi.representation.Archipelago representation to produce topological vectorization.

Read more in the User Guide.

Parameters:

window_size (int, default=40) – Size of the sliding window algorithm to extract subsequences as input to named_pipeline.
step (int, default=5) – Size of the sliding window steps between each window.
n_centers_by_dim (int, default=5) – The number of centroids to generate by dimension for vectorizing topological features. The resulting embedding will have total dimension <= tda_max_dim * n_centers_by_dim. The resulting embedding dimension might be smaller because of the KMeans algorithm in the Archipelago step.
tda_max_dim (int, default=2) – The maximum dimension of the topological feature extraction.
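The parameters above jointly bound the size of the embedding. The arithmetic can be sketched as follows; both helper names are hypothetical, and the window count assumes that only full-length windows starting at 0, step, 2*step, ... are kept, which may not match the library's edge handling.

```python
def n_full_windows(n_timestamps: int, window_size: int, step: int) -> int:
    """Count full-length windows under the assumed windowing scheme."""
    if n_timestamps < window_size:
        return 0
    return (n_timestamps - window_size) // step + 1


def embedding_dim_upper_bound(tda_max_dim: int, n_centers_by_dim: int) -> int:
    """Upper bound on the embedding dimension, as stated in the parameter docs."""
    return tda_max_dim * n_centers_by_dim


# Values matching a series of 100 timestamps with the default window settings.
rows = n_full_windows(n_timestamps=100, window_size=40, step=5)
max_cols = embedding_dim_upper_bound(tda_max_dim=1, n_centers_by_dim=2)
```

Under these assumptions the embedding would have one row per window and at most max_cols columns.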

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from tdaad.topological_embedding import TopologicalEmbedding
>>> n_timestamps = 100
>>> n_sensors = 5
>>> timestamps = pd.to_datetime('2024-01-01', utc=True) + pd.Timedelta(1, 'h') * np.arange(n_timestamps)
>>> X = pd.DataFrame(np.random.random(size=(n_timestamps, n_sensors)), index=timestamps)
>>> TopologicalEmbedding(n_centers_by_dim=2, tda_max_dim=1).fit_transform(X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → TopologicalEmbedding

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

tdaad.topological_embedding.local_atol_fit(self, X, y=None, sample_weight=None)[source]

Local modification to prevent the FutureWarning triggered by np.concatenate(X) when X is a pd.Series.

Module contents

Topological Data Analysis module for Anomaly Detection in Time Series

tdaad is a Python module integrating TDA tools from gudhi into learning algorithms designed to detect anomalies in Multiple/Multivariate Time Series.