tdaad package
Subpackages
Submodules
tdaad.anomaly_detectors module
Topological Anomaly Detectors.
- class tdaad.anomaly_detectors.TopologicalAnomalyDetector(window_size: int = 100, step: int = 5, tda_max_dim: int = 2, n_centers_by_dim: int = 5, support_fraction: float = None, contamination: float = 0.1, random_state: int = 42)
Bases: EllipticEnvelope, TransformerMixin
Object for detecting anomalies based on Topological Embedding and sklearn.covariance.EllipticEnvelope.
This object analyzes multiple time series data through the following operations:
- run a sliding window algorithm and represent each time series window with topological features,
- use a MinCovDet algorithm to robustly estimate the data mean and covariance in the embedding space, and use these to derive an embedding Mahalanobis distance and the associated outlier detection procedure, see Elliptic Envelope.
After fitting, it can produce an anomaly score for a time series, describing normal / abnormal time segments (the lower the score, the more abnormal). The predict method (inherited from EllipticEnvelope) turns that score into binary normal / anomaly labels.
Read more in the User Guide.
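The scoring and labelling steps described above can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: it assumes a plain empirical mean and covariance in place of the robust MinCovDet estimate, uses random vectors as a hypothetical stand-in for the topological embedding, and the helper name negative_mahalanobis is invented for the example.

```python
import numpy as np


def negative_mahalanobis(embedding, mean, cov):
    """Negative squared Mahalanobis distance of each row to `mean`."""
    centered = embedding - mean
    precision = np.linalg.pinv(cov)
    # Quadratic form x^T P x evaluated row by row.
    return -np.einsum("ij,jk,ik->i", centered, precision, centered)


rng = np.random.default_rng(0)
# Hypothetical embedding: 200 windows described by 3 features each.
embedding = rng.normal(size=(200, 3))
embedding[-5:] += 8.0  # push the last five windows far from the bulk

# Assumption: empirical estimates instead of the robust MCD estimate.
mean = embedding.mean(axis=0)
cov = np.cov(embedding, rowvar=False)
scores = negative_mahalanobis(embedding, mean, cov)

# The lowest `contamination` fraction of scores is labelled abnormal (-1).
contamination = 0.1
threshold = np.quantile(scores, contamination)
labels = np.where(scores < threshold, -1, 1)
```

Scores are never positive, and the shifted windows receive the lowest scores, so they fall below the contamination threshold and are labelled -1.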
- Parameters:
window_size (int, default=100) – Size of the sliding window algorithm to extract subsequences as input to named_pipeline.
step (int, default=5) – Size of the sliding window steps between each window.
tda_max_dim (int, default=2) – The maximum dimension of the topological feature extraction.
n_centers_by_dim (int, default=5) – The number of centroids to generate by dimension for vectorizing topological features. The resulting embedding will have total dimension <= tda_max_dim * n_centers_by_dim. The resulting embedding dimension might be smaller because of the KMeans algorithm in the Archipelago step.
support_fraction (float, default=None) – The proportion of points to be included in the support of the raw MCD estimate. If None, the minimum value of support_fraction will be used within the algorithm: [n_sample + n_features + 1] / 2. Range is (0, 1).
contamination (float, default=0.1) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Range is (0, 0.5]. Only matters for computing the decision function.
random_state (int, RandomState instance or None, default=42) – Determines the pseudo random number generator for shuffling the data. Pass an int for reproducible results across multiple function calls.
- topological_embedding_
TopologicalEmbedding transformer object that is fitted at fit.
- Type:
object
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from tdaad.anomaly_detectors import TopologicalAnomalyDetector
>>> n_timestamps = 1000
>>> n_sensors = 20
>>> timestamps = pd.to_datetime('2024-01-01', utc=True) + pd.Timedelta(1, 'h') * np.arange(n_timestamps)
>>> X = pd.DataFrame(np.random.random(size=(n_timestamps, n_sensors)), index=timestamps)
>>> X.iloc[n_timestamps//2:,:10] = -X.iloc[n_timestamps//2:,10:20]
>>> detector = TopologicalAnomalyDetector(n_centers_by_dim=2, tda_max_dim=1).fit(X)
>>> anomaly_scores = detector.score_samples(X)
>>> decision = detector.decision_function(X)
>>> anomalies = detector.predict(X)
- fit(X, y=None)
Fit the TopologicalAnomalyDetector model.
- Parameters:
X ({array-like, sparse matrix} of shape (n_timestamps, n_sensors)) – Multiple time series to transform, where n_timestamps is the number of timestamps in the series X, and n_sensors is the number of sensors.
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
self – Returns the instance itself.
- Return type:
object
- required_properties: Sequence[str] = ['multiple_time_series']
- score_samples(X, y=None)
Compute the negative Mahalanobis distances associated with the TopologicalEmbedding representation of X.
- Parameters:
X ({array-like, sparse matrix} of shape (n_timestamps, n_sensors)) – Multiple time series to transform, where n_timestamps is the number of timestamps in the series X, and n_sensors is the number of sensors.
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
negative_mahal_distances – Opposite of the Mahalanobis distances.
- Return type:
ndarray of shape (n_samples,)
- transform(X, y=None)
Compute the negative Mahalanobis distances associated with the TopologicalEmbedding representation of X.
- Parameters:
X ({array-like, sparse matrix} of shape (n_timestamps, n_sensors)) – Multiple time series to transform, where n_timestamps is the number of timestamps in the series X, and n_sensors is the number of sensors.
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
negative_mahal_distances – Opposite of the Mahalanobis distances.
- Return type:
ndarray of shape (n_samples,)
tdaad.persistencediagram_transformer module
Persistence Diagram Transformers.
- class tdaad.persistencediagram_transformer.PersistenceDiagramTransformer(tda_max_dim=2)
Bases: LocalPipeline
Persistence Diagram Transformer for point cloud.
For a given point cloud, form a similarity matrix and apply a RipsPersistence procedure to produce topological descriptors in the form of persistence diagrams.
Read more in the User Guide.
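To make the idea of a persistence diagram concrete, here is a small NumPy sketch of the dimension-0 part of a Rips computation. It relies on two assumptions that the text above does not confirm: the sensor-to-sensor matrix is taken to be 1 - |Pearson correlation|, and only finite H0 pairs are computed, using the fact that in a Rips filtration every connected component is born at 0 and dies at the weight of a minimum-spanning-tree edge. The function names are hypothetical; the library itself delegates this work to its RipsPersistence step.

```python
import numpy as np


def correlation_dissimilarity(window):
    """Assumed dissimilarity between sensors: 1 - |Pearson correlation|."""
    corr = np.corrcoef(np.asarray(window), rowvar=False)
    dissimilarity = 1.0 - np.abs(corr)
    np.fill_diagonal(dissimilarity, 0.0)
    return dissimilarity


def h0_diagram(dissimilarity):
    """Finite H0 (birth, death) pairs via Kruskal's spanning-tree algorithm."""
    n = dissimilarity.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    rows, cols = np.triu_indices(n, k=1)
    order = np.argsort(dissimilarity[rows, cols])
    deaths = []
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this edge weight.
            parent[root_i] = root_j
            deaths.append(dissimilarity[i, j])
    return np.column_stack([np.zeros(len(deaths)), deaths])


rng = np.random.default_rng(0)
window = rng.random(size=(100, 5))  # 100 timestamps, 5 sensors
dissimilarity = correlation_dissimilarity(window)
pd_0 = h0_diagram(dissimilarity)
print(pd_0.shape)  # (4, 2)
```

With 5 sensors there are always 4 merges, hence 4 finite pairs; the component that never dies is left out of this sketch.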
- Parameters:
tda_max_dim (int, default=2) – The maximum dimension of the topological feature extraction.
Example
>>> import numpy as np
>>> import pandas as pd
>>> from tdaad.persistencediagram_transformer import PersistenceDiagramTransformer
>>> n_timestamps = 100
>>> n_sensors = 5
>>> timestamps = pd.to_datetime('2024-01-01', utc=True) + pd.Timedelta(1, 'h') * np.arange(n_timestamps)
>>> X = pd.DataFrame(np.random.random(size=(n_timestamps, n_sensors)), index=timestamps)
>>> PersistenceDiagramTransformer().fit_transform(X)
- fit_transform(X, y=None, **fit_params)
Transforms data X into a list of persistence diagrams arranged in order of homology dimension.
- Parameters:
X ({array-like, sparse matrix} of shape (n_timestamps, n_sensors)) – Multiple time series to transform, where n_timestamps is the number of timestamps in the series X, and n_sensors is the number of sensors.
y (Ignored) – Not used, present for API consistency by convention.
**fit_params (Ignored) – Not used, present for API consistency.
Note: this method could be removed; it is kept so that the return value can be documented explicitly.
- Returns:
by_dim_arrays – List of persistence diagrams [pd_0, pd_1, …] arranged in order of homology dimension. A persistence diagram pd_i is an ndarray of shape (n_i, 2), where n_i is the number of homological features in dimension i found in the similarity matrix of the data.
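As a hedged illustration of how such a return value can be consumed, the sketch below computes the lifetime (death minus birth) of every feature, per homology dimension. The diagrams are hand-written illustrative values, not output of the transformer, and the variable names other than by_dim_arrays are assumptions.

```python
import numpy as np

# Hypothetical diagrams: rows are (birth, death) pairs.
by_dim_arrays = [
    np.array([[0.0, 0.2], [0.0, 0.5], [0.0, 0.9]]),  # pd_0, shape (3, 2)
    np.array([[0.6, 0.8]]),                          # pd_1, shape (1, 2)
]

total_persistence = []
for dim, diagram in enumerate(by_dim_arrays):
    lifetimes = diagram[:, 1] - diagram[:, 0]
    total_persistence.append(lifetimes.sum())
    print(f"dimension {dim}: {len(diagram)} features")
```

Long lifetimes are usually read as robust topological features, short ones as noise.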
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → PersistenceDiagramTransformer
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
tdaad.topological_embedding module
Topological Embedding Transformers.
- class tdaad.topological_embedding.TopologicalEmbedding(window_size: int = 40, step: int = 5, tda_max_dim: int = 2, n_centers_by_dim: int = 5)
Bases: LocalPipeline
Topological embedding for multiple time series.
Slices time series into smaller time series windows, forms an affinity matrix on each window and applies a Rips procedure to produce persistence diagrams for each affinity matrix. Then uses Atol [ref:Atol] on each dimension through the gudhi.representation.Archipelago representation to produce topological vectorization.
Read more in the User Guide.
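The slicing step can be pictured with the following NumPy sketch. It assumes that windows start every step rows and that a trailing partial window is dropped; the library's own windowing may handle edges differently, and the function name sliding_windows is hypothetical.

```python
import numpy as np


def sliding_windows(X, window_size=40, step=5):
    """Stack overlapping windows of `window_size` rows, taken every `step` rows."""
    X = np.asarray(X)
    starts = range(0, X.shape[0] - window_size + 1, step)
    return np.stack([X[start:start + window_size] for start in starts])


# 100 timestamps, 5 sensors, filled with consecutive integers for readability.
X = np.arange(100 * 5).reshape(100, 5)
windows = sliding_windows(X, window_size=40, step=5)
print(windows.shape)  # (13, 40, 5)
```

Each of the 13 windows is then treated as its own small multivariate time series by the later steps.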
- Parameters:
window_size (int, default=40) – Size of the sliding window algorithm to extract subsequences as input to named_pipeline.
step (int, default=5) – Size of the sliding window steps between each window.
n_centers_by_dim (int, default=5) – The number of centroids to generate by dimension for vectorizing topological features. The resulting embedding will have total dimension <= tda_max_dim * n_centers_by_dim. The resulting embedding dimension might be smaller because of the KMeans algorithm in the Archipelago step.
tda_max_dim (int, default=2) – The maximum dimension of the topological feature extraction.
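To show why the number of centroids fixes the length of the vectorization, here is a simplified, Atol-flavoured sketch: each centroid contributes one coordinate, obtained by summing a Gaussian kernel between that centroid and every point of a diagram. This is an assumption-laden simplification, not gudhi's implementation: the centroids are hard-coded rather than learned by KMeans, a single scale replaces per-centroid inertias, and the names vectorize_diagram, centers and scale are hypothetical.

```python
import numpy as np


def vectorize_diagram(diagram, centers, scale=1.0):
    """One coordinate per center: summed Gaussian kernel over diagram points."""
    diffs = diagram[:, None, :] - centers[None, :, :]
    squared_distances = np.sum(diffs ** 2, axis=-1)
    return np.exp(-squared_distances / scale ** 2).sum(axis=0)


# Illustrative (birth, death) pairs and hypothetical centroids.
diagram = np.array([[0.0, 0.2], [0.0, 0.5], [0.1, 0.9]])
centers = np.array([[0.0, 0.3], [0.1, 0.8]])

vector = vectorize_diagram(diagram, centers, scale=0.5)
empty_vector = vectorize_diagram(np.empty((0, 2)), centers, scale=0.5)
```

The output length equals the number of centroids whatever the number of points in the diagram, and an empty diagram maps to the zero vector.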
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from tdaad.topological_embedding import TopologicalEmbedding
>>> n_timestamps = 100
>>> n_sensors = 5
>>> timestamps = pd.to_datetime('2024-01-01', utc=True) + pd.Timedelta(1, 'h') * np.arange(n_timestamps)
>>> X = pd.DataFrame(np.random.random(size=(n_timestamps, n_sensors)), index=timestamps)
>>> TopologicalEmbedding(n_centers_by_dim=2, tda_max_dim=1).fit_transform(X)
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → TopologicalEmbedding
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
Module contents
Topological Data Analysis module for Anomaly Detection in Time Series
tdaad is a Python module integrating TDA tools from gudhi into learning algorithms designed to detect anomalies in Multiple/Multivariate Time Series.