Learning on a modified Ornstein–Uhlenbeck process
The Ornstein–Uhlenbeck process is a mean-reverting stochastic process, the continuous-time analogue of an auto-regressive process of order 1. It can be understood as a random walk that is pulled back toward a mean level, and it is used to model time series in applied fields such as physics, biology or finance. To demonstrate this component, we generate such a process and occasionally disturb it with Gaussian noise.
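The link with an auto-regressive process of order 1 can be checked numerically. The sketch below is not part of the notebook's pipeline: `simulate_ou`, `theta` and `sigma` are illustrative names, and the process is assumed to have zero mean. It simulates an Ornstein–Uhlenbeck path through its exact discretization, then recovers the AR(1) coefficient `exp(-theta * dt)` by least squares.

```python
import numpy as np


def simulate_ou(n_steps, dt, theta, sigma, seed=0):
    """Simulate a zero-mean OU path through its exact AR(1) discretization."""
    rng = np.random.default_rng(seed)
    phi = np.exp(-theta * dt)  # AR(1) coefficient for a step of length dt
    # Standard deviation of the innovation accumulated over one step.
    innovation_std = sigma * np.sqrt((1.0 - phi ** 2) / (2.0 * theta))
    x = np.zeros(n_steps)
    for i in range(1, n_steps):
        x[i] = phi * x[i - 1] + innovation_std * rng.normal()
    return x, phi


x, phi = simulate_ou(n_steps=20_000, dt=0.01, theta=3.0, sigma=1.0)
# Least-squares estimate of the AR(1) coefficient from the simulated path.
phi_hat = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
print(f"true coefficient {phi:.4f}, estimated {phi_hat:.4f}")
```

For small `dt` the innovation standard deviation is close to `sigma * sqrt(dt)`, which is the noise scale used by the generator in this notebook.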
[1]:
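import numpy as np


def ornstein_uhlenbeck_anomaly(
    t,
    size=1,
    noise_scale=1,
    mean_reverting=1,
    anomaly_freq=1,
    anomaly_duration=.1,
    anomaly_scale=1,
    seed=314,
):
    """Simulate `size` Ornstein-Uhlenbeck paths on the time grid `t`, with random anomalies.

    Returns the paths X, of shape (len(t), size), and a 0/1 vector y marking anomaly onsets.
    """
    rng = np.random.RandomState(seed)
    length, = t.shape
    y = np.zeros(length)
    X = np.empty((length, size))
    X[0] = 0
    target = np.zeros(size)  # level the process reverts to; nonzero only after an anomaly
    for i in range(1, length):
        dt = t[i] - t[i - 1]
        # The anomalous target decays back to zero on the time scale anomaly_duration.
        target *= np.exp(-dt / anomaly_duration)
        # Mean-reverting step toward the target, plus Gaussian noise of scale sqrt(dt).
        X[i] = rng.normal(
            target + (X[i - 1] - target) * np.exp(-mean_reverting * dt),
            noise_scale * dt ** .5
        )
        # An anomaly starts with probability dt / anomaly_freq at each step.
        if rng.rand() < dt / anomaly_freq:
            y[i] = 1
            target += rng.normal(scale=anomaly_scale, size=size)
    return X, y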
[2]:
import numpy as np
import pandas as pd
t = np.arange(1000)
t = t / t.shape[0]
pre_X, pre_y = ornstein_uhlenbeck_anomaly(
    t,
    size=20,
    noise_scale=1,
    mean_reverting=3,
    anomaly_freq=.1,
    anomaly_duration=0.005,
    anomaly_scale=40,
)
t = pd.to_datetime('2021-01-01') + pd.Timedelta(1, 'h') * np.arange(t.shape[0])
X = pd.DataFrame(pre_X, index=t, columns=[f'X{i}' for i in range(pre_X.shape[1])])
y = pd.DataFrame(pre_y[:, None], index=t, columns=['y'])
data = (X, y)
pd.concat(data, axis=1).plot()
[2]:
<Axes: >

Topological embedding of a multiple time series
The persistence diagram transform
A major tool from the field of Topological Data Analysis (TDA) is the persistence diagram, which summarizes topological information as a set of points in \(\mathbf{R}^2\).
For instance, we can extract topological information from the entire multivariate time series X:
[3]:
from tdaad.topological_embedding import PersistenceDiagramTransformer
from gudhi import plot_persistence_diagram
global_pdiagram = PersistenceDiagramTransformer().fit_transform(X)
plot_persistence_diagram(global_pdiagram[0])
[3]:
<Axes: title={'center': 'Persistence diagram'}, xlabel='Birth', ylabel='Death'>
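To make the "set of points in \(\mathbf{R}^2\)" concrete, here is a minimal sketch that computes a dimension-0 persistence diagram for a small point cloud. It is an illustration only and not how `PersistenceDiagramTransformer` builds its diagrams: the function name `h0_persistence` is hypothetical, and the filtration is assumed to be a Rips-style one in which an edge appears at its Euclidean length. Every point is born at 0, and a connected component dies when an edge merges it into another one.

```python
import numpy as np


def h0_persistence(points):
    """Return the (birth, death) pairs of connected components of a point cloud."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Process edges by increasing length, as the filtration would add them.
    edges = sorted((dists[i, j], i, j) for i in range(n) for j in range(i + 1, n))

    parent = list(range(n))  # union-find forest over the points

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    diagram = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            diagram.append((0.0, length))  # one component dies at this scale
    diagram.append((0.0, np.inf))  # the last component never dies
    return np.array(diagram)


diagram = h0_persistence([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
print(diagram)
```

With three collinear points at 0, 1 and 5, the diagram contains one component dying at scale 1, one dying at scale 4, and one that persists forever.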

The sliding window algorithm
In order to capture local information and detect potential anomalies, we cut the multivariate time series into regular chunks, or windows, and apply the persistence diagram transform to each window. This turns the original time series into a time series of persistence diagrams. For instance, with three overlapping windows, each half the length of the series:
[4]:
swv = np.lib.stride_tricks.sliding_window_view(X.index, X.shape[0]//2)[::200, :]
for window in swv:
    pdiagram = PersistenceDiagramTransformer().fit_transform(X.loc[window])
    X.loc[window].plot()
    ax = plot_persistence_diagram(pdiagram[0])
    ax.set_title(str(window[-1]))
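The windowing call above can be inspected on a plain integer index. This sketch uses `np.arange(1000)` as a stand-in for `X.index` and shows which windows the `[::200]` slice keeps.

```python
import numpy as np

n_samples = 1000
index = np.arange(n_samples)  # stand-in for X.index

# All windows of length n_samples // 2, shifted by one sample each.
windows = np.lib.stride_tricks.sliding_window_view(index, n_samples // 2)
# Keep one window out of every 200.
selected = windows[::200]

print(windows.shape)    # (501, 500)
print(selected.shape)   # (3, 500)
print(selected[:, 0])   # first sample of each window: [  0 200 400]
print(selected[:, -1])  # last sample of each window:  [499 699 899]
```

The three selected windows overlap: each one covers 500 samples and starts 200 samples after the previous one.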

Vectorizing topological information
Each persistence diagram carries the information associated with one homological dimension, but the number of points in a diagram cannot be predicted in general. Persistence diagrams therefore need to be vectorized before they can be fed into a classical machine learning pipeline. This is done using the Atol and Archipelago tools from the Gudhi library, and the PersistenceDiagramTransformer is integrated into a larger pipeline making use of these tools.
Given a number of encoding points per dimension, n_centers_by_dim, this results in a representation method we call TopologicalEmbedding.
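The vectorization idea can be illustrated without Gudhi. The sketch below is a simplified stand-in, not Gudhi's Atol implementation: the function `laplacian_vectorize` is hypothetical, and the centers and scales are fixed by hand here, whereas Atol learns them from the data. Each diagram is mapped to one number per center by summing a Laplacian kernel over its points, so diagrams of any size yield vectors of the same length.

```python
import numpy as np


def laplacian_vectorize(diagram, centers, scales):
    """Map a diagram of (birth, death) points to one feature per center."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    # Distance from every diagram point to every center: (n_points, n_centers).
    dists = np.linalg.norm(diagram[:, None, :] - centers[None, :, :], axis=-1)
    # Each point contributes exp(-distance / scale) to each center.
    return np.exp(-dists / scales).sum(axis=0)


centers = np.array([[0.0, 1.0], [0.0, 4.0]])  # hand-picked, for illustration
scales = np.array([1.0, 1.0])

small = [(0.0, 1.0)]
large = [(0.0, 1.0), (0.0, 4.0), (0.0, 3.5)]
empty = []

vectors = [laplacian_vectorize(d, centers, scales) for d in (small, large, empty)]
for vector in vectors:
    print(vector)
```

In this sketch an empty diagram maps to the zero vector, and adding points to a diagram can only increase each feature.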
Now we apply this procedure together with the sliding window algorithm. Instead of a single embedding vector, we get one vector for each window of the time series X:
[5]:
from tdaad.topological_embedding import TopologicalEmbedding
embedder = TopologicalEmbedding(window_size=50, n_centers_by_dim=2).fit(X)
embedding = embedder.transform(X)
embedding
100%|██████████| 191/191 [00:00<00:00, 773.60it/s]
100%|██████████| 191/191 [00:00<00:00, 818.66it/s]
[5]:
 | Atol0__Atol Center 1 | Atol0__Atol Center 2 | Atol1__Atol Center 1 | Atol1__Atol Center 2 | Atol2__Atol Center 1 | Atol2__Atol Center 2 |
---|---|---|---|---|---|---|
0 | 8.014427 | 8.337348 | 0.139429 | 2.369379 | 1.041305 | 0.869459 |
1 | 7.646981 | 8.840381 | 0.905686 | 0.737120 | 0.000000 | 0.000000 |
2 | 8.490968 | 7.406048 | 1.041629 | 1.851082 | 0.934886 | 0.006830 |
3 | 8.781508 | 7.212324 | 1.795035 | 1.523272 | 0.009076 | 0.957346 |
4 | 10.458802 | 5.208043 | 1.027982 | 1.873740 | 0.000000 | 0.000000 |
... | ... | ... | ... | ... | ... | ... |
186 | 9.582538 | 4.739616 | 0.158071 | 1.707738 | 0.000000 | 0.000000 |
187 | 9.188312 | 5.705798 | 0.003870 | 1.839969 | 0.000000 | 0.000000 |
188 | 9.862403 | 6.028840 | 0.504404 | 1.867278 | 0.128751 | 0.723518 |
189 | 8.202467 | 7.330862 | 0.080560 | 2.732715 | 0.000000 | 0.000000 |
190 | 8.475916 | 6.191814 | 1.013029 | 2.501380 | 0.205330 | 1.519385 |
191 rows × 6 columns
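The row count can be checked with a short calculation. This is a sketch under assumptions: it supposes that the windows are taken every 5 samples (the step value printed in the detector log later in this notebook) and that no padding is applied; the helper `n_windows` is hypothetical and not part of tdaad.

```python
def n_windows(n_samples, window_size, step):
    """Number of full windows of length window_size taken every step samples."""
    return (n_samples - window_size) // step + 1


count = n_windows(n_samples=1000, window_size=50, step=5)
print(count)  # 191
```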
[6]:
import matplotlib
matplotlib.rcParams['text.usetex'] = True
embedding.loc[X.index[0], :] = np.nan
embedding.plot(subplots=True)
[6]:
array([<Axes: >, <Axes: >, <Axes: >, <Axes: >, <Axes: >, <Axes: >],
dtype=object)

Anomaly Detection based on topological features
What is left is to analyze these topological features in order to detect anomalies. We use a procedure from scikit-learn called EllipticEnvelope, based on robust covariance estimation: it estimates the mean and covariance of the topological vectors and uses them to compute the associated Mahalanobis distance. An anomaly score is then derived directly from this distance.
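The scoring step can be sketched with NumPy alone. This is a simplified illustration, not what the detector runs: it uses the plain empirical mean and covariance instead of the robust estimate behind EllipticEnvelope, the data are synthetic, and the score is assumed to be the negated squared Mahalanobis distance, so that lower scores mean more anomalous.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 3))  # synthetic stand-in for embedding vectors
vectors[-1] = [8.0, -8.0, 8.0]       # one planted outlier

# Plain (non-robust) estimates of location and scatter.
mean = vectors.mean(axis=0)
cov = np.cov(vectors, rowvar=False)
precision = np.linalg.inv(cov)

# Squared Mahalanobis distance of each row to the mean.
centered = vectors - mean
sq_mahalanobis = np.einsum("ij,jk,ik->i", centered, precision, centered)

# Negate so that the most anomalous row has the lowest score.
scores = -sq_mahalanobis
print(int(np.argmin(scores)))
```

A robust estimator matters in practice because outliers inflate the empirical covariance and thereby shrink their own distances; here the planted outlier is extreme enough to stand out regardless.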
[7]:
from tdaad.anomaly_detectors import TopologicalAnomalyDetector
detector = TopologicalAnomalyDetector(window_size=50, n_centers_by_dim=2, tda_max_dim=1).fit(X)
anomaly_score = detector.score_samples(X)
anomaly_score
100%|██████████| 191/191 [00:00<00:00, 1088.03it/s]
X.shape[0]=1000, self.window_size=50, self.step=5, so running self.padding_length_=0...
100%|██████████| 191/191 [00:00<00:00, 1045.76it/s]
[7]:
array([ -3.62090796, -3.62090796, -3.62090796, -3.62090796,
-3.62090796, -10.08873951, -10.08873951, -10.08873951,
-10.08873951, -10.08873951, -10.78583457, -10.78583457,
-10.78583457, -10.78583457, -10.78583457, -12.18904978,
-12.18904978, -12.18904978, -12.18904978, -12.18904978,
-12.72126438, -12.72126438, -12.72126438, -12.72126438,
-12.72126438, -17.02984659, -17.02984659, -17.02984659,
-17.02984659, -17.02984659, -18.43424569, -18.43424569,
-18.43424569, -18.43424569, -18.43424569, -23.00511013,
-23.00511013, -23.00511013, -23.00511013, -23.00511013,
-25.57913524, -25.57913524, -25.57913524, -25.57913524,
-25.57913524, -31.57776802, -31.57776802, -31.57776802,
-31.57776802, -31.57776802, -31.45946288, -31.45946288,
-31.45946288, -31.45946288, -31.45946288, -32.41516313,
-32.41516313, -32.41516313, -32.41516313, -32.41516313,
-33.62752756, -33.62752756, -33.62752756, -33.62752756,
-33.62752756, -34.02074975, -34.02074975, -34.02074975,
-34.02074975, -34.02074975, -38.5246302 , -38.5246302 ,
-38.5246302 , -38.5246302 , -38.5246302 , -38.63818671,
-38.63818671, -38.63818671, -38.63818671, -38.63818671,
-39.28788031, -39.28788031, -39.28788031, -39.28788031,
-39.28788031, -42.6035978 , -42.6035978 , -42.6035978 ,
-42.6035978 , -42.6035978 , -50.29238806, -50.29238806,
-50.29238806, -50.29238806, -50.29238806, -46.10305427,
-46.10305427, -46.10305427, -46.10305427, -46.10305427,
-50.10379525, -50.10379525, -50.10379525, -50.10379525,
-50.10379525, -48.43836262, -48.43836262, -48.43836262,
-48.43836262, -48.43836262, -47.72314895, -47.72314895,
-47.72314895, -47.72314895, -47.72314895, -47.91093761,
-47.91093761, -47.91093761, -47.91093761, -47.91093761,
-47.98965423, -47.98965423, -47.98965423, -47.98965423,
-47.98965423, -45.11052359, -45.11052359, -45.11052359,
-45.11052359, -45.11052359, -46.47204516, -46.47204516,
-46.47204516, -46.47204516, -46.47204516, -39.44578881,
-39.44578881, -39.44578881, -39.44578881, -39.44578881,
-32.90720578, -32.90720578, -32.90720578, -32.90720578,
-32.90720578, -35.10904563, -35.10904563, -35.10904563,
-35.10904563, -35.10904563, -32.45779434, -32.45779434,
-32.45779434, -32.45779434, -32.45779434, -28.93868572,
-28.93868572, -28.93868572, -28.93868572, -28.93868572,
-31.70128123, -31.70128123, -31.70128123, -31.70128123,
-31.70128123, -32.31686914, -32.31686914, -32.31686914,
-32.31686914, -32.31686914, -30.90304808, -30.90304808,
-30.90304808, -30.90304808, -30.90304808, -37.61721775,
-37.61721775, -37.61721775, -37.61721775, -37.61721775,
-47.14621736, -47.14621736, -47.14621736, -47.14621736,
-47.14621736, -53.92175157, -53.92175157, -53.92175157,
-53.92175157, -53.92175157, -52.31800896, -52.31800896,
-52.31800896, -52.31800896, -52.31800896, -55.03285144,
-55.03285144, -55.03285144, -55.03285144, -55.03285144,
-52.78405115, -52.78405115, -52.78405115, -52.78405115,
-52.78405115, -54.53167815, -54.53167815, -54.53167815,
-54.53167815, -54.53167815, -52.09853585, -52.09853585,
-52.09853585, -52.09853585, -52.09853585, -53.56135449,
-53.56135449, -53.56135449, -53.56135449, -53.56135449,
-52.1935825 , -52.1935825 , -52.1935825 , -52.1935825 ,
-52.1935825 , -49.5214615 , -49.5214615 , -49.5214615 ,
-49.5214615 , -49.5214615 , -40.56280223, -40.56280223,
-40.56280223, -40.56280223, -40.56280223, -44.41263656,
-44.41263656, -44.41263656, -44.41263656, -44.41263656,
-48.66065978, -48.66065978, -48.66065978, -48.66065978,
-48.66065978, -49.99896066, -49.99896066, -49.99896066,
-49.99896066, -49.99896066, -49.34922552, -49.34922552,
-49.34922552, -49.34922552, -49.34922552, -54.0744707 ,
-54.0744707 , -54.0744707 , -54.0744707 , -54.0744707 ,
-66.11419323, -66.11419323, -66.11419323, -66.11419323,
-66.11419323, -75.57244603, -75.57244603, -75.57244603,
-75.57244603, -75.57244603, -80.82660304, -80.82660304,
-80.82660304, -80.82660304, -80.82660304, -78.05884369,
-78.05884369, -78.05884369, -78.05884369, -78.05884369,
-77.59082894, -77.59082894, -77.59082894, -77.59082894,
-77.59082894, -69.23543809, -69.23543809, -69.23543809,
-69.23543809, -69.23543809, -69.03692792, -69.03692792,
-69.03692792, -69.03692792, -69.03692792, -63.27832293,
-63.27832293, -63.27832293, -63.27832293, -63.27832293,
-67.06954701, -67.06954701, -67.06954701, -67.06954701,
-67.06954701, -61.91295318, -61.91295318, -61.91295318,
-61.91295318, -61.91295318, -52.12720126, -52.12720126,
-52.12720126, -52.12720126, -52.12720126, -43.39671989,
-43.39671989, -43.39671989, -43.39671989, -43.39671989,
-41.34592409, -41.34592409, -41.34592409, -41.34592409,
-41.34592409, -45.94835351, -45.94835351, -45.94835351,
-45.94835351, -45.94835351, -45.18761033, -45.18761033,
-45.18761033, -45.18761033, -45.18761033, -50.88942775,
-50.88942775, -50.88942775, -50.88942775, -50.88942775,
-47.30047383, -47.30047383, -47.30047383, -47.30047383,
-47.30047383, -48.33243339, -48.33243339, -48.33243339,
-48.33243339, -48.33243339, -48.65493539, -48.65493539,
-48.65493539, -48.65493539, -48.65493539, -52.78563103,
-52.78563103, -52.78563103, -52.78563103, -52.78563103,
-50.76485606, -50.76485606, -50.76485606, -50.76485606,
-50.76485606, -58.03554053, -58.03554053, -58.03554053,
-58.03554053, -58.03554053, -63.73312163, -63.73312163,
-63.73312163, -63.73312163, -63.73312163, -74.63117375,
-74.63117375, -74.63117375, -74.63117375, -74.63117375,
-85.15165911, -85.15165911, -85.15165911, -85.15165911,
-85.15165911, -82.42196279, -82.42196279, -82.42196279,
-82.42196279, -82.42196279, -82.5435437 , -82.5435437 ,
-82.5435437 , -82.5435437 , -82.5435437 , -82.85375948,
-82.85375948, -82.85375948, -82.85375948, -82.85375948,
-77.87969231, -77.87969231, -77.87969231, -77.87969231,
-77.87969231, -73.5448301 , -73.5448301 , -73.5448301 ,
-73.5448301 , -73.5448301 , -73.9658258 , -73.9658258 ,
-73.9658258 , -73.9658258 , -73.9658258 , -63.79194345,
-63.79194345, -63.79194345, -63.79194345, -63.79194345,
-57.58108672, -57.58108672, -57.58108672, -57.58108672,
-57.58108672, -40.23220596, -40.23220596, -40.23220596,
-40.23220596, -40.23220596, -41.56030716, -41.56030716,
-41.56030716, -41.56030716, -41.56030716, -77.93239779,
-77.93239779, -77.93239779, -77.93239779, -77.93239779,
-127.48673713, -127.48673713, -127.48673713, -127.48673713,
-127.48673713, -179.34701167, -179.34701167, -179.34701167,
-179.34701167, -179.34701167, -267.51322046, -267.51322046,
-267.51322046, -267.51322046, -267.51322046, -359.26298608,
-359.26298608, -359.26298608, -359.26298608, -359.26298608,
-418.5569144 , -418.5569144 , -418.5569144 , -418.5569144 ,
-418.5569144 , -453.42889887, -453.42889887, -453.42889887,
-453.42889887, -453.42889887, -464.27793693, -464.27793693,
-464.27793693, -464.27793693, -464.27793693, -466.90705157,
-466.90705157, -466.90705157, -466.90705157, -466.90705157,
-453.13421738, -453.13421738, -453.13421738, -453.13421738,
-453.13421738, -414.78888627, -414.78888627, -414.78888627,
-414.78888627, -414.78888627, -367.4540131 , -367.4540131 ,
-367.4540131 , -367.4540131 , -367.4540131 , -315.54657638,
-315.54657638, -315.54657638, -315.54657638, -315.54657638,
-248.04666653, -248.04666653, -248.04666653, -248.04666653,
-248.04666653, -183.59516919, -183.59516919, -183.59516919,
-183.59516919, -183.59516919, -175.06735838, -175.06735838,
-175.06735838, -175.06735838, -175.06735838, -199.89003815,
-199.89003815, -199.89003815, -199.89003815, -199.89003815,
-260.83807633, -260.83807633, -260.83807633, -260.83807633,
-260.83807633, -322.20340895, -322.20340895, -322.20340895,
-322.20340895, -322.20340895, -350.42226808, -350.42226808,
-350.42226808, -350.42226808, -350.42226808, -363.25798328,
-363.25798328, -363.25798328, -363.25798328, -363.25798328,
-363.70676502, -363.70676502, -363.70676502, -363.70676502,
-363.70676502, -378.28090278, -378.28090278, -378.28090278,
-378.28090278, -378.28090278, -362.05856559, -362.05856559,
-362.05856559, -362.05856559, -362.05856559, -338.8806949 ,
-338.8806949 , -338.8806949 , -338.8806949 , -338.8806949 ,
-287.62828084, -287.62828084, -287.62828084, -287.62828084,
-287.62828084, -227.48131613, -227.48131613, -227.48131613,
-227.48131613, -227.48131613, -151.86148195, -151.86148195,
-151.86148195, -151.86148195, -151.86148195, -89.60697089,
-89.60697089, -89.60697089, -89.60697089, -89.60697089,
-63.29330952, -63.29330952, -63.29330952, -63.29330952,
-63.29330952, -49.55795657, -49.55795657, -49.55795657,
-49.55795657, -49.55795657, -47.045813 , -47.045813 ,
-47.045813 , -47.045813 , -47.045813 , -30.29376241,
-30.29376241, -30.29376241, -30.29376241, -30.29376241,
-36.00891023, -36.00891023, -36.00891023, -36.00891023,
-36.00891023, -32.32139204, -32.32139204, -32.32139204,
-32.32139204, -32.32139204, -35.02436513, -35.02436513,
-35.02436513, -35.02436513, -35.02436513, -37.67405456,
-37.67405456, -37.67405456, -37.67405456, -37.67405456,
-39.80933939, -39.80933939, -39.80933939, -39.80933939,
-39.80933939, -38.33560525, -38.33560525, -38.33560525,
-38.33560525, -38.33560525, -44.42853951, -44.42853951,
-44.42853951, -44.42853951, -44.42853951, -45.64543219,
-45.64543219, -45.64543219, -45.64543219, -45.64543219,
-45.21862114, -45.21862114, -45.21862114, -45.21862114,
-45.21862114, -51.07988061, -51.07988061, -51.07988061,
-51.07988061, -51.07988061, -42.04076733, -42.04076733,
-42.04076733, -42.04076733, -42.04076733, -42.36440188,
-42.36440188, -42.36440188, -42.36440188, -42.36440188,
-38.80380634, -38.80380634, -38.80380634, -38.80380634,
-38.80380634, -38.30312025, -38.30312025, -38.30312025,
-38.30312025, -38.30312025, -42.33929893, -42.33929893,
-42.33929893, -42.33929893, -42.33929893, -43.07194965,
-43.07194965, -43.07194965, -43.07194965, -43.07194965,
-38.68956918, -38.68956918, -38.68956918, -38.68956918,
-38.68956918, -41.26557772, -41.26557772, -41.26557772,
-41.26557772, -41.26557772, -39.44063089, -39.44063089,
-39.44063089, -39.44063089, -39.44063089, -41.09613501,
-41.09613501, -41.09613501, -41.09613501, -41.09613501,
-47.16798911, -47.16798911, -47.16798911, -47.16798911,
-47.16798911, -46.87630374, -46.87630374, -46.87630374,
-46.87630374, -46.87630374, -54.2742902 , -54.2742902 ,
-54.2742902 , -54.2742902 , -54.2742902 , -52.03250394,
-52.03250394, -52.03250394, -52.03250394, -52.03250394,
-47.83084237, -47.83084237, -47.83084237, -47.83084237,
-47.83084237, -47.86537761, -47.86537761, -47.86537761,
-47.86537761, -47.86537761, -46.08389818, -46.08389818,
-46.08389818, -46.08389818, -46.08389818, -40.66600449,
-40.66600449, -40.66600449, -40.66600449, -40.66600449,
-42.74159983, -42.74159983, -42.74159983, -42.74159983,
-42.74159983, -34.44249623, -34.44249623, -34.44249623,
-34.44249623, -34.44249623, -31.23243547, -31.23243547,
-31.23243547, -31.23243547, -31.23243547, -30.11961397,
-30.11961397, -30.11961397, -30.11961397, -30.11961397,
-22.88610359, -22.88610359, -22.88610359, -22.88610359,
-22.88610359, -22.66522498, -22.66522498, -22.66522498,
-22.66522498, -22.66522498, -22.8258467 , -22.8258467 ,
-22.8258467 , -22.8258467 , -22.8258467 , -25.37145157,
-25.37145157, -25.37145157, -25.37145157, -25.37145157,
-29.45671685, -29.45671685, -29.45671685, -29.45671685,
-29.45671685, -36.30382435, -36.30382435, -36.30382435,
-36.30382435, -36.30382435, -64.88047059, -64.88047059,
-64.88047059, -64.88047059, -64.88047059, -90.40117992,
-90.40117992, -90.40117992, -90.40117992, -90.40117992,
-117.85854224, -117.85854224, -117.85854224, -117.85854224,
-117.85854224, -146.65567437, -146.65567437, -146.65567437,
-146.65567437, -146.65567437, -175.4335766 , -175.4335766 ,
-175.4335766 , -175.4335766 , -175.4335766 , -213.1768593 ,
-213.1768593 , -213.1768593 , -213.1768593 , -213.1768593 ,
-246.36052771, -246.36052771, -246.36052771, -246.36052771,
-246.36052771, -254.07443916, -254.07443916, -254.07443916,
-254.07443916, -254.07443916, -263.73274134, -263.73274134,
-263.73274134, -263.73274134, -263.73274134, -258.03283753,
-258.03283753, -258.03283753, -258.03283753, -258.03283753,
-233.33633164, -233.33633164, -233.33633164, -233.33633164,
-233.33633164, -209.26667778, -209.26667778, -209.26667778,
-209.26667778, -209.26667778, -198.50333932, -198.50333932,
-198.50333932, -198.50333932, -198.50333932, -192.19746949,
-192.19746949, -192.19746949, -192.19746949, -192.19746949,
-182.44656671, -182.44656671, -182.44656671, -182.44656671,
-182.44656671, -164.21427251, -164.21427251, -164.21427251,
-164.21427251, -164.21427251, -143.79641255, -143.79641255,
-143.79641255, -143.79641255, -143.79641255, -140.67194607,
-140.67194607, -140.67194607, -140.67194607, -140.67194607,
-124.58322603, -124.58322603, -124.58322603, -124.58322603,
-124.58322603, -124.89753589, -124.89753589, -124.89753589,
-124.89753589, -124.89753589, -125.11225829, -125.11225829,
-125.11225829, -125.11225829, -125.11225829, -125.81703407,
-125.81703407, -125.81703407, -125.81703407, -125.81703407,
-106.96613932, -106.96613932, -106.96613932, -106.96613932,
-106.96613932, -83.17853528, -83.17853528, -83.17853528,
-83.17853528, -83.17853528, -69.41292893, -69.41292893,
-69.41292893, -69.41292893, -69.41292893, -54.03692269,
-54.03692269, -54.03692269, -54.03692269, -54.03692269,
-40.53106631, -40.53106631, -40.53106631, -40.53106631,
-40.53106631, -33.3205849 , -33.3205849 , -33.3205849 ,
-33.3205849 , -33.3205849 , -36.37400949, -36.37400949,
-36.37400949, -36.37400949, -36.37400949, -35.64897529,
-35.64897529, -35.64897529, -35.64897529, -35.64897529,
-29.56753594, -29.56753594, -29.56753594, -29.56753594,
-29.56753594, -29.09557814, -29.09557814, -29.09557814,
-29.09557814, -29.09557814, -27.99146529, -27.99146529,
-27.99146529, -27.99146529, -27.99146529, -26.58469763,
-26.58469763, -26.58469763, -26.58469763, -26.58469763,
-20.32485707, -20.32485707, -20.32485707, -20.32485707,
-20.32485707, -15.06863064, -15.06863064, -15.06863064,
-15.06863064, -15.06863064, -12.47770867, -12.47770867,
-12.47770867, -12.47770867, -12.47770867, -10.55085402,
-10.55085402, -10.55085402, -10.55085402, -10.55085402,
-7.17536086, -7.17536086, -7.17536086, -7.17536086,
-7.17536086, -4.83432784, -4.83432784, -4.83432784,
-4.83432784, -4.83432784, -4.13833286, -4.13833286,
-4.13833286, -4.13833286, -4.13833286, -1.81402187,
-1.81402187, -1.81402187, -1.81402187, -1.81402187])
[8]:
pd.concat(data, axis=1).plot()
[8]:
<Axes: >

[9]:
y["TDA anomalies"] = anomaly_score
y.plot(subplots=True)
[9]:
array([<Axes: >, <Axes: >], dtype=object)
