tadkit.catalog package

Submodules

tadkit.catalog.rawtowideformatter module

class tadkit.catalog.rawtowideformatter.RawToWideFormatter(data: DataFrame | ndarray, timestamps: Sequence | None = None, columns: Sequence[str] | None = None, backend: str = 'numpy')[source]

Bases: Formatter

A Formatter that supports both pandas DataFrame and NumPy array outputs.

Parameters:
  • data (pd.DataFrame or np.ndarray) – Input data.

  • backend (str) – 'pandas' or 'numpy'.

  • timestamps (np.ndarray, optional) – Required if data is a NumPy array.

  • columns (list[str], optional) – Column names for NumPy arrays.

format(target_period=None, target_space=None, resample=False, resample_freq: float = 1.0)[source]

Slice and optionally resample the data, using backend-specific resampling.
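The slicing and resampling described above can be sketched for the NumPy case as follows. This is not the tadkit implementation. The helper name `slice_and_resample`, the reading of `target_period` as a `(start, end)` pair, of `target_space` as a list of column names, and of `resample_freq` as the spacing of a regular time grid filled by linear interpolation are all assumptions made for illustration.

```python
import numpy as np


def slice_and_resample(data, timestamps, columns, target_period=None,
                       target_space=None, resample=False, resample_freq=1.0):
    """Hypothetical stand-in for a NumPy-backed format(); not tadkit code."""
    data = np.asarray(data, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    columns = list(columns)

    # Assumption: target_period is a closed interval (start, end).
    if target_period is not None:
        start, end = target_period
        rows = (timestamps >= start) & (timestamps <= end)
        data, timestamps = data[rows], timestamps[rows]

    # Assumption: target_space lists the column names to keep, in order.
    if target_space is not None:
        idx = [columns.index(name) for name in target_space]
        data = data[:, idx]
        columns = list(target_space)

    # Assumption: resample_freq is the step of the new regular time grid,
    # and values on that grid are obtained by linear interpolation.
    if resample and len(timestamps) > 1:
        grid = np.arange(timestamps[0], timestamps[-1] + resample_freq / 2,
                         resample_freq)
        data = np.column_stack(
            [np.interp(grid, timestamps, data[:, j]) for j in range(data.shape[1])]
        )
        timestamps = grid

    return data, timestamps, columns


timestamps = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
data = np.column_stack([timestamps * 10.0, timestamps ** 2])
out, ts, cols = slice_and_resample(
    data, timestamps, ["a", "b"],
    target_period=(1.0, 3.0), target_space=["a"],
    resample=True, resample_freq=1.0,
)
```

Here the rows between t = 1.0 and t = 3.0 are kept, only column "a" survives, and the half-step series is brought onto a grid with unit spacing.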

tadkit.catalog.registry_init module

tadkit.catalog.sklearners module

class tadkit.catalog.sklearners.CustomScoreOutlierDetector(score_func: Callable[[ndarray], ndarray], contamination: float = 0.1)[source]

Bases: BaseDensityOutlierDetector

Parameters:
  • score_func (callable) – Function mapping X to scores, where higher scores indicate inliers. Must accept a 2D array and return a 1D array.

  • contamination (float, default=0.1) – Proportion of outliers. Must be in (0, 0.5).
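As an illustration of the two parameters above, the sketch below defines a score function with the required shape contract (2D array in, 1D array out, higher means more inlier-like) and applies a contamination-based cut-off. The function names are hypothetical, and the quantile rule is an assumption about how a contamination value is typically turned into a threshold, not a description of the tadkit internals.

```python
import numpy as np


def neg_distance_to_mean(X):
    """Hypothetical score_func: points far from the centroid score low."""
    X = np.asarray(X, dtype=float)
    return -np.linalg.norm(X - X.mean(axis=0), axis=1)


def flag_outliers(scores, contamination=0.1):
    """Assumed rule: flag the lowest-scoring `contamination` fraction."""
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination must be in (0, 0.5)")
    threshold = np.quantile(scores, contamination)
    return scores < threshold


rng = np.random.default_rng(0)
# 19 points around the origin plus one point placed far away.
X = np.vstack([rng.normal(size=(19, 2)), [[25.0, 25.0]]])
scores = neg_distance_to_mean(X)
is_outlier = flag_outliers(scores, contamination=0.1)
```

A function like `neg_distance_to_mean` is the kind of callable that could be passed as `score_func`; the far-away point receives the lowest score and falls below the cut-off.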

score_func: Callable[[ndarray], ndarray]

class tadkit.catalog.sklearners.GMMOutlierDetector(n_components=1, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmeans', weights_init=None, means_init=None, precisions_init=None, random_state=None, warm_start=False, verbose=0, verbose_interval=10, contamination: float = 0.1)[source]

Bases: BaseDensityOutlierDetector

Density-based outlier detection using GaussianMixture.

Parameters:
  • n_components (int, default=1) – The number of mixture components.

  • covariance_type ({'full', 'tied', 'diag', 'spherical'}, default='full') – Type of covariance parameters to use.

  • tol (float, default=1e-3) – Convergence threshold.

  • reg_covar (float, default=1e-6) – Non-negative regularization added to the diagonal of covariance matrices.

  • max_iter (int, default=100) – The number of EM iterations to perform.

  • n_init (int, default=1) – The number of initializations to perform. The best result is kept.

  • init_params ({'kmeans', 'random'}, default='kmeans') – Method used to initialize the weights, means, and precisions.

  • weights_init (array-like of shape (n_components,), default=None) – The user-provided initial weights.

  • means_init (array-like of shape (n_components, n_features), default=None) – The user-provided initial means.

  • precisions_init (array-like, default=None) – The user-provided initial precisions.

  • random_state (int, RandomState instance, default=None) – Controls the random seed.

  • warm_start (bool, default=False) – If True, reuse the solution of the last fitting.

  • verbose (int, default=0) – Enable verbose output.

  • verbose_interval (int, default=10) – Number of iteration steps between printing progress.

  • contamination (float, default=0.1) – Proportion of outliers in the dataset.
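The parameters above mirror those of scikit-learn's GaussianMixture, with contamination added. A minimal sketch of density-based flagging with that estimator is shown below. It calls scikit-learn directly rather than GMMOutlierDetector, and the quantile cut-off on the per-sample log-likelihood is an assumption about how contamination is applied.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far_points = np.array([[8.0, 8.0], [-9.0, 7.0]])
X = np.vstack([inliers, far_points])

# Same defaults as listed above: one component with a full covariance matrix.
gmm = GaussianMixture(n_components=1, covariance_type="full", random_state=0)
gmm.fit(X)

# Per-sample log-likelihood under the fitted mixture (higher = more typical).
log_density = gmm.score_samples(X)

# Assumed rule: the lowest `contamination` fraction is flagged as outliers.
contamination = 0.1
threshold = np.quantile(log_density, contamination)
is_outlier = log_density < threshold
```

With a single component the model is one Gaussian, so the two distant points receive very low log-likelihood and end up below the threshold.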

class tadkit.catalog.sklearners.KDEOutlierDetector(bandwidth=1.0, algorithm='auto', kernel='gaussian', metric='euclidean', atol=0, rtol=0, breadth_first=True, leaf_size=40, metric_params=None, contamination: float = 0.1)[source]

Bases: BaseDensityOutlierDetector

Density-based outlier detection using KernelDensity.

Parameters:
  • bandwidth (float, default=1.0) – The bandwidth of the kernel.

  • algorithm ({'kd_tree', 'ball_tree', 'auto'}, default='auto') – The tree algorithm to use.

  • kernel (str, default='gaussian') – The kernel to use. Valid kernels are ['gaussian', 'tophat', 'epanechnikov', 'exponential', 'linear', 'cosine'].

  • metric (str, default='euclidean') – The distance metric to use.

  • atol (float, default=0) – The desired absolute tolerance of the result.

  • rtol (float, default=0) – The desired relative tolerance of the result.

  • breadth_first (bool, default=True) – If True, use a breadth-first approach to the problem; otherwise use a depth-first approach.

  • leaf_size (int, default=40) – Leaf size passed to BallTree or KDTree.

  • metric_params (dict, default=None) – Additional parameters for the metric function.

  • contamination (float, default=0.1) – Proportion of outliers in the data set.
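To make the role of bandwidth concrete, the sketch below writes out the textbook Gaussian kernel density estimate in plain NumPy and applies a contamination cut-off to it. It is a brute-force illustration with a hypothetical helper name, not the tree-based KernelDensity estimator and not the KDEOutlierDetector code; the quantile rule is again an assumption.

```python
import numpy as np


def gaussian_kde_log_density(X_train, X_query, bandwidth=1.0):
    """Log of (1/n) * sum_i N(x | x_i, bandwidth^2 * I), computed by brute force."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    n, d = X_train.shape

    # Squared Euclidean distance between every query and training point.
    diff = X_query[:, None, :] - X_train[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    log_kernel = -0.5 * sq_dist / bandwidth ** 2

    # Log-sum-exp over training points, stabilised by the row maximum.
    row_max = log_kernel.max(axis=1, keepdims=True)
    log_sum = row_max[:, 0] + np.log(np.exp(log_kernel - row_max).sum(axis=1))

    # Normalisation: number of points and the Gaussian constant in d dimensions.
    log_norm = np.log(n) + 0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2)
    return log_sum - log_norm


rng = np.random.default_rng(0)
# A cluster of 50 points plus one isolated point.
X = np.vstack([rng.normal(size=(50, 2)), [[10.0, 10.0]]])
log_density = gaussian_kde_log_density(X, X, bandwidth=1.0)

# Assumed rule: flag the lowest 10% of log-densities.
is_outlier = log_density < np.quantile(log_density, 0.1)
```

A smaller bandwidth makes each kernel narrower, so isolated points stand out more sharply; a larger one smooths the density and blurs the distinction.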

Module contents