Welcome to dqm-ml's documentation


dqm-ml

Data Quality Metrics

The current version of the Data Quality Metrics library (dqm-ml) computes three data-inherent metrics and one data-model dependent metric.

The data-inherent metrics are:

  • Diversity: measures the presence in the dataset of all required information defined in the specification (requirements, Operational Design Domain (ODD), ...).

  • Representativeness: is defined as the conformity of the distribution of the key characteristics of the dataset to a specification (requirements, ODD, ...).

  • Completeness: is defined by the degree to which subject data associated with an entity has values for all expected attributes and related entity instances in a specific context of use.
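As an illustration of the completeness definition above, here is a minimal sketch of a "ratio of filled information" computed over a small set of records. It is written in plain Python and does not use the dqm-ml API. The names `is_filled` and `completeness_ratio`, the sample records, and the choice to treat `None`, NaN and blank strings as missing are all assumptions made for this example.

```python
from typing import Any, Mapping, Sequence


def is_filled(value: Any) -> bool:
    """Return True if a value counts as present.

    Assumption of this sketch: None, NaN and blank strings are missing.
    """
    if value is None:
        return False
    if isinstance(value, float) and value != value:  # NaN never equals itself
        return False
    if isinstance(value, str) and not value.strip():
        return False
    return True


def completeness_ratio(
    records: Sequence[Mapping[str, Any]],
    expected_attributes: Sequence[str],
) -> float:
    """Fraction of expected (record, attribute) cells that hold a value."""
    total = len(records) * len(expected_attributes)
    if total == 0:
        raise ValueError("need at least one record and one expected attribute")
    filled = sum(
        is_filled(record.get(attribute))  # absent keys count as missing
        for record in records
        for attribute in expected_attributes
    )
    return filled / total


# Hypothetical records; the last one lacks the "lane" key entirely.
records = [
    {"speed": 12.5, "weather": "rain", "lane": 2},
    {"speed": None, "weather": "sun", "lane": 1},
    {"speed": 30.0, "weather": "", "lane": None},
    {"speed": 8.0, "weather": "fog"},
]
ratio = completeness_ratio(records, ["speed", "weather", "lane"])
print(f"completeness = {ratio:.3f}")  # 8 filled cells out of 12
```

A ratio of 1.0 means every expected attribute is filled for every record. What counts as "missing" depends on the context of use, so the `is_filled` rule would normally be adapted to the dataset.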

The data-model dependent metric is:

  • Domain Gap: in the context of a computer vision task, the Domain Gap (DG) refers to the difference in semantics, textures and shapes between two distributions of images. It can lead to poor performance when a model is trained on one distribution and then applied to another.

(Definitions from Confiance.ai program)
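To make the idea of a gap between two distributions more concrete, the sketch below computes a one-dimensional Wasserstein-1 distance between two equal-size samples of a scalar image feature. This is a toy case, not the dqm-ml implementation, which may work on high-dimensional image features instead. The function name `wasserstein_1d` and the brightness values are hypothetical.

```python
from typing import Sequence


def wasserstein_1d(source: Sequence[float], target: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal-size samples, the optimal transport plan pairs the sorted
    values, so the distance is the mean absolute difference of the
    order statistics.
    """
    if not source or len(source) != len(target):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(source), sorted(target))
    return sum(abs(a - b) for a, b in pairs) / len(source)


# Hypothetical mean-brightness values for images from a training domain,
# and the same values shifted to mimic a brighter deployment domain.
train_brightness = [0.42, 0.55, 0.48, 0.51, 0.60]
shifted_brightness = [b + 0.2 for b in train_brightness]

gap_same = wasserstein_1d(train_brightness, train_brightness)
gap_shifted = wasserstein_1d(train_brightness, shifted_brightness)
print(f"same domain: {gap_same:.3f}, shifted domain: {gap_shifted:.3f}")
```

The distance is zero when both samples coincide and grows with the shift between them, which is the behaviour expected from a domain gap measure.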

For each metric, several approaches have been developed to handle as many data types as possible. For more technical and scientific details, please refer to this deliverable.

Project description

Several approaches have been developed, as described in the figure below.

In the current version, the available metrics are:

  • Representativeness:

    • $\chi^2$ goodness-of-fit test for Uniform and Normal distributions

    • Kolmogorov-Smirnov test for Uniform and Normal distributions

    • Granular and Relative Theoretical Entropy (GRTE), proposed and developed in the Confiance.ai Research Program

  • Diversity:

    • Relative Diversity, developed and implemented in the Confiance.ai Research Program

    • Gini-Simpson and Simpson indices

  • Completeness:

    • Ratio of filled information

  • Domain Gap:

    • MMD

    • CMD

    • Wasserstein

    • H-Divergence

    • FID

    • Kullback-Leibler divergence for multivariate normal distributions
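Two of the simpler entries in the list above can be sketched in a few lines of plain Python: the $\chi^2$ goodness-of-fit statistic against a uniform distribution (representativeness) and the Simpson and Gini-Simpson indices (diversity). This is not the dqm-ml API, and the function names and sample data are hypothetical. The sketch uses the plain $\sum_i p_i^2$ form of the Simpson index, which may differ from the variant implemented in the library, and it returns only the $\chi^2$ statistic. Turning it into a p-value requires the $\chi^2$ distribution with $k - 1$ degrees of freedom, for example via `scipy.stats.chisquare`.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def chi2_uniform_statistic(observed_counts: Sequence[int]) -> float:
    """Chi-square statistic of observed bin counts against a uniform law.

    Computes sum((O_i - E)^2 / E) with E = n / k for k bins and n samples.
    """
    k = len(observed_counts)
    n = sum(observed_counts)
    if k < 2 or n == 0:
        raise ValueError("need at least two bins and one observation")
    expected = n / k
    return sum((o - expected) ** 2 / expected for o in observed_counts)


def simpson_index(labels: Iterable[Hashable]) -> float:
    """Simpson index: probability that two draws (with replacement) match."""
    counts = Counter(labels)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one label")
    return sum((c / n) ** 2 for c in counts.values())


def gini_simpson_index(labels: Iterable[Hashable]) -> float:
    """Gini-Simpson index: 1 minus the Simpson index (higher is more diverse)."""
    return 1.0 - simpson_index(list(labels))


# Hypothetical bin counts of one dataset characteristic.
balanced = [25, 25, 25, 25]
skewed = [40, 30, 20, 10]
print(chi2_uniform_statistic(balanced), chi2_uniform_statistic(skewed))

# Hypothetical categorical attribute with two equally frequent values.
labels = ["day"] * 5 + ["night"] * 5
print(simpson_index(labels), gini_simpson_index(labels))
```

A $\chi^2$ statistic close to zero indicates counts close to the uniform expectation, while a Gini-Simpson index close to zero indicates that a single category dominates.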

Documentation structure

To learn how to get started with dqm, see the 💡 Guideline section.

For more information on how to use the implemented metrics, see the corresponding sections: 🍎 Diversity, 🍋 Domain gap, 🍓 Representativeness, 🍊 Completeness.