🍊 Completeness
Data Completeness Evaluation Description
Overview
This metric provides a set of tools for assessing the completeness of tabular data. It is particularly useful during the data preprocessing and cleaning stages of a data analysis workflow.
Components
metric.py (Module)
Description
This module includes the DataCompleteness class, which contains methods to calculate completeness scores for dataframes and individual columns. These methods are instrumental in identifying columns with missing data and quantifying the extent of missingness.
Class: DataCompleteness
Methods
completeness_tabular(data: pd.DataFrame) -> float: Calculates the average completeness score for a dataframe.
data_completion(data: pd.Series) -> float: Calculates the completeness score for an individual data column.
Example
from dqm.completeness.metric import DataCompleteness
import pandas as pd
def main():
completeness_evaluator = DataCompleteness()
df = pd.read_csv('your_data.csv', sep=";")
overall_score = completeness_evaluator.completeness_tabular(df)
column_score = completeness_evaluator.data_completion(df['your_column'])
print(f'Overall Data Completeness Score: {overall_score}')
print(f'Completeness Score for Column: {column_score}')
if __name__ == "__main__":
main()