dqm.completeness package

Submodules

dqm.completeness.main module

dqm.completeness.metric module

Data Completeness Evaluation Module

This module provides tools to assess the completeness of tabular data. It is especially useful in data preprocessing and cleaning stages of a data analysis workflow. The module includes a class, DataCompleteness, with methods to calculate completeness scores for dataframes and individual columns. These methods help in identifying columns with missing data and quantifying the extent of missingness.
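The sketch below shows how per-column completeness scores can point to the columns with missing data. It uses plain pandas to compute the fraction of non-null entries in every column, then flags the columns that fall below a cutoff. It does not call this package. The function name `flag_incomplete_columns` and the `threshold` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def flag_incomplete_columns(df: pd.DataFrame, threshold: float = 0.9) -> pd.Series:
    """Return per-column completeness scores below `threshold`, lowest first.

    Hypothetical helper, not part of dqm: a column's score is the
    fraction of its entries that are non-null.
    """
    scores = df.notna().mean()  # ratio of non-null entries per column
    return scores[scores < threshold].sort_values()


df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "age": [25, np.nan, 40, np.nan],
        "city": ["Paris", "Lyon", None, "Nice"],
    }
)
print(flag_incomplete_columns(df))  # age 0.50, city 0.75; "id" is complete
```

Sorting in ascending order puts the most incomplete columns first, which is usually where cleaning should start.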

Authors:

Faouzi ADJED, Anani DJATO

Classes:

DataCompleteness: A class that encapsulates the methods for evaluating data completeness.

Methods:

DataCompleteness.completeness_tabular(): Calculates the average completeness score for a dataframe.

DataCompleteness.data_completion(): Calculates the completeness score for an individual data column.

Dependencies:

numpy, pandas, matplotlib, scipy, seaborn, warnings

Usage:

The DataCompleteness class can be used as follows:

    import pandas as pd
    from dqm.completeness.metric import DataCompleteness

    # Create an instance of the class
    completeness_evaluator = DataCompleteness()

    # Load your data into a pandas DataFrame
    df = pd.read_csv('your_data_path.csv')

    # Calculate the overall completeness score for the DataFrame
    overall_score = completeness_evaluator.completeness_tabular(df)

    # Calculate the completeness score for a single column
    column_score = completeness_evaluator.data_completion(df['your_column'])

    # Print the results
    print(f'Overall Data Completeness Score: {overall_score}')
    print(f'Completeness Score for Column: {column_score}')

class dqm.completeness.metric.DataCompleteness

Bases: object

This class provides methods to evaluate the completeness of tabular data.

It includes methods to calculate completeness scores for individual columns and for entire dataframes by assessing the presence of non-null data.
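Taken together with the method descriptions below, the two scores can be written as follows. Consider a dataframe with $K$ columns, where column $c$ has $n_c$ entries and $m_c$ of them are null. The notation is ours. The equal-weight mean assumes that "average completeness score of all columns" means a plain mean over columns.

```latex
\mathrm{completeness}(c) = \frac{n_c - m_c}{n_c},
\qquad
\mathrm{score\_total} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{completeness}(c_k)
```

In a dataframe every column has the same length $n$. Under that assumption, $\mathrm{score\_total} = 1 - \frac{1}{nK}\sum_{k} m_{c_k}$, which is one minus the overall fraction of null cells.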

Methods:

completeness_tabular(): Calculate the average completeness score of a dataframe.

data_completion(): Calculate the completeness score of a single data column.

completeness_tabular(data)

Calculate the average completeness score of the entire dataframe.

Parameters:

data (pd.DataFrame) – The dataframe to be evaluated for completeness.

Returns:

score_total: the average completeness score of all columns in the dataframe.

Return type:

float
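Below is a minimal sketch of what completeness_tabular computes. It assumes the method averages the per-column non-null ratios with equal weight. `completeness_tabular_sketch` is a hypothetical stand-in written with plain pandas, not the package's source. The guard against empty dataframes is our own choice, because the documentation does not say how that case is handled.

```python
import numpy as np
import pandas as pd


def completeness_tabular_sketch(data: pd.DataFrame) -> float:
    """Hypothetical stand-in: equal-weight mean of per-column completeness."""
    if data.shape[0] == 0 or data.shape[1] == 0:
        raise ValueError("cannot score an empty dataframe")
    per_column = data.notna().mean()  # one non-null ratio per column
    return float(per_column.mean())   # plain average across columns


df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": [np.nan, np.nan, "x", "y"]})
print(completeness_tabular_sketch(df))  # (0.75 + 0.5) / 2 = 0.625
```

All columns share the same number of rows, so the result also equals one minus the overall fraction of null cells. Here 3 of the 8 cells are null, which again gives 0.625.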

data_completion(data)

Calculate the completeness score of a single data column.

Parameters:

data (pd.Series) – The data column to be evaluated for completeness.

Returns:

completeness_score: the completeness score of the column, calculated as the ratio of non-null entries to total entries.

Return type:

float
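The ratio of non-null entries to total entries can be sketched for a single pandas Series as shown below. `data_completion_sketch` is a hypothetical name, not the package's code. Raising an error for an empty column is our assumption, since the documentation does not say how an empty column is scored.

```python
import numpy as np
import pandas as pd


def data_completion_sketch(data: pd.Series) -> float:
    """Hypothetical stand-in: ratio of non-null entries to total entries."""
    total = len(data)
    if total == 0:
        raise ValueError("cannot score an empty column")
    non_null = int(data.notna().sum())  # NaN, None and NaT count as missing
    return non_null / total


column = pd.Series([3.2, np.nan, 1.5, None, 7.0])
print(data_completion_sketch(column))  # 3 non-null out of 5 -> 0.6
```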

Module contents