arche.rules.category module
arche.rules.category.get_coverage_per_category(df: pandas.core.frame.DataFrame, category_names: List[T]) → arche.rules.result.Result
Get value counts per column, excluding NaN.
- Parameters
df – the source data to assess
category_names – list of columns whose value counts to compute
- Returns
The number of categories per field and a value counts series for each field.
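The counting described above can be approximated with plain pandas. This is only a minimal sketch, not the arche implementation: the helper name `coverage_per_category` and the sample data are hypothetical, and the result is a plain dict rather than an `arche.rules.result.Result`.

```python
import pandas as pd


def coverage_per_category(df, category_names):
    """Hypothetical helper: value counts per column, with NaN excluded."""
    # value_counts() drops missing values by default (dropna=True).
    return {name: df[name].value_counts(dropna=True) for name in category_names}


# Made-up sample data for illustration only.
df = pd.DataFrame(
    {
        "color": ["red", "blue", "red", None],
        "size": ["S", "M", "M", "M"],
    }
)

counts = coverage_per_category(df, ["color", "size"])
# Number of distinct categories found in each field.
n_categories = {name: len(series) for name, series in counts.items()}
```

Here the missing `color` value is not counted, so `color` has two categories.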
arche.rules.category.get_difference(source_key: str, target_key: str, source_df: pandas.core.frame.DataFrame, target_df: pandas.core.frame.DataFrame, category_names: List[str]) → arche.rules.result.Result
Find and show differences in category coverage between two datasets, including NaN values. Coverage means value counts divided by the total size.
- Parameters
source_key – name of the data you want to compare
target_key – name of the data you want to compare the source with
source_df – the data you want to compare
target_df – the data you want to compare with
category_names – list of columns whose values to compare
- Returns
A result instance with messages describing the significant differences as defined by thresholds, a dataframe showing all normalized value counts in percent, and a series containing the significant differences.
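The coverage comparison can be illustrated with plain pandas. This is a rough sketch under stated assumptions, not the arche implementation: the helper `coverage_difference`, the `<missing>` placeholder label, the sample data, and the 30 percentage point threshold are all hypothetical, since the actual thresholds are not given here.

```python
import pandas as pd


def coverage_difference(source_df, target_df, column, source_key, target_key):
    """Hypothetical helper: coverage in percent per dataset and the absolute gap."""

    def coverage(series):
        # Label missing values explicitly so they are counted as a category.
        labelled = series.where(series.notna(), "<missing>")
        # normalize=True divides each count by the total size.
        return labelled.value_counts(normalize=True) * 100

    table = pd.DataFrame(
        {
            source_key: coverage(source_df[column]),
            target_key: coverage(target_df[column]),
        }
    ).fillna(0)  # a category absent from one dataset has 0% coverage there
    difference = (table[source_key] - table[target_key]).abs()
    return table, difference


# Made-up sample data for illustration only.
source_df = pd.DataFrame({"color": ["a", "a", "b", None]})
target_df = pd.DataFrame({"color": ["a", "b", "b", "b"]})

table, difference = coverage_difference(
    source_df, target_df, "color", "source", "target"
)

THRESHOLD = 30  # assumed threshold in percentage points, not arche's value
significant = difference[difference > THRESHOLD]
```

With this data only category `b` exceeds the assumed threshold.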