arche.rules.coverage module
-
arche.rules.coverage.check_fields_coverage(df: pandas.core.frame.DataFrame) → arche.rules.result.Result
Get fields coverage from df. Coverage reflects the percentage of real values (excluding nan) per column.
- Parameters
df – the data for which to compute coverage
- Returns
A result with the coverage of every column in the provided df. A column that contains only nan is reported as an error.
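The coverage figure described above can be approximated with plain pandas. This is a minimal sketch, not arche's implementation: `fields_coverage` is a hypothetical helper, the sample data is invented, and it returns a `pandas.Series` rather than a `Result`.

```python
import numpy as np
import pandas as pd


def fields_coverage(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: percentage of non-NaN values per column."""
    # notna() yields a boolean frame; the column-wise mean of booleans
    # is the fraction of real values, scaled here to a percentage.
    return df.notna().mean() * 100


# Invented sample data: "sku" contains only NaN.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d"],
        "price": [1.0, np.nan, 3.0, np.nan],
        "sku": [np.nan, np.nan, np.nan, np.nan],
    }
)

coverage = fields_coverage(df)

# Columns with no real values at all, which the rule treats as errors.
empty_columns = coverage[coverage == 0].index.tolist()
```

Here `coverage` holds one percentage per column, and `empty_columns` lists the columns that would be flagged.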
-
arche.rules.coverage.compare_scraped_fields(source_df: pandas.core.frame.DataFrame, target_df: pandas.core.frame.DataFrame) → arche.rules.result.Result
Find new or missing columns between source_df and target_df.
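A column comparison of this kind can be sketched with set operations on the column labels. The helper name `column_difference` is hypothetical, and the direction is an assumption: "new" is taken to mean columns present only in source_df, "missing" columns present only in target_df. The sketch returns plain lists rather than a `Result`.

```python
import pandas as pd


def column_difference(source_df: pd.DataFrame, target_df: pd.DataFrame):
    """Hypothetical helper: return (new, missing) column names.

    Assumption: "new" = only in source_df, "missing" = only in target_df.
    """
    source_cols = set(source_df.columns)
    target_cols = set(target_df.columns)
    new = sorted(source_cols - target_cols)
    missing = sorted(target_cols - source_cols)
    return new, missing


# Invented sample frames sharing "name" and "price".
source_df = pd.DataFrame({"name": ["a"], "price": [1.0], "color": ["red"]})
target_df = pd.DataFrame({"name": ["b"], "price": [2.0], "sku": ["x1"]})

new_columns, missing_columns = column_difference(source_df, target_df)
```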
-
arche.rules.coverage.get_difference(source_job: scrapinghub.client.jobs.Job, target_job: scrapinghub.client.jobs.Job) → arche.rules.result.Result
Get the difference between the coverages of two jobs. A job's coverage is its field counts divided by the job size.
- Parameters
source_job – a base job, the difference is calculated from it
target_job – a job to compare
- Returns
A Result instance that reports huge differences, with stats containing the field-count coverage and the differences
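The arithmetic behind the coverage difference can be illustrated without the Scrapinghub client. This is a sketch under stated assumptions: the field counts and job sizes are invented plain values rather than data read from a `Job`, the helper names are hypothetical, and the difference is taken as source coverage minus target coverage, which the description above does not spell out.

```python
from typing import Dict


def job_coverage(field_counts: Dict[str, int], job_size: int) -> Dict[str, float]:
    """Hypothetical helper: field counts divided by the job size."""
    return {field: count / job_size for field, count in field_counts.items()}


def coverage_difference(
    source_cov: Dict[str, float], target_cov: Dict[str, float]
) -> Dict[str, float]:
    """Hypothetical helper: source minus target coverage per field.

    Assumption: a field absent from one job counts as 0.0 coverage there.
    """
    fields = sorted(set(source_cov) | set(target_cov))
    return {f: source_cov.get(f, 0.0) - target_cov.get(f, 0.0) for f in fields}


# Invented numbers: a 100-item source job and a 50-item target job.
source_cov = job_coverage({"name": 100, "price": 80}, job_size=100)
target_cov = job_coverage({"name": 50, "price": 25, "sku": 10}, job_size=50)

difference = coverage_difference(source_cov, target_cov)
```

Dividing by the job size makes jobs of different sizes comparable: "name" is fully covered in both jobs, so its difference is zero even though the raw counts differ.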