arche.data_quality_report module

class arche.data_quality_report.DataQualityReport(items: arche.readers.items.Items, schema: Dict[str, Dict[str, Union[str, bool, int, float, None, List[T]]]], report: arche.report.Report, bucket: Optional[str] = None)

Bases: object

coverage_by_categories(df, tagged_fields)

Makes tables that show the number of items in each category, where the category fields are those marked with a category tag.

Parameters
  • df – a dataframe of items

  • tagged_fields – a dict of tags
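The per-category counting described above can be sketched as follows. This is a minimal sketch, not arche's implementation. It assumes that `tagged_fields` maps the tag name `"category"` to a list of field names. It also uses a list of item dicts in place of the dataframe so that it runs with the standard library only. The function name `coverage_by_categories_sketch` is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def coverage_by_categories_sketch(
    items: List[Dict[str, Any]], tagged_fields: Dict[str, List[str]]
) -> Dict[str, Dict[Any, int]]:
    """Count items per value of every field tagged as a category.

    Hypothetical stand-in: `items` is a list of dicts rather than a
    DataFrame, and `tagged_fields` is assumed to look like
    {"category": ["field_a", "field_b"], ...}.
    """
    tables: Dict[str, Dict[Any, int]] = {}
    for field in tagged_fields.get("category", []):
        # Items missing the field are counted under None so gaps stay visible.
        counts = Counter(item.get(field) for item in items)
        # Order each table from the most to the least frequent category.
        tables[field] = dict(counts.most_common())
    return tables


items = [{"brand": "A"}, {"brand": "B"}, {"brand": "A"}, {}]
print(coverage_by_categories_sketch(items, {"category": ["brand"]}))
# → {'brand': {'A': 2, 'B': 1, None: 1}}
```

With a real dataframe, the same idea maps to calling `value_counts(dropna=False)` on each tagged column.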

create_appendix(schema)
create_figures(items, items_dicts)
drop_service_columns(df)
job_summary_table(job)
plot_html_to_stream()
plot_to_notebook()
rules_summary_table(df, no_of_validation_warnings, name_field, url_field, no_of_checked_duplicated_items, no_of_duplicated_items, unique, no_of_checked_skus, no_of_duplicated_skus, price_field, price_was_field, no_of_checked_price_items, no_of_price_warns, **kwargs)
save_report_to_bucket(project_id, spider, bucket)
score_table(quality_estimation, field_accuracy)
scraped_fields_coverage(job, df)
scraped_items_history(job_no, job_numbers, date_items)