arche package
-
arche.basic_json_schema(data_source: str, items_numbers: List[int] = None)
Prints a JSON schema based on the provided data_source and item numbers.
- Parameters
data_source – a collection or job key
items_numbers – a list of item numbers to create the schema from
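This reference does not show how arche builds the schema. The sketch below only illustrates the general idea of deriving a basic JSON schema from a few selected items. It assumes the items are plain dicts and that `items_numbers` can be treated as list indexes. `infer_basic_schema` and `json_type` are hypothetical helpers, not part of arche, and the schema arche actually prints will likely differ in detail.

```python
import json
from typing import Any, Dict, List, Optional

# Python type -> JSON Schema type name. bool is checked before int
# because bool is a subclass of int in Python.
_JSON_TYPES = [
    (bool, "boolean"),
    (int, "integer"),
    (float, "number"),
    (str, "string"),
    (list, "array"),
    (dict, "object"),
    (type(None), "null"),
]


def json_type(value: Any) -> str:
    """Return the JSON Schema type name for a decoded JSON value."""
    for py_type, name in _JSON_TYPES:
        if isinstance(value, py_type):
            return name
    raise TypeError(f"not a JSON value: {value!r}")


def infer_basic_schema(
    items: List[Dict[str, Any]], items_numbers: Optional[List[int]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: infer a flat JSON schema from sample items."""
    # Use only the selected items (treated here as list indexes), else all items.
    sample = items if items_numbers is None else [items[i] for i in items_numbers]
    seen: Dict[str, List[str]] = {}
    for item in sample:
        for field, value in item.items():
            types = seen.setdefault(field, [])
            t = json_type(value)
            if t not in types:
                types.append(t)
    properties = {
        field: {"type": types[0] if len(types) == 1 else types}
        for field, types in seen.items()
    }
    # A field is required only if every sampled item contains it.
    required = sorted(f for f in seen if all(f in item for item in sample))
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "required": required,
    }


items = [
    {"name": "a", "price": 1.5},
    {"name": "b", "price": 2, "url": None},
    {"name": "c"},
]
print(json.dumps(infer_basic_schema(items, items_numbers=[0, 1]), indent=2))
```

Restricting the sample to a few item numbers keeps inference fast. The trade-off is that a field may look required only because the chosen items all happen to contain it.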
-
class arche.Arche(source: str, schema: Union[str, Dict[str, Dict[str, Union[str, bool, int, float, None, List[T]]]], None] = None, target: Optional[str] = None, start: int = 0, count: Optional[int] = None, filters: Optional[List[Tuple[str, str, str]]] = None, expand: bool = True)
Bases: object
-
basic_json_schema(items_numbers: List[int] = None)
Prints a JSON schema based on data from self.source.
- Parameters
items_numbers – a list of item numbers to create the schema from
-
check_metadata
-
compare_metadata
-
compare_with_customized_rules(source_items, target_items, tagged_fields)
-
data_quality_report(bucket: Optional[str] = None)
-
static get_items(source: str, start: int, count: Optional[int], filters: Optional[List[Tuple[str, str, str]]], expand: bool) → Union[arche.readers.items.JobItems, arche.readers.items.CollectionItems]
-
glance()
Runs a JSON schema check and outputs the results. In most cases it stops after the first error per item. Suitable for big jobs, as it is about 100x faster than validate_with_json_schema().
-
report_all()
-
run_all_rules()
-
run_comparison_rules
-
run_customized_rules(items, tagged_fields)
-
run_general_rules
-
run_schema_rules()
-
save_result(rule_result)
-
schema
-
source_items
-
target_items
-
validate_with_json_schema()
Runs a JSON schema check and outputs the results. It tries to find all errors, but there is no guarantee that every error is reported. Slower than glance().
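The difference between glance() and validate_with_json_schema() is a fail-fast versus exhaustive validation trade-off. The sketch below illustrates only that trade-off with a toy validator. It is not arche's implementation and does not use a real JSON Schema library. `SCHEMA`, `validate_item` and `validate_items` are hypothetical names.

```python
from typing import Any, Dict, List

# Toy "schema" (hypothetical): required fields and their expected Python types.
SCHEMA = {"name": str, "price": float}


def validate_item(item: Dict[str, Any], fail_fast: bool) -> List[str]:
    """Return the errors for one item, stopping at the first if fail_fast."""
    errors: List[str] = []
    for field, expected in SCHEMA.items():
        if field not in item:
            errors.append(f"'{field}' is a required property")
        elif not isinstance(item[field], expected):
            errors.append(f"'{field}' is not of type {expected.__name__}")
        if fail_fast and errors:
            break  # fail-fast: skip the remaining checks for this item
    return errors


def validate_items(items: List[Dict[str, Any]], fail_fast: bool) -> Dict[int, List[str]]:
    """Map each failing item's index to its errors."""
    report: Dict[int, List[str]] = {}
    for index, item in enumerate(items):
        errors = validate_item(item, fail_fast)
        if errors:
            report[index] = errors
    return report


items = [{"name": "a", "price": 1.0}, {"price": "free"}]
print(validate_items(items, fail_fast=True))   # at most one error per item
print(validate_items(items, fail_fast=False))  # every error the checks can find
```

Stopping at the first error saves work on every invalid item, which is why the fail-fast mode scales better to big jobs. The exhaustive mode gives a fuller picture of each broken item.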
-
arche.find_duplicates_by(df: pandas.core.frame.DataFrame, columns: List[str]) → arche.rules.result.Result
Compares item rows in df by the given columns.
- Returns
The duplicate rows found, if any.
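As a rough illustration of "comparing rows by columns", the sketch below uses pandas' `DataFrame.duplicated` with `keep=False`, so every row in a duplicate group is reported. The helper name `duplicates_by` is hypothetical. Unlike arche.find_duplicates_by, it returns a plain DataFrame rather than an arche.rules.result.Result, and arche's internal approach may differ.

```python
import pandas as pd


def duplicates_by(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical helper: rows whose values in `columns` occur more than once."""
    # keep=False marks every member of a duplicate group, not just the repeats.
    mask = df.duplicated(subset=columns, keep=False)
    return df[mask]


df = pd.DataFrame(
    {
        "name": ["a", "b", "a", "c"],
        "price": [1, 2, 1, 3],
        "url": ["u1", "u2", "u3", "u4"],
    }
)
print(duplicates_by(df, ["name", "price"]))
```

Here rows 0 and 2 share the same name and price, so both are reported even though their URLs differ.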
-
class arche.CollectionItems(key: str, count: Optional[int] = None, filters: Optional[List[Tuple[str, str, str]]] = None, expand: bool = True)
Bases: arche.readers.items.Items
-
count
The number of items the user wants to retrieve.
-
fetch_data()
-
limit
The maximum number of items in the source.
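One plausible reading of how count and limit relate is that a reader never fetches more items than the source holds. On that reading, a missing count would mean reading everything. The sketch below encodes that assumption only. `items_to_fetch` is a hypothetical helper, and the None-means-all behaviour is assumed rather than documented here.

```python
from typing import Optional


def items_to_fetch(limit: int, count: Optional[int] = None) -> int:
    """Hypothetical helper: how many items a reader would fetch.

    Assumes `limit` is the total number of items in the source and `count`
    is the optional number requested by the user.
    """
    if count is None:
        return limit  # assumed: no count given means read the whole source
    if count < 0:
        raise ValueError("count must be non-negative")
    return min(count, limit)  # never ask for more than the source holds


print(items_to_fetch(100))      # whole source
print(items_to_fetch(100, 10))  # user-requested subset
print(items_to_fetch(5, 10))    # capped by the source size
```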
-