Compare jobs
[10]:
from arche import Arche
[5]:
a = Arche(source="235801/1/15", target="235801/1/14")
Let’s use the schema from Basics:
[8]:
a.schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {
"float": {
"pattern": "^-?[0-9]+\\.[0-9]{2}$"
},
"url": {
"pattern": "^https?://(www\\.)?[a-z0-9.-]*\\.[a-z]{2,}([^<>%\\x20\\x00-\\x1f\\x7F]|%[0-9a-fA-F]{2})*$"
}
},
"additionalProperties": False,
"type": "object",
"properties": {
"category": {"type": "string", "tag": ["category"]},
"price": {"type": "string", "pattern": "^£\d{2}.\d{2}$"},
"_type": {"type": "string"},
"description": {"type": "string"},
"title": {"type": "string", "tag": ["unique"]},
"_key": {"type": "string"}
},
"required": [
"_key",
"_type",
"category",
"description",
"price",
"title"
]
}
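To see what the price constraint accepts, here is a minimal sketch that checks a few strings against a two-digit pound amount with a literal decimal point, which is what the "price" pattern appears to intend. It uses only the standard `re` module, not Arche itself. The helper name `is_valid_price` and the sample values are hypothetical and are not taken from the jobs above.

```python
import re

# The intended "price" pattern, written as a raw string so the
# backslashes reach the regex engine unchanged.
PRICE_PATTERN = re.compile(r"^£\d{2}\.\d{2}$")


def is_valid_price(value: str) -> bool:
    """Return True when value looks like £NN.NN."""
    # JSON Schema "pattern" uses search semantics; the anchors do the rest.
    return PRICE_PATTERN.search(value) is not None


# Hypothetical sample values, for illustration only.
samples = ["£51.77", "£5.00", "£123.45", "51.77"]
results = {sample: is_valid_price(sample) for sample in samples}
print(results)
```

Only the first sample passes: the others have too few digits, too many digits, or no currency sign.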
[9]:
a.report_all()
Job Outcome:
Finished
Job Errors:
No errors
Responses Per Item Ratio:
Number of responses / Number of scraped items - 1.05
Total Scraped Items:
Same number of items
Compare Runtime:
Similar or better runtime - 0:04:15.526000 and 0:04:46.466000
Finish Time:
16 day(s) difference between 2 jobs
Fields Coverage:
0 totally empty field(s)
Boolean Fields:
No fields to compare
JSON Schema Validation:
1000 items were checked, 1 error(s)
Tags:
category, unique
Compare Price Was And Now:
product_price_field or product_price_was_field tags were not found in schema
Uniqueness:
'title' contains 1 duplicated value(s)
Duplicated Items:
'name_field' and 'product_url_field' tags were not found in schema
Coverage For Scraped Categories:
50 categories in 'category'
Compare Scraped Categories:
Similar coverage per category with 10% tolerance
Same categories: 50; new categories: 0; missing categories: 0
Compare Prices For Same Urls:
product_url_field tag is not set
Compare Names Per Url:
product_url_field tag is not set
Compare Prices For Same Names:
name_field tag is not set
RULE: Fields Coverage
(1 message(s))
Field        Values Count  Percent
description           998       99
_key                 1000      100
_type                1000      100
category             1000      100
price                1000      100
title                1000      100
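The coverage figures above can be reproduced by hand. The sketch below counts non-empty values per field and derives a whole-number percentage. It is an illustration under assumptions, not Arche's implementation: the function name `field_coverage` is hypothetical, the data is synthetic, and the use of integer truncation is only inferred from 998 of 1000 being shown as 99.

```python
def field_coverage(items, fields):
    """Return {field: (non_empty_count, whole_percent)} for the given items."""
    total = len(items)
    coverage = {}
    for field in fields:
        # Treat missing keys, None and empty strings as "no value".
        count = sum(1 for item in items if item.get(field) not in (None, ""))
        percent = count * 100 // total if total else 0  # truncates, assumed
        coverage[field] = (count, percent)
    return coverage


# Synthetic data: 1000 items, two of them without a description.
items = [{"title": f"t{i}", "description": "text"} for i in range(998)]
items += [{"title": "t998"}, {"title": "t999"}]

coverage = field_coverage(items, ["description", "title"])
print(coverage)
```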
RULE: JSON Schema Validation
(1 message(s))
RULE: Uniqueness
(1 message(s))
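The uniqueness message refers to values of a field tagged "unique" that occur more than once. A minimal sketch of that kind of check, using `collections.Counter` on hypothetical items (the helper `duplicated_values` is not part of Arche, and Arche may count duplicates differently):

```python
from collections import Counter


def duplicated_values(items, field):
    """Return {value: occurrences} for values of field seen more than once."""
    counts = Counter(item[field] for item in items if field in item)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical items, for illustration only.
items = [{"title": "A"}, {"title": "B"}, {"title": "A"}]
dupes = duplicated_values(items, "title")
print(dupes)
```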
RULE: Coverage For Scraped Categories
(1 message(s))
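Several rules in the summary were skipped because their tags were not found in the schema. As a sketch, tags could be added to the "properties" entries in the same way "category" and "unique" already are. The tag names below come from the report messages; which field should carry which tag is an assumption here and depends on your data. Since this schema has no URL field, product_url_field stays unset in the sketch. The helper `collect_tags` is hypothetical and only shows which tags a schema fragment declares.

```python
# Hypothetical tagging of the existing fields; adjust to your own items.
tagged_properties = {
    "category": {"type": "string", "tag": ["category"]},
    "title": {"type": "string", "tag": ["unique", "name_field"]},
    "price": {"type": "string", "tag": ["product_price_field"]},
}


def collect_tags(properties):
    """Map each tag name to the list of fields that declare it."""
    tags = {}
    for field, spec in properties.items():
        for tag in spec.get("tag", []):
            tags.setdefault(tag, []).append(field)
    return tags


tags = collect_tags(tagged_properties)
print(tags)
```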