Compare jobs

[10]:
from arche import *
[5]:
a = Arche(source="235801/1/15", target="235801/1/14")

Let’s use the schema from Basics.

[8]:
a.schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "definitions": {
        "float": {
            "pattern": "^-?[0-9]+\\.[0-9]{2}$"
        },
        "url": {
            "pattern": "^https?://(www\\.)?[a-z0-9.-]*\\.[a-z]{2,}([^<>%\\x20\\x00-\\x1f\\x7F]|%[0-9a-fA-F]{2})*$"
        }
    },
    "additionalProperties": False,
    "type": "object",
    "properties": {
        "category": {"type": "string", "tag": ["category"]},
        "price": {"type": "string", "pattern": "^£\\d{2}\\.\\d{2}$"},
        "_type": {"type": "string"},
        "description": {"type": "string"},
        "title": {"type": "string", "tag": ["unique"]},
        "_key": {"type": "string"}
    },
    "required": [
        "_key",
        "_type",
        "category",
        "description",
        "price",
        "title"
    ]
}
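
To see which strings the `price` pattern accepts, it can be tried on its own with Python's `re` module. This is only a sketch: the helper name `matches_price` and the sample values are made up for illustration, and the pattern is written with doubled backslashes and a literal dot (`\\.`), on the assumption that prices look like `£51.77`. JSON Schema's `pattern` keyword is not implicitly anchored, so `re.search` is used and the `^`/`$` anchors in the pattern do the anchoring.

```python
import re

# Price pattern as assumed above: pound sign, two digits, a literal dot, two digits.
PRICE_PATTERN = re.compile("^£\\d{2}\\.\\d{2}$")


def matches_price(value: str) -> bool:
    """Return True when the value matches the price pattern."""
    return PRICE_PATTERN.search(value) is not None


# Hypothetical sample values, not taken from the compared jobs.
samples = ["£51.77", "£5.00", "£123.45", "51.77", "£51,77"]
for sample in samples:
    print(sample, matches_price(sample))
```

Only the first sample matches. The others fail on the digit count, the missing pound sign, or the comma used as a separator.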
[9]:
a.report_all()


Job Outcome:
        Finished

Job Errors:
        No errors

Responses Per Item Ratio:
        Number of responses / Number of scraped items - 1.05

Total Scraped Items:
        Same number of items

Compare Runtime:
        Similar or better runtime - 0:04:15.526000 and 0:04:46.466000

Finish Time:
   16 day(s) difference between 2 jobs

Fields Coverage:
        0 totally empty field(s)

Boolean Fields:
        No fields to compare

JSON Schema Validation:
      1000 items were checked, 1 error(s)

Tags:
        category, unique

Compare Price Was And Now:
        product_price_field or product_price_was_field tags were not found in schema

Uniqueness:
      'title' contains 1 duplicated value(s)

Duplicated Items:
        'name_field' and 'product_url_field' tags were not found in schema

Coverage For Scraped Categories:
        50 categories in 'category'

Compare Scraped Categories:
        Similar coverage per category with 10% tolerance
        Same categories: 50; new categories: 0; missing categories: 0

Compare Prices For Same Urls:
        product_url_field tag is not set

Compare Names Per Url:
        product_url_field tag is not set

Compare Prices For Same Names:
        name_field tag is not set




RULE: Fields Coverage
(1 message(s))

             Values Count  Percent
Field
description           998       99
_key                 1000      100
_type                1000      100
category             1000      100
price                1000      100
title                1000      100

RULE: JSON Schema Validation
(1 message(s))

2 items affected - description is not of type 'string': 162 980


RULE: Uniqueness
(1 message(s))

2 items affected - same 'The Star-Touched Queen' title: 220 396


RULE: Coverage For Scraped Categories
(1 message(s))
