In Short

[1]:
%cd ../../../src
/Users/valery/Documents/_code/arche/src
[2]:
%load_ext autoreload
%autoreload 2
[3]:
from arche import *
[4]:
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "definitions": {
        "float": {
            "pattern": "^-?[0-9]+\\.[0-9]{2}$"
        },
        "url": {
            "pattern": "^https?://(www\\.)?[a-z0-9.-]*\\.[a-z]{2,}([^<>%\\x20\\x00-\\x1f\\x7F]|%[0-9a-fA-F]{2})*$"
        }
    },
    "additionalProperties": False,
    "type": "object",
    "properties": {
        "category": {"type": "string", "tag": ["category"]},
        "price": {"type": "string", "pattern": "^£\\d{2}\\.\\d{2}$"},
        "_type": {"type": "string"},
        "description": {"type": "string"},
        "title": {"type": "string"},
        "_key": {"type": "string"}
    },
    "required": [
        "_key",
        "_type",
        "category",
        "description",
        "price",
        "title"
    ]
}
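A pattern such as the one for `price` can be sanity-checked against sample values before running the full report. The sketch below uses only the standard `re` module and assumes the intent is a pound sign, two digits, a literal decimal point, and two digits; an unescaped `.` would instead accept any character in that position. The helper name `price_matches` is hypothetical, not part of arche.

```python
import re

# Assumed intent of the price pattern: "£", two digits, a literal dot, two digits.
# A raw string is used here, so a single backslash is enough.
PRICE_PATTERN = r"^£\d{2}\.\d{2}$"


def price_matches(value: str) -> bool:
    """Return True if value satisfies PRICE_PATTERN (hypothetical helper)."""
    # JSON Schema "pattern" is a search, not a full match, so re.search is used;
    # the ^ and $ anchors in the pattern make it strict.
    return re.search(PRICE_PATTERN, value) is not None


print(price_matches("£22.65"))  # True
print(price_matches("£22x65"))  # False: the dot must be a literal "."
print(price_matches("£5.00"))   # False: two digits are required before the dot
```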
[5]:
a = Arche("381798/1/2", schema=schema, target="381798/1/1")
[6]:
a.source_items.df.head()

[6]:
   _key                                             _type  category            description                                        price   title
0  https://app.scrapinghub.com/p/381798/1/2/item/0  dict   Young Adult         Patient Twenty-nine.A monster roams the halls ...  £22.65  The Requiem Red
1  https://app.scrapinghub.com/p/381798/1/2/item/1  dict   History             From a renowned historian comes a groundbreaki...  £54.23  Sapiens: A Brief History of Humankind
2  https://app.scrapinghub.com/p/381798/1/2/item/2  dict   Mystery             WICKED above her hipbone, GIRL across her hear...  £47.82  Sharp Objects
3  https://app.scrapinghub.com/p/381798/1/2/item/3  dict   Fiction             Dans une France assez proche de la nôtre, un h...  £50.10  Soumission
4  https://app.scrapinghub.com/p/381798/1/2/item/4  dict   Historical Fiction  "Erotic and absorbing...Written with starling ...  £53.74  Tipping the Velvet
[9]:
a.report_all()


Job Outcome:
        Finished

Job Errors:
        No errors

Responses Per Item Ratio:
        Number of responses / Number of scraped items - 1.05

Total Scraped Items:
        Same number of items

Compare Runtime:
        Similar or better runtime - 0:00:49.589000 and 0:00:55.089000

Finish Time:
        Less than 1 day difference

Fields Coverage:
        PASSED

Boolean Fields:
        SKIPPED

JSON Schema Validation:
        1000 items were checked, 1 error(s)

Tags:
        Used - category
        Not used - name_field, product_price_field, product_price_was_field, product_url_field, unique

Compare Price Was And Now:
        product_price_field or product_price_was_field tags were not found in schema

Uniqueness:
        'unique' tag was not found in schema

Duplicated Items:
        'name_field' and 'product_url_field' tags were not found in schema

Coverage For Scraped Categories:
        50 categories in 'category'

Compare Prices For Same Urls:
        product_url_field tag is not set

Compare Names Per Url:
        product_url_field tag is not set

Compare Prices For Same Names:
        name_field tag is not set




Coverage Difference (1 message(s)):

Fields Coverage (1 message(s)):

JSON Schema Validation (1 message(s)):
2 items affected - description is not of type 'string': 459 982


Coverage For Scraped Categories (1 message(s)):

Category Coverage Difference (1 message(s)):
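
The JSON Schema Validation message above reports items whose `description` is not a string. As a rough illustration of what that check means, the sketch below scans a list of item dicts for non-string values in one field. The sample rows and the helper `non_string_fields` are hypothetical; the actual values of items 459 and 982 are not shown in the report.

```python
# Hypothetical sample rows; not taken from the job in this notebook.
items = [
    {"_key": "0", "description": "A monster roams the halls"},
    {"_key": "1", "description": None},  # missing value
    {"_key": "2", "description": 42},    # wrong type
]


def non_string_fields(rows, field):
    """Return positions of rows whose `field` is absent or not a str."""
    return [i for i, row in enumerate(rows) if not isinstance(row.get(field), str)]


print(non_string_fields(items, "description"))  # [1, 2]
```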
[8]:
find_duplicates_by(a.source_items.df, ["title", "price"]).show()
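
To illustrate the idea behind a duplicates check on `title` and `price`, here is a plain-Python sketch that groups rows by the values of the chosen keys and keeps the groups with more than one row. This is not arche's implementation of `find_duplicates_by`; the helper `duplicate_groups` and the sample rows are hypothetical.

```python
from collections import defaultdict


def duplicate_groups(rows, keys):
    """Map each repeated combination of `keys` values to the row positions sharing it."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[k] for k in keys)].append(i)
    # Keep only combinations that occur more than once.
    return {combo: idx for combo, idx in groups.items() if len(idx) > 1}


# Hypothetical rows, made up for the example.
rows = [
    {"title": "Sharp Objects", "price": "£47.82"},
    {"title": "Soumission", "price": "£50.10"},
    {"title": "Sharp Objects", "price": "£47.82"},  # same title and price as row 0
    {"title": "Sharp Objects", "price": "£10.00"},  # same title, different price
]

print(duplicate_groups(rows, ["title", "price"]))
# {('Sharp Objects', '£47.82'): [0, 2]}
```

Grouping on both columns means a row counts as a duplicate only when title and price match together, which is why row 3 is not reported.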