arche.tools.api module

arche.tools.api.get_batch_id(job)
arche.tools.api.get_collection(key)
arche.tools.api.get_crawlera_user(job)
arche.tools.api.get_errors_count(job)
arche.tools.api.get_finish_time_difference_in_days(job1, job2)
arche.tools.api.get_items(key: str, count: int, start_index: int, filters: Optional[List[Tuple[str, str, str]]] = None, p_bar: Union[tqdm._tqdm.tqdm, tqdm.tqdm_notebook] = tqdm_notebook) → List[Dict[str, Any]]
arche.tools.api.get_items_count(job)
arche.tools.api.get_items_with_pool(source_key: str, count: int, start_index: int = 0, workers: int = 4) → List[Dict[str, Any]]

Concurrently reads items from the API using a worker Pool

Parameters
  • source_key – a job or collection key, e.g. ‘112358/13/21’

  • count – the number of items to retrieve

  • start_index – the index to start reading from

  • workers – the number of worker processes used to fetch the data

Returns

A list of items
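
A minimal usage sketch (the ‘112358/13/21’ key below is the placeholder from the parameter list above, not a real job, and a Scrapy Cloud API key must be configured):

    from arche.tools import api

    # Fetch the first 1000 items of the job across 4 worker processes.
    items = api.get_items_with_pool("112358/13/21", count=1000, workers=4)
    print(len(items), "items fetched")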

arche.tools.api.get_job_arguments(job)
arche.tools.api.get_job_close_reason(job)
arche.tools.api.get_job_state(job)
arche.tools.api.get_keywords(job)
arche.tools.api.get_max_memusage(job)
arche.tools.api.get_requests_count(job)
arche.tools.api.get_response_status_count(job)
arche.tools.api.get_runtime(job)

Returns the runtime in milliseconds, or None if the job is still running

arche.tools.api.get_runtime_s(job)

Returns job runtime in milliseconds.
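
A hedged sketch of obtaining a job handle and reading its runtime, assuming the python-scrapinghub client and a Scrapy Cloud API key available to it (the job key is a placeholder):

    from scrapinghub import ScrapinghubClient

    from arche.tools import api

    client = ScrapinghubClient()          # authenticates with your Scrapy Cloud API key
    job = client.get_job("112358/13/21")  # placeholder job key
    runtime_ms = api.get_runtime(job)     # None while the job is still running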

arche.tools.api.get_scraped_fields(job)
arche.tools.api.get_source(source_key)
arche.tools.api.get_spider_name(job_key)
arche.tools.api.get_store_details(job)
arche.tools.api.get_store_id(job)
arche.tools.api.get_warnings(job, level)
arche.tools.api.key_to_url(key: str, source_key: str) → str

Get the full Scrapy Cloud URL from a meta _key, e.g. 112358/13/21/0 to https://app.scrapinghub.com/p/112358/13/21/item/0

Parameters
  • key – a meta _key

  • source_key – a job or collection key

Returns

A full URL to an item in Scrapy Cloud
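
For illustration, the conversion can be exercised directly; the values mirror the example in the description above:

    from arche.tools import api

    url = api.key_to_url("112358/13/21/0", source_key="112358/13/21")
    # "https://app.scrapinghub.com/p/112358/13/21/item/0"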