arche.tools.api module
-
arche.tools.api.get_batch_id(job)
-
arche.tools.api.get_collection(key)
-
arche.tools.api.get_crawlera_user(job)
-
arche.tools.api.get_errors_count(job)
-
arche.tools.api.get_finish_time_difference_in_days(job1, job2)
-
arche.tools.api.get_items(key: str, count: int, start_index: int, filters: Optional[List[Tuple[str, str, str]]] = None, p_bar: Union[tqdm._tqdm.tqdm, tqdm.tqdm_notebook] = <function tqdm_notebook>) → List[Dict[str, Any]]
-
arche.tools.api.get_items_count(job)
-
arche.tools.api.get_items_with_pool(source_key: str, count: int, start_index: int = 0, workers: int = 4) → List[Dict[str, Any]]
Concurrently reads items from the API using Pool.
- Parameters
source_key – a job or collection key, e.g. ‘112358/13/21’
count – a number of items to retrieve
start_index – an index to read from
workers – the number of separate processes used to fetch the data
- Returns
A list of items
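
The batching idea behind a pooled read can be sketched as follows. This is an illustration under stated assumptions, not arche's implementation: `split_into_batches`, `read_with_pool`, `fake_fetch`, and the `batch_size` parameter are hypothetical names, the fetch function is a local stand-in rather than a real API call, and a thread pool is used instead of a process pool so the snippet runs without pickling concerns.

```python
from itertools import chain
from multiprocessing.pool import ThreadPool
from typing import Any, Callable, Dict, List, Tuple

Item = Dict[str, Any]


def split_into_batches(
    count: int, start_index: int, batch_size: int
) -> List[Tuple[int, int]]:
    """Return (start, size) pairs covering [start_index, start_index + count)."""
    end = start_index + count
    return [
        (start, min(batch_size, end - start))
        for start in range(start_index, end, batch_size)
    ]


def read_with_pool(
    fetch_batch: Callable[[int, int], List[Item]],
    count: int,
    start_index: int = 0,
    workers: int = 4,
    batch_size: int = 1000,  # hypothetical knob, not part of the documented signature
) -> List[Item]:
    """Fetch batches concurrently and join them in their original order."""
    batches = split_into_batches(count, start_index, batch_size)
    with ThreadPool(workers) as pool:
        # starmap returns results in the same order as the input batches
        results = pool.starmap(fetch_batch, batches)
    return list(chain.from_iterable(results))


def fake_fetch(start: int, size: int) -> List[Item]:
    """Stand-in for a real API read; builds items with sequential keys."""
    return [{"_key": f"112358/13/21/{i}"} for i in range(start, start + size)]


items = read_with_pool(fake_fetch, count=2500, start_index=10)
```

Because each batch is an independent (start, size) range, the worker count only affects how many batches are in flight at once, not the content or order of the result.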
-
arche.tools.api.get_job_arguments(job)
-
arche.tools.api.get_job_close_reason(job)
-
arche.tools.api.get_job_state(job)
-
arche.tools.api.get_keywords(job)
-
arche.tools.api.get_max_memusage(job)
-
arche.tools.api.get_requests_count(job)
-
arche.tools.api.get_response_status_count(job)
-
arche.tools.api.get_runtime(job)
Returns the runtime in milliseconds, or None if the job is still running.
-
arche.tools.api.get_runtime_s(job)
Returns job runtime in milliseconds.
-
arche.tools.api.get_scraped_fields(job)
-
arche.tools.api.get_source(source_key)
-
arche.tools.api.get_spider_name(job_key)
-
arche.tools.api.get_store_details(job)
-
arche.tools.api.get_store_id(job)
-
arche.tools.api.get_warnings(job, level)
-
arche.tools.api.key_to_url(key: str, source_key: str) → str
Get the full Scrapy Cloud URL from a _key, e.g. 112358/13/21/0 becomes https://app.scrapinghub.com/p/112358/13/21/item/0
- Parameters
key – a meta _key
source_key – a job or collection key
- Returns
A full URL to an item in Scrapy Cloud
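
The key-to-URL mapping in the example above can be sketched for job keys as follows. `key_to_url_sketch` is a hypothetical name and not the library's implementation; it assumes the item index is the last path segment of the _key, and it does not handle collection keys, which may map to a different URL form.

```python
def key_to_url_sketch(key: str, source_key: str) -> str:
    """Build an item URL from a meta _key and its job key (job keys only)."""
    # Assumption: the item index is the final "/"-separated segment of the _key
    item_index = key.rsplit("/", 1)[-1]
    return f"https://app.scrapinghub.com/p/{source_key}/item/{item_index}"


url = key_to_url_sketch("112358/13/21/0", "112358/13/21")
```

With the inputs from the documented example, `url` is `https://app.scrapinghub.com/p/112358/13/21/item/0`.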