Releases: svdC1/scrape-do-python
v0.2.0
Added
- `ScrapeDoProxyClient` and `AsyncScrapeDoProxyClient`: route requests through Scrape.do's Proxy Mode (`proxy.scrape.do:8080`). Same request/response surface as the API-mode clients (`execute` / `request` / `get` / `post`), minus `execute_from_url` (no equivalent in proxy mode). The async variant is backed by `httpx.AsyncClient` and uses `asyncio.sleep` for retry pauses.
- Per-`(api_token, parameters)` `httpx.Client` / `httpx.AsyncClient` pool with bounded LRU eviction (`max_pooled_clients=16` default, configurable). Two requests with the same parameters reuse the same TCP / TLS / HTTP-2 connection; the cookie jar on each pooled client is cleared after every request (Scrape.do owns the cookie lifecycle via `setCookies` / `scrape.do-cookies` / `sessionId`, so pooling is purely a transport concern).
- `PreparedScrapeDoRequest.to_proxy_httpx_kwargs()`: serializes the same data model into httpx kwargs that target the destination URL directly (the API token and Scrape.do parameters live in the proxy URL's userinfo segment, not the request).
- `RequestParameters.to_proxy_url()`: generates a Scrape.do Proxy-Mode connection string template (`http://{api_token}:<params>@proxy.scrape.do:8080`) for use with the proxy clients or with third-party tooling (Playwright / Selenium / curl).
- `RequestParameters.validate_proxy_params()`: cross-validates Proxy-Mode-specific parameter quirks (`customHeaders` defaulting to true server-side, `setCookies` interaction, render-mode discouragement).
- `SCRAPE_DO_CA_PATH` and `DEFAULT_PROXY_SSL_CONTEXT` in `scrape_do.constants`: the bundled Scrape.do CA cert and an `ssl.SSLContext` preloaded with system CAs plus the bundled CA. Default `verify` source for the proxy-mode clients so HTTPS targets validate correctly through Scrape.do's MITM step without disabling TLS verification. `VERIFY_X509_STRICT` is cleared so chain validation accepts Scrape.do's self-signed root (which omits the optional AKI extension); all other verification checks remain intact.
- Scrape.do's CA certificate bundled with the wheel under `scrape_do.data` so the SDK ships everything needed for proxy-mode TLS verification.
- Public re-exports for `ScrapeDoProxyClient` and `AsyncScrapeDoProxyClient` in `scrape_do/__init__.py`.
- `AsyncScrapeDoClient` backed by `httpx.AsyncClient`. Near-1:1 of the synchronous client (smart routing, retry strategy, session validation, event hooks), with every IO-bound method `async` / `await`. Sleeps between retries use `asyncio.sleep` rather than `time.sleep`.
- `AsyncClientEventHooks` TypedDict and `AsyncSessionValidator` type alias. Both are async-only: hooks return `Awaitable[None]` and validators return `Awaitable[bool]`, so they can perform I/O while the request executes.
- Public re-exports for `AsyncScrapeDoClient`, `AsyncClientEventHooks`, and `AsyncSessionValidator` in `scrape_do/__init__.py`.
- `ScrapeDoResponse.json(raw_response=True, **kwargs) -> Any` convenience method. With `raw_response=True` (default) it shortcuts to `httpx_response.json()`; with `raw_response=False` it returns `json.loads(self.text, **kwargs)` so the post-envelope path is reachable without manual parsing.
- Example block in the package-level docstring at `src/scrape_do/__init__.py` showcasing a typical request flow.
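The bounded LRU client pool listed above can be illustrated with a minimal sketch. This is not the SDK's implementation: `LRUClientPool`, its `factory` / `close` callables, and the plain-dict stand-in clients are all hypothetical names chosen for the example. It only shows the general technique of keying clients by `(api_token, parameters)` and evicting the least recently used one once the bound is exceeded.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

C = TypeVar("C")


class LRUClientPool(Generic[C]):
    """Hypothetical bounded pool: one client per key, least recently used evicted first."""

    def __init__(
        self,
        factory: Callable[[Hashable], C],
        close: Callable[[C], None],
        max_clients: int = 16,
    ) -> None:
        self._factory = factory
        self._close = close
        self._max = max_clients
        self._clients: "OrderedDict[Hashable, C]" = OrderedDict()

    def get(self, key: Hashable) -> C:
        if key in self._clients:
            # Cache hit: mark this client as most recently used.
            self._clients.move_to_end(key)
            return self._clients[key]
        client = self._factory(key)
        self._clients[key] = client
        while len(self._clients) > self._max:
            # Evict from the "oldest" end and release its resources.
            _, evicted = self._clients.popitem(last=False)
            self._close(evicted)
        return client

    def __len__(self) -> int:
        return len(self._clients)


# Demo with stand-in clients (plain dicts) instead of real HTTP clients.
closed = []
pool = LRUClientPool(factory=lambda key: {"key": key}, close=closed.append, max_clients=2)
a = pool.get(("token", "render=true"))
b = pool.get(("token", "geoCode=us"))
reused = pool.get(("token", "render=true"))  # same key, same client object
c = pool.get(("token", "super=true"))        # exceeds the bound, evicts b
```

With real `httpx` clients, the factory would presumably construct an `httpx.Client`, the `close` callable would call its `close()` method, and the caller would clear the client's cookie jar after each request, as the release note describes.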
Fixed
- `ScrapeDoClient.post()` now forwards the `session_validator` argument to `request()`. Previously the argument was accepted but silently ignored on POST calls. `get()` was unaffected.
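For the proxy-mode TLS default described under Added in this release, the following sketch shows how a comparable context could be assembled with the standard library alone. The function name `build_proxy_ssl_context` and its `extra_ca_path` parameter are assumptions made for illustration; the SDK's own `DEFAULT_PROXY_SSL_CONTEXT` may be built differently.

```python
import ssl
from typing import Optional


def build_proxy_ssl_context(extra_ca_path: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical helper: system CAs plus an optional extra CA, strict-X.509 flag cleared."""
    # create_default_context() loads the system trust store and keeps
    # hostname checking and CERT_REQUIRED enabled.
    ctx = ssl.create_default_context()
    if extra_ca_path is not None:
        # Trust the additional CA certificate on top of the system roots.
        ctx.load_verify_locations(cafile=extra_ca_path)
    # Clear only VERIFY_X509_STRICT; every other verification flag is left as-is.
    ctx.verify_flags &= ~ssl.VERIFY_X509_STRICT
    return ctx


# Built without an extra CA here so the example runs anywhere.
ctx = build_proxy_ssl_context()
```

A context like this can be passed as the `verify` argument of an `httpx.Client`, which keeps certificate and hostname verification on while tolerating a root that fails the strict X.509 checks.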
v0.1.1
Added
- Curated public re-exports in `scrape_do/__init__.py` so common imports work as `from scrape_do import ScrapeDoClient, RequestParameters, ...` rather than digging into submodules.
- `py.typed` PEP 561 marker so downstream type-checkers (`mypy`, `pyright`) consume the package's type hints.
- Trove classifiers in package metadata: PyPI's "Python" sidebar and shields.io's `pypi/pyversions` badge now populate correctly.
Removed
- Empty `scrape_do/namespaces/` placeholder folder (was scaffolding from before the roadmap solidified; will be replaced by `plugins/` in `0.4+`).
Documentation
- Planned package layout added to `ROADMAP`.
v0.1.0
Initial release. Synchronous client surface.
Added
- `ScrapeDoClient` synchronous client with `request()`, `get()`, `post()`, `execute()`, and `execute_from_url()` methods.
- Smart routing in `ScrapeDoClient.request()`: accepts kwargs, a pre-built `RequestParameters`, or a raw `api.scrape.do` URL; exactly one configuration shape per call.
- Automatic retries on Scrape.do gateway errors (429 / 502 / 510) with configurable backoff strategy (static float or callable). Default is jittered exponential.
- `session_validator` callback (`SyncSessionValidator`) for sticky-session rotation detection: when present and `session_id` is set, the validator decides whether to raise `RotatedSessionError`.
- SDK-native event hooks via `SyncClientEventHooks` TypedDict: `request` / `response` / `retry` lifecycle, distinct from httpx transport-level hooks.
- Pydantic-validated `RequestParameters` covering the full Scrape.do API parameter surface, including browser-action models (`ClickAction`, `WaitAction`, `FillAction`, `ExecuteAction`, `ScreenShotAction`, scrolling, request-completion waits).
- `ScrapeDoResponse` wrapper exposing the parsed JSON envelope, network requests, websocket frames, action results, screenshots, frames, plus a raw `status_code` passthrough.
- Cookie isolation between sequential requests on the underlying `httpx.Client` (prevents cross-request bleed).
- Exception hierarchy: `ScrapeDoError` (base), `APIConnectionError`, `TargetError`, `RotatedSessionError`, plus the API-layer `AuthenticationError`, `BadRequestError`, `RateLimitError`, `ServerError`, and `AuthenticationThrottleError`.
- Default request timeout raised to 60 seconds (from httpx's 5s default) to accommodate browser rendering and proxy round-trips.
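The retry behaviour listed above (retrying on 429 / 502 / 510 with either a static delay or a callable backoff) can be sketched roughly as follows. Everything here is illustrative: `send_with_retries`, `resolve_delay`, `jittered_exponential`, and the `base` / `cap` values are hypothetical, and the "full jitter" variant shown is one common choice, not necessarily the formula the SDK uses by default.

```python
import random
import time
from typing import Callable, Union

RETRYABLE_STATUS = {429, 502, 510}
Backoff = Union[float, Callable[[int], float]]


def jittered_exponential(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def resolve_delay(backoff: Backoff, attempt: int) -> float:
    """Accept either a static number of seconds or a callable of the attempt index."""
    return backoff(attempt) if callable(backoff) else float(backoff)


def send_with_retries(
    send: Callable[[], int],
    max_retries: int = 3,
    backoff: Backoff = jittered_exponential,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or retries run out."""
    for attempt in range(max_retries + 1):
        status = send()
        if status not in RETRYABLE_STATUS or attempt == max_retries:
            return status
        sleep(resolve_delay(backoff, attempt))
    raise AssertionError("unreachable")


# Demo: a fake transport that fails twice, then succeeds; sleeps are recorded, not performed.
statuses = iter([429, 502, 200])
delays = []
result = send_with_retries(lambda: next(statuses), backoff=0.1, sleep=delays.append)
```

Injecting `sleep` keeps the loop testable and is also the natural seam for an async variant, where the pause would be awaited via `asyncio.sleep` instead.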