v0.2.0 #3
svdC1 announced in Announcements
Added

- `ScrapeDoProxyClient` and `AsyncScrapeDoProxyClient`: route requests through Scrape.do's Proxy Mode (`proxy.scrape.do:8080`). Same request/response surface as the API-mode clients (`execute` / `request` / `get` / `post`), minus `execute_from_url` (no equivalent in proxy mode). The async variant is backed by `httpx.AsyncClient` and uses `asyncio.sleep` for retry pauses.
- Per-(`api_token`, `parameters`) `httpx.Client` / `httpx.AsyncClient` pool with bounded LRU eviction (`max_pooled_clients=16` default, configurable). Two requests with the same parameters reuse the same TCP / TLS / HTTP-2 connection; the cookie jar on each pooled client is cleared after every request (Scrape.do owns the cookie lifecycle via `setCookies` / `scrape.do-cookies` / `sessionId`, so pooling is purely a transport concern).
- `PreparedScrapeDoRequest.to_proxy_httpx_kwargs()`: serializes the same data model into httpx kwargs that target the destination URL directly (the API token and Scrape.do parameters live in the proxy URL's userinfo segment, not the request).
- `RequestParameters.to_proxy_url()`: generates a Scrape.do Proxy-Mode connection string template (`http://{api_token}:<params>@proxy.scrape.do:8080`) for use with the proxy clients or with third-party tooling (Playwright / Selenium / curl).
- `RequestParameters.validate_proxy_params()`: cross-validates Proxy-Mode-specific parameter quirks (`customHeaders` defaulting to true server-side, `setCookies` interaction, render-mode discouragement).
- `SCRAPE_DO_CA_PATH` and `DEFAULT_PROXY_SSL_CONTEXT` in `scrape_do.constants`: the bundled Scrape.do CA cert and an `ssl.SSLContext` preloaded with system CAs plus the bundled CA. Default `verify` source for the proxy-mode clients so HTTPS targets validate correctly through Scrape.do's MITM step without disabling TLS verification. `VERIFY_X509_STRICT` is cleared so chain validation accepts Scrape.do's self-signed root (which omits the optional AKI extension); all other verification checks remain intact.
- Scrape.do's CA certificate bundled with the wheel under `scrape_do.data` so the SDK ships everything needed for proxy-mode TLS verification.
- Public re-exports for `ScrapeDoProxyClient` and `AsyncScrapeDoProxyClient` in `scrape_do/__init__.py`.
- `AsyncScrapeDoClient` backed by `httpx.AsyncClient`. Near-1:1 of the synchronous client (smart routing, retry strategy, session validation, event hooks), with every IO-bound method `async`/`await`. Sleeps between retries use `asyncio.sleep` rather than `time.sleep`.
- `AsyncClientEventHooks` TypedDict and `AsyncSessionValidator` type alias. Both are async-only: hooks return `Awaitable[None]` and validators return `Awaitable[bool]`, so they can perform I/O while the request executes.
- Public re-exports for `AsyncScrapeDoClient`, `AsyncClientEventHooks`, and `AsyncSessionValidator` in `scrape_do/__init__.py`.
- `ScrapeDoResponse.json(raw_response=True, **kwargs) -> Any` convenience method. With `raw_response=True` (default) it shortcuts to `httpx_response.json()`; with `raw_response=False` it returns `json.loads(self.text, **kwargs)` so the post-envelope path is reachable without manual parsing.
- Example block in the package-level docstring at `src/scrape_do/__init__.py` showcasing a typical request flow.

Fixed

- `ScrapeDoClient.post()` now forwards the `session_validator` argument to `request()`. Previously the argument was accepted but silently ignored on POST calls. `get()` was unaffected.

This discussion was created from the release v0.2.0.
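
For readers curious how a per-(`api_token`, `parameters`) pool with bounded LRU eviction might work, here is a minimal standard-library sketch. `BoundedLRUPool` and `pool_key` are hypothetical names invented for illustration and are not the SDK's internals; the factory stands in for whatever constructs an `httpx.Client`, and `on_evict` for whatever closes it.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class BoundedLRUPool:
    """Hypothetical helper, not the SDK's class: one cached value per key, bounded in size."""

    def __init__(
        self,
        factory: Callable[[Hashable], Any],
        max_size: int = 16,
        on_evict: Optional[Callable[[Any], None]] = None,
    ) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._factory = factory
        self._max_size = max_size
        self._on_evict = on_evict
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._items)

    def get(self, key: Hashable) -> Any:
        # Hit: mark the entry as most recently used and reuse it.
        if key in self._items:
            self._items.move_to_end(key)
            return self._items[key]
        # Miss: build a new value, then evict the least recently used if over capacity.
        value = self._factory(key)
        self._items[key] = value
        if len(self._items) > self._max_size:
            _, evicted = self._items.popitem(last=False)
            if self._on_evict is not None:
                self._on_evict(evicted)
        return value


def pool_key(api_token: str, parameters: dict) -> tuple:
    # Assumption: parameter values are hashable; sorting makes the key order-independent.
    return (api_token, tuple(sorted(parameters.items())))
```

In a real client the factory would presumably build an `httpx.Client` configured for that token and parameter set, and `on_evict` would call its `close()` method so evicted connections are released.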
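
The connection string template `http://{api_token}:<params>@proxy.scrape.do:8080` mentioned in the `to_proxy_url()` entry can be illustrated with a short sketch. `build_proxy_url` is a hypothetical function, not part of the SDK, and the serialization of `<params>` as `key=value` pairs joined by `&` (with lowercase booleans) is an assumption; check the Scrape.do documentation for the exact format.

```python
from urllib.parse import urlsplit

PROXY_HOST = "proxy.scrape.do"
PROXY_PORT = 8080


def build_proxy_url(api_token: str, params: dict) -> str:
    """Hypothetical sketch: place the token and parameters in the userinfo segment."""

    def render(value: object) -> str:
        # Assumption: booleans are written as lowercase "true"/"false".
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    # Assumption: parameters are joined as key=value pairs separated by "&".
    param_string = "&".join(f"{key}={render(value)}" for key, value in params.items())
    return f"http://{api_token}:{param_string}@{PROXY_HOST}:{PROXY_PORT}"


url = build_proxy_url("TOKEN", {"super": True, "geoCode": "us"})
parts = urlsplit(url)
print(parts.username, parts.password, parts.hostname, parts.port)
```

The resulting string is the kind of value that can be passed to any tool accepting a proxy URL, which is why the token and parameters travel in the userinfo segment rather than in the request itself.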
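
The TLS setup described for `DEFAULT_PROXY_SSL_CONTEXT` (system CAs plus one extra CA, with only `VERIFY_X509_STRICT` cleared) can be approximated with the standard `ssl` module. This is a sketch under assumptions: `build_proxy_ssl_context` is a hypothetical name, and the CA path argument stands in for the bundled certificate that the SDK exposes as `SCRAPE_DO_CA_PATH`.

```python
import ssl
from typing import Optional


def build_proxy_ssl_context(extra_ca_path: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical sketch: default verification plus one additional trusted CA."""
    # create_default_context() loads the system CAs and keeps hostname
    # checking and certificate verification enabled.
    ctx = ssl.create_default_context()
    if extra_ca_path is not None:
        # Trust the additional CA file on top of the system store.
        ctx.load_verify_locations(cafile=extra_ca_path)
    # Clear only the strict X.509 flag; every other check stays on.
    ctx.verify_flags &= ~ssl.VERIFY_X509_STRICT
    return ctx


ctx = build_proxy_ssl_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A context built this way can be handed to httpx through its `verify` argument, which keeps verification enabled instead of falling back to `verify=False`.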
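
The async clients pause between retries with `asyncio.sleep` rather than `time.sleep`, so the event loop stays free for other tasks. The sketch below shows the general pattern only; `retry_async` is a hypothetical helper, and the exponential backoff schedule is an assumption, since the release notes do not describe the SDK's actual retry strategy.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Hypothetical sketch: retry an awaitable operation with non-blocking pauses."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == attempts - 1:
                raise
            # Assumption: exponential backoff. asyncio.sleep yields to the
            # event loop, whereas time.sleep would block it.
            await asyncio.sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")


async def demo() -> str:
    calls = {"count": 0}

    async def flaky() -> str:
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    return await retry_async(flaky, attempts=3, base_delay=0.0)


result = asyncio.run(demo())
```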
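
The two branches of `ScrapeDoResponse.json()` can be illustrated with stand-in classes. Everything here is hypothetical: `FakeRawResponse` mimics only the `json()` method of an httpx response, `ResponseSketch` is not the SDK's class, and the envelope shape (a `content` field holding the target body) is an assumption made purely for the demonstration.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeRawResponse:
    """Stand-in for an httpx response; only json() is modelled."""

    body: str

    def json(self) -> Any:
        return json.loads(self.body)


@dataclass
class ResponseSketch:
    """Hypothetical stand-in showing the raw_response switch."""

    httpx_response: FakeRawResponse
    text: str  # the post-envelope body

    def json(self, raw_response: bool = True, **kwargs: Any) -> Any:
        if raw_response:
            # Parse whatever came back over the wire, envelope included.
            return self.httpx_response.json()
        # Parse the unwrapped text, forwarding kwargs to json.loads.
        return json.loads(self.text, **kwargs)


inner = '{"price": 1.5}'
envelope = json.dumps({"content": inner})  # assumed envelope shape
resp = ResponseSketch(FakeRawResponse(envelope), text=inner)

raw = resp.json()
unwrapped = resp.json(raw_response=False, parse_float=str)
```

Forwarding `**kwargs` to `json.loads` is what makes options such as `parse_float` available on the post-envelope path.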