Website Email & Phone Extractor — Python Example

Crawl any list of websites and pull every email address, phone number, LinkedIn, Twitter, Facebook, Instagram, YouTube, TikTok, and Telegram handle in Python — for lead generation, recruiting, journalism, or technical SEO contact discovery.

This Python project drives the Website Contact Scraper Apify actor — a focused crawler that visits a website's homepage plus its contact / about / team / press / careers pages (configurable depth and breadth), then pulls every email, phone, and social link the site reveals. Up to 100 starting domains per run, $1 per 1,000 results, no rate limit on your end.

What this does

Building an SDR / outreach pipeline always hits the same wall: you have a list of prospect domains, but their contact pages bury emails behind images, JavaScript, or info[at]example.com obfuscation. Generic regex scrapers miss the obfuscated ones, and commercial enrichment APIs (Apollo, Hunter, Lusha) start at $99+/month per seat and gate the data behind a credits system. This actor does the unglamorous crawling work instead: it visits a few key pages per domain, runs robust extraction regexes against deobfuscated HTML, and returns clean lists per domain.
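The actor's exact extraction rules aren't published in this README. As a rough illustration of what "regexes against deobfuscated HTML" means, here is a minimal standard-library sketch. The patterns and the names `deobfuscate` and `extract_emails` are assumptions for illustration, not the actor's code.

```python
import re

# Hypothetical normalisation rules for common obfuscation styles.
# They are illustrative only; the actor's real rules are not documented here.
_OBFUSCATIONS = [
    (re.compile(r"\s*[\[\(\{]\s*at\s*[\]\)\}]\s*", re.IGNORECASE), "@"),
    (re.compile(r"\s*[\[\(\{]\s*dot\s*[\]\)\}]\s*", re.IGNORECASE), "."),
    (re.compile(r"&#0*64;|&commat;", re.IGNORECASE), "@"),  # HTML entities for @
]

# Deliberately simple pattern: fine for contact pages, not an RFC 5322 validator.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def deobfuscate(text: str) -> str:
    """Rewrite info[at]example[dot]com-style text into plain addresses."""
    for pattern, replacement in _OBFUSCATIONS:
        text = pattern.sub(replacement, text)
    return text


def extract_emails(html: str) -> list[str]:
    """Return unique, lower-cased emails in order of first appearance."""
    seen: dict[str, None] = {}
    for match in _EMAIL_RE.findall(deobfuscate(html)):
        seen.setdefault(match.lower(), None)
    return list(seen)


if __name__ == "__main__":
    sample = 'Write to info[at]example.com or <a href="mailto:sales&#64;example.com">us</a>.'
    print(extract_emails(sample))
```

Deobfuscating before matching lets one email pattern cover plain, `[at]`-style, and entity-encoded addresses; the trade-off is that an unrelated "(at)" in prose can occasionally be rewritten too.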

Use cases

B2B outbound prospecting — convert a list of company domains into emails + phones for SDR outreach.
Recruiting & headhunting — pull contact info from career pages of target hire companies.
Journalism & PR — find press@ / media@ contacts for outlets you want to pitch.
Technical SEO outreach — broken-link building or guest-post pitching requires a real contact, not a contact form.
Influencer outreach — extract emails from creator websites linked in social media bios.
Sales territory enrichment — feed your CRM with phone numbers for cold-call follow-up.
Compliance & due diligence — verify that a domain has working contact info before signing a partnership.

Requirements

Python 3.10+
A free Apify account
No third-party enrichment account needed

Quick start

```bash
git clone https://github.com/pro100chok/website-email-phone-extractor-python.git
cd website-email-phone-extractor-python
pip install -r requirements.txt
cp .env.example .env
# then paste your APIFY_API_TOKEN into .env
python main.py
```

main.py enriches ten developer-tools websites (Linear, Airtable, Retool, Hex, Mintlify, Vercel, Fly.io, Supabase, PlanetScale, Dagster), picks the best generic email per domain (hello@/contact@/sales@ first), and saves to JSON + CSV.

How it works

For each startUrls entry the crawler:

Visits the URL and parses links matching contact / about / team / press / careers / support routes (multiple language variants).
Crawls those pages depth-first, up to maxDepth levels and maxPagesPerDomain pages total per domain.
Extracts emails from mailto: links, plain text (deobfuscating [at], (at), spelled-out "at", and HTML-entity forms of @), and structured data (Person/Organization JSON-LD blocks).
Extracts phone numbers from tel: links and free text using locale-aware regex (US, EU, intl formats).
Detects social handles for LinkedIn, X / Twitter, Facebook, Instagram, YouTube, TikTok, and Telegram, plus the other platforms listed in the FAQ.

The output is one dataset record per starting domain. Multiple emails / phones / socials are returned as arrays so you can pick the one that matches your outreach style.

Example: prospect list enrichment

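```python
import os

from apify_client import ApifyClient

# The token must be in the environment: export it, or load .env before this runs.
client = ApifyClient(os.environ["APIFY_API_TOKEN"])

# Start the actor and block until the run finishes.
run = client.actor("pro100chok/extract-emails").call(run_input={
    "startUrls": [
        {"url": "https://www.linear.app"},
        {"url": "https://www.airtable.com"},
    ],
    "maxDepth": 2,            # start page plus links up to two clicks away
    "maxPagesPerDomain": 15,  # hard cap on pages crawled per domain
})

# One dataset item per starting domain; fields may be missing when nothing was found.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    emails = item.get("emails", [])
    phones = item.get("phones", [])
    print(f"{item['domain']}: {emails[:3]}  {phones[:1]}")
```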

Example output

```json
{
  "url": "https://www.linear.app",
  "domain": "linear.app",
  "emails": ["hello@linear.app", "press@linear.app", "careers@linear.app"],
  "phones": [],
  "socials": {
    "twitter": "https://twitter.com/linear",
    "linkedin": "https://www.linkedin.com/company/linear-app/",
    "youtube": "https://www.youtube.com/@linear"
  },
  "pagesCrawled": 12
}
```
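Because `emails`, `phones`, and `socials` are nested, a spreadsheet export needs them flattened. Below is a minimal `csv`-module sketch that produces one row per domain from records shaped like the output above. The social keys other than `twitter`, `linkedin`, and `youtube` are assumed from the platform list in this README, and list-valued socials are joined defensively in case a dataset returns arrays.

```python
import csv
import io
from collections.abc import Iterable

# Keys beyond twitter/linkedin/youtube are assumed; match them to your dataset.
SOCIAL_KEYS = ("linkedin", "twitter", "facebook", "instagram", "youtube", "tiktok", "telegram")
FIELDNAMES = ["domain", "emails", "phones", "pagesCrawled", *SOCIAL_KEYS]


def _join(value) -> str:
    """Join list values with '; ' and pass other values through as strings."""
    if isinstance(value, list):
        return "; ".join(str(v) for v in value)
    return "" if value is None else str(value)


def record_to_row(record: dict) -> dict:
    """Flatten one dataset record into a single CSV-friendly dict."""
    socials = record.get("socials") or {}
    row = {
        "domain": record.get("domain", ""),
        "emails": _join(record.get("emails")),
        "phones": _join(record.get("phones")),
        "pagesCrawled": record.get("pagesCrawled", 0),
    }
    for key in SOCIAL_KEYS:
        row[key] = _join(socials.get(key))
    return row


def records_to_csv(records: Iterable[dict]) -> str:
    """Render records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for record in records:
        writer.writerow(record_to_row(record))
    return buffer.getvalue()
```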

Input parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `startUrls` | object[] | yes | Up to 100 starting URLs. Accepts full URLs or bare domains, with or without `www` or a protocol. |
| `maxDepth` | integer | no | Crawl depth from each start page. `0` = start page only. Default `2`. |
| `maxPagesPerDomain` | integer | no | Maximum pages crawled per domain. Default `10`. |
| `concurrency` | integer | no | Number of domains processed in parallel. Default `10`. |
| `useProxy` | boolean | no | Enable proxy. Off by default; most public sites don't need it. |
| `proxyConfiguration` | object | no | Used only when `useProxy=true`. Supports Apify Proxy or custom URLs. |
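Since `startUrls` is capped at 100 entries per run, larger prospect lists have to be split across several runs. A minimal sketch, assuming you simply want deduplicated batches with the table's defaults; `batched_run_inputs` is a hypothetical helper, and deduplication is by exact string, so `linear.app` and `https://linear.app` would both be kept.

```python
from collections.abc import Iterator

MAX_START_URLS = 100  # per-run limit from the parameter table


def batched_run_inputs(
    urls: list[str],
    max_depth: int = 2,
    max_pages_per_domain: int = 10,
    batch_size: int = MAX_START_URLS,
) -> Iterator[dict]:
    """Yield one run_input dict per batch of at most `batch_size` start URLs."""
    if not 1 <= batch_size <= MAX_START_URLS:
        raise ValueError(f"batch_size must be between 1 and {MAX_START_URLS}")
    if max_depth < 0:
        raise ValueError("max_depth must be >= 0 (0 = start page only)")
    # Strip whitespace, drop blanks, and de-duplicate while keeping order.
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        yield {
            "startUrls": [{"url": u} for u in batch],
            "maxDepth": max_depth,
            "maxPagesPerDomain": max_pages_per_domain,
        }
```

Each yielded dict can be passed as `run_input` to `client.actor("pro100chok/extract-emails").call(...)`, as in the enrichment example above.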

More examples

| File | Demonstrates |
| --- | --- |
| `examples/01_basic_usage.py` | Single-website crawl. |
| `examples/02_from_search_results.py` | Multi-domain bulk crawl. |
| `examples/03_role_email_filter.py` | Filter to role inboxes (`sales@`, `support@`, etc.). |
| `examples/04_export_to_csv.py` | Pandas export + best-contact-per-domain logic. |
| `examples/05_export_to_google_sheets.py` | Append fresh enrichment to a shared Sheet. |
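The role-inbox idea can be sketched in a few lines. This is not the contents of `examples/03_role_email_filter.py`; the set of role local parts and the name `split_role_and_personal` are assumptions for illustration.

```python
# Assumed set of role inboxes; the repo's own example may use a different list.
ROLE_LOCAL_PARTS = frozenset({
    "hello", "contact", "info", "sales", "support",
    "press", "media", "careers", "jobs",
})


def split_role_and_personal(emails: list[str]) -> tuple[list[str], list[str]]:
    """Partition emails into role inboxes and everything else, keeping order."""
    role: list[str] = []
    personal: list[str] = []
    for email in emails:
        local = email.split("@", 1)[0].lower()
        (role if local in ROLE_LOCAL_PARTS else personal).append(email)
    return role, personal
```

Keeping the personal addresses in a separate list, rather than discarding them, makes it easier to apply the stricter handling the GDPR answer below mentions.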

FAQ

How much does it cost? $1 per 1,000 domains crawled (each domain = one result item). Apify's free $5/month covers ~5,000 enrichments before you pay anything.
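The arithmetic behind that answer, as a small sketch. It only models the per-result charge quoted here (one result per starting domain) and assumes the whole free credit goes to this actor; other platform usage is not included.

```python
# Figures quoted in this FAQ entry; check current pricing before budgeting.
PRICE_PER_1000_RESULTS_USD = 1.00
FREE_MONTHLY_CREDIT_USD = 5.00


def result_cost_usd(domains: int) -> float:
    """Result charges only: one result item per starting domain."""
    return domains / 1000 * PRICE_PER_1000_RESULTS_USD


def domains_covered_by_free_credit() -> int:
    """How many domains the free monthly credit pays for at this rate."""
    return int(FREE_MONTHLY_CREDIT_USD / PRICE_PER_1000_RESULTS_USD * 1000)


def out_of_pocket_usd(domains: int) -> float:
    """Amount beyond the free credit, assuming all of it goes to this actor."""
    return max(0.0, result_cost_usd(domains) - FREE_MONTHLY_CREDIT_USD)
```

For example, 12,000 domains come to $12 in result charges, of which $7 exceeds the free credit.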

Will it find every email on the site? It finds every email present in HTML, deobfuscated forms ([at], (at), HTML entities), mailto: links, and JSON-LD schemas across the homepage + a small set of high-value subpages. Emails buried in images, PDFs, or third-party form widgets won't be extracted — those usually require either OCR or an expensive headless-browser pass.

What's maxPagesPerDomain for? Most sites have all useful contact info reachable from the homepage and one click away (contact / about / team / press). Setting maxPagesPerDomain: 15 lets you reach those routes without blowing through credits crawling every blog post.

Is this GDPR-compliant? The actor extracts publicly-published contact information from public websites. Whether it's legal to use those contacts for cold outreach depends on your jurisdiction (B2B is generally fine under GDPR's legitimate-interest basis if you offer opt-out; consumer/personal emails are stricter). Consult your legal team before bulk outreach in EU markets.

Can I run this with proxies? Yes. Set useProxy: true and pass proxyConfiguration. Most public marketing sites don't need proxies, but Cloudflare-protected sites occasionally do.

Can I extract emails from a single page only? Yes. Set maxDepth: 0 — the actor will only crawl the URL you give it, no link discovery.

Does it work with subdomains? Yes. Pass https://subdomain.example.com and the crawler will crawl that subdomain (not the parent domain) up to maxPagesPerDomain.

What social platforms does it detect? LinkedIn, X / Twitter, Facebook, Instagram, YouTube, TikTok, Telegram, GitHub, Discord, Crunchbase, AngelList — anywhere a recognizable canonical URL appears on the page.

Can I verify the emails I get? Yes. Pipe the output through the Email Verifier actor to check deliverability and risk scores before you send your outreach.

Related actors

Email Verifier — Bulk Email Validation — validate the emails this actor produces.
Google Maps Scraper — Emails, Reviews & Photos — local-business contact extraction.
Clutch.co Scraper — agency contact directory enriched with emails.

See all my actors at apify.com/pro100chok.

License

MIT — see LICENSE.

Built on top of the Website Contact Scraper Apify actor.
