
improvement(seo): restore explicit AI/search bot allow-list and add link-preview rules#4480

Merged
waleedlatif1 merged 6 commits into staging from waleedlatif1/sofia-v3
May 6, 2026

Conversation

@waleedlatif1
Collaborator

@waleedlatif1 waleedlatif1 commented May 6, 2026

Summary

  • Restore the robots.txt structure that PR #4170 (improvement(seo): optimize sitemaps, robots.txt, and core web vitals across sim and docs) collapsed into a single wildcard rule; that collapse is the suspected culprit for the Profound metric drop on Apr 18.
  • Wildcard rule (User-agent: *) allows crawling with a tight disallow list for authenticated surfaces and internal endpoints. Any new AI/search crawler is auto-allowed without code changes.
  • Separate group for link-preview bots (Twitter/LinkedIn/Slack/Discord/etc.) with a looser disallow list so OG cards render for shared /chat/ and /form/ URLs.
  • No named AI/search bot allow-list — it's functionally redundant against the wildcard and goes stale fast. Behavior is identical for any compliant crawler.
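As a rough illustration of the two-group structure described above, here is a minimal sketch. It is not the contents of apps/sim/app/robots.ts. The `RobotsRule` type is declared locally so the block runs without the `next` package (in a Next.js app the return type would be `MetadataRoute.Robots`). The path lists are copied from the review summary later in this thread, the bot list holds only the four tokens named there, and the `allow: ['/']` entries are an assumption.

```typescript
// Local stand-in for the rule shape accepted by Next.js `MetadataRoute.Robots`.
type RobotsRule = {
  userAgent: string | string[]
  allow: string[]
  disallow: string[]
}

// Looser list for link-preview bots (paths as listed in the review flowchart).
const LINK_PREVIEW_DISALLOWED_PATHS: string[] = [
  '/api/',
  '/workspace/',
  '/w/',
  '/playground/',
  '/resume/',
  '/invite/',
  '/unsubscribe/',
  '/credential-account/',
  '/_next/',
  '/private/',
]

// Tighter list for every other crawler: the preview list plus three more paths.
const DISALLOWED_PATHS: string[] = [
  ...LINK_PREVIEW_DISALLOWED_PATHS,
  '/chat/',
  '/form/',
  '/blog*tag=',
]

// Only the user-agent tokens named in this thread; the real list is longer.
const LINK_PREVIEW_BOTS: string[] = ['Twitterbot', 'LinkedInBot', 'Slackbot', 'Discordbot']

// Hypothetical equivalent of the default export in robots.ts.
function robots(): { rules: RobotsRule[] } {
  return {
    rules: [
      // Wildcard group: any crawler without a named group lands here.
      { userAgent: '*', allow: ['/'], disallow: DISALLOWED_PATHS },
      // Named group: preview bots may fetch /chat/ and /form/ for OG cards.
      { userAgent: LINK_PREVIEW_BOTS, allow: ['/'], disallow: LINK_PREVIEW_DISALLOWED_PATHS },
    ],
  }
}
```

Deriving the wildcard list from the preview list keeps the two from drifting apart, though the actual file may define them independently.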

Type of Change

  • Bug fix

Testing

Tested manually: typecheck passes and bun run lint is clean. Verified that the output renders the expected two rule groups per the Next.js Metadata API.
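One way to script the "two rule groups" check is sketched below. This is a hand-rolled approximation, not Next.js's own serializer, so exact casing and ordering of the real output may differ. `renderRobotsTxt` and `countGroups` are hypothetical helpers, and the sample rules use shortened path lists for brevity.

```typescript
type RobotsRule = {
  userAgent: string | string[]
  allow?: string[]
  disallow?: string[]
}

// Serialize rule objects into robots.txt text, one blank-line-separated group per rule.
function renderRobotsTxt(rules: RobotsRule[]): string {
  const groups = rules.map((rule) => {
    const agents = Array.isArray(rule.userAgent) ? rule.userAgent : [rule.userAgent]
    const lines = [
      ...agents.map((ua) => `User-Agent: ${ua}`),
      ...(rule.allow ?? []).map((path) => `Allow: ${path}`),
      ...(rule.disallow ?? []).map((path) => `Disallow: ${path}`),
    ]
    return lines.join('\n')
  })
  return groups.join('\n\n') + '\n'
}

// Count groups by splitting the rendered text on blank lines.
function countGroups(robotsTxt: string): number {
  return robotsTxt.trim().split(/\n\s*\n/).length
}

// Shortened sample input (assumed values, for illustration only).
const sampleRules: RobotsRule[] = [
  { userAgent: '*', allow: ['/'], disallow: ['/api/', '/chat/', '/form/'] },
  { userAgent: ['Twitterbot', 'Slackbot'], allow: ['/'], disallow: ['/api/'] },
]

const rendered = renderRobotsTxt(sampleRules)
console.log(rendered)
console.log(`groups: ${countGroups(rendered)}`)
```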

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented May 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 6, 2026 9:16pm


@cursor

cursor Bot commented May 6, 2026

PR Summary

Medium Risk
Changes robots.txt directives for all crawlers and specific link-preview user agents, which can materially impact indexing/crawl behavior if the disallow lists are wrong. Scope is small and isolated to SEO metadata generation.

Overview
Updates apps/sim/app/robots.ts to emit two robots rule groups: a wildcard (*) rule that allows crawling but disallows a defined set of internal/authenticated paths, and a separate rule set for link-preview bots (Twitter/Slack/Discord/etc.) with a looser disallow list so OG previews can fetch /chat/ and /form/ URLs.

Refactors the disallow lists into named constants (DISALLOWED_PATHS, LINK_PREVIEW_DISALLOWED_PATHS) and introduces a LINK_PREVIEW_BOTS allowlist to target the preview-specific behavior.

Reviewed by Cursor Bugbot for commit b9a1f58. Configure here.

@greptile-apps
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR restores a structured robots.ts by splitting what was previously a single wildcard rule into two groups: a tight wildcard disallow list for all bots, and a named group for link-preview bots with a looser list that intentionally permits /chat/ and /form/ so OG cards render correctly on social platforms.

  • Wildcard rule blocks authenticated and internal surfaces (/api/, /workspace/, /chat/, /form/, /playground/, etc.) for all crawlers by default, with /blog*tag= added for tag-filtered pages.
  • Link-preview bot group (Twitter, LinkedIn, Slack, Discord, Telegram, WhatsApp, Facebook, Pinterest, Reddit) drops /chat/ and /form/ from the disallow list, enabling OG metadata fetching, while still blocking authenticated paths and /playground/. These blocks are fixes carried over from an earlier revision of this PR.

Confidence Score: 5/5

Safe to merge — the change is scoped entirely to robots.txt generation and carries no runtime risk to application behavior.

The restructuring is straightforward: one wildcard rule with a tight disallow list, one named link-preview bot group with a looser list for OG card fetching. All previously identified issues (incorrect Grok UA strings, no-op Bravebot entry, missing /playground/ and /w/ from the preview-bot disallow list) were corrected in a prior commit and are confirmed absent in the current file. Named user-agent groups take precedence over the wildcard per RFC 9309, so the routing logic is correct. No application logic, data access, or authentication paths are affected.
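The precedence claim can be illustrated with a simplified model of how a compliant crawler picks its group under RFC 9309: product tokens are compared case-insensitively, rules from all matching named groups are combined, and the `*` group applies only when no named group matches. The sketch below is an assumption-laden simplification. It uses plain prefix matching, ignores `Allow` rules and the `*` / `$` path wildcards, and the `Group` type and function names are hypothetical.

```typescript
type Group = { userAgents: string[]; disallow: string[] }

// Return the groups a crawler should obey: named matches if any, otherwise the wildcard group(s).
function selectGroups(productToken: string, groups: Group[]): Group[] {
  const token = productToken.toLowerCase()
  const named = groups.filter((group) =>
    group.userAgents.some((ua) => ua.toLowerCase() === token)
  )
  if (named.length > 0) return named
  return groups.filter((group) => group.userAgents.some((ua) => ua === '*'))
}

// Simplified check: a path is blocked if any selected group has a disallow prefix matching it.
function isDisallowed(productToken: string, path: string, groups: Group[]): boolean {
  return selectGroups(productToken, groups).some((group) =>
    group.disallow.some((prefix) => path.startsWith(prefix))
  )
}

// Shortened lists for illustration only.
const exampleGroups: Group[] = [
  { userAgents: ['*'], disallow: ['/api/', '/chat/', '/form/'] },
  { userAgents: ['Twitterbot', 'Slackbot'], disallow: ['/api/'] },
]

console.log(isDisallowed('Slackbot', '/chat/abc', exampleGroups))  // named group applies
console.log(isDisallowed('Googlebot', '/chat/abc', exampleGroups)) // falls back to wildcard
```

Under this model a link-preview bot never consults the wildcard list, which is why the preview group has to repeat every path it still wants blocked.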

No files require special attention.

Important Files Changed

Filename Overview
apps/sim/app/robots.ts Restructures robots.txt into two rule groups: a wildcard rule with a tight disallow list, and a named group for link-preview bots with a looser list that allows /chat/ and /form/ for OG card fetching. Previous issues (Grok UA strings, Bravebot no-op, missing /playground/ and /w/ in the preview-bot list) are all addressed in this revision.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Bot Request] --> B{User-Agent Match?}
    B -->|Matches LINK_PREVIEW_BOTS\nTwitterbot, LinkedInBot,\nSlackbot, Discordbot, etc.| C[Apply LINK_PREVIEW_DISALLOWED_PATHS]
    B -->|Wildcard *\nall other bots| D[Apply DISALLOWED_PATHS]
    C --> E{Path Allowed?}
    D --> F{Path Allowed?}
    C -.->|Accessible for OG cards| G["/chat/, /form/"]
    E -->|Disallowed| H["Block: /api/, /workspace/, /w/, /playground/, /resume/, /invite/, /unsubscribe/, /credential-account/, /_next/, /private/"]
    E -->|Allowed| I[Crawl Permitted]
    F -->|Disallowed| J["Block: same as preview list + /chat/, /form/, /blog*tag="]
    F -->|Allowed| K[Crawl Permitted]

Reviews (3): Last reviewed commit: "chore(seo): trim verbose comments in rob..." | Re-trigger Greptile

Comment thread apps/sim/app/robots.ts Outdated
Comment thread apps/sim/app/robots.ts Outdated
Comment thread apps/sim/app/robots.ts
Comment thread apps/sim/app/robots.ts
@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@greptile

Comment thread apps/sim/app/robots.ts Outdated
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review


@cursor cursor Bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit b9a1f58. Configure here.

@waleedlatif1 waleedlatif1 merged commit 690b7ab into staging May 6, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/sofia-v3 branch May 6, 2026 21:21
waleedlatif1 added a commit that referenced this pull request May 7, 2026
…ink-preview rules (#4480)

* improvement(seo): restore explicit AI/search bot allow-list and add link-preview rules

* fix(seo): correct xAI UA strings, drop Bravebot, block /playground/ and /w/ from link-preview bots

* fix(seo): drop unverified Grok UAs, correct DeepSeekBot and ImagesiftBot tokens

* fix(seo): re-add Bravebot to allow-list per Brave Search docs

* improvement(seo): drop redundant named AI/search bot allow-list

* chore(seo): trim verbose comments in robots.ts
