
fix(shopify): parse malformed llms.txt #3

Draft
Jadenzzz wants to merge 3 commits into main from jayden/shopify-bug

Jadenzzz commented May 14, 2026

🚥 Resolves ISSUE_ID

🧰 Changes

The llms.txt extractor was silently dropping most links on many real docs sites. Importing https://shopify.dev/docs produced two off-origin "Deploy to Fly.io / Render" pages as if they were the entire Shopify site, instead of the hundreds of actual Shopify references the file contains.

Six layered fixes to parseLlmsTxt / fetchLlmsTxt, plus one thread-through in the prompt builder.

Fixes

  • Accept *, + and - bullets: CommonMark allows all three, and the llms.txt spec doesn't pin one. Previously only - was recognized; Shopify uses *, so under the old regex 16 of its 18 bullets were silently dropped.
  • Recognize ## through ###### as section starts: Shopify's file has only 2 ## headings, with 18 ### and 54 #### headings carrying the real structure. The old h2-only check collapsed everything into a single mega-section.
  • Harvest inline [text](url) links from paragraph prose: some files (Shopify, again) keep most references in narrative text rather than bullet lists, so a second sweep now runs over lines that are neither bullets nor inside code fences. Items are deduplicated by URL across the whole parse.
  • Code-fence aware: URLs inside ``` blocks are no longer extracted.
  • Same-origin filter (opt-in via sourceUrl): when the source URL is known, every item must share its origin, which drops cross-references to GitHub, partner deploy guides, blog posts, etc. Origins are compared strictly (no subdomain coalescing).
  • Strip matched markdown emphasis from display text: **Extension targets**, *italic*, __bold__, and combined forms like **_x_** are unwrapped. Asymmetric input (**foo) is left untouched.
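A minimal sketch of the first two fixes, assuming the extractor is written in TypeScript: a bullet regex that accepts -, * and +, and a heading regex that opens a new section for ## through ######. The names sectionize, HEADING_RE, BULLET_LINK_RE and the Section shape are hypothetical stand-ins, not the actual parseLlmsTxt internals.

```typescript
// Hypothetical sketch: sectionize() and these regexes are illustrative,
// not the PR's actual parseLlmsTxt internals.

interface LinkItem {
  text: string;
  url: string;
}

interface Section {
  title: string;
  depth: number; // 2 for ##, up to 6 for ######
  items: LinkItem[];
}

// ## through ###### open a section; a lone # (the llms.txt title) does not,
// and 7+ hashes are not a heading at all.
const HEADING_RE = /^(#{2,6})\s+(.+?)\s*$/;

// A bullet using any CommonMark marker (-, * or +) whose content starts with [text](url).
const BULLET_LINK_RE = /^\s*[-*+]\s+\[([^\]]+)\]\(([^)\s]+)\)/;

function sectionize(markdown: string): Section[] {
  const sections: Section[] = [];
  let current: Section | undefined;

  for (const line of markdown.split(/\r?\n/)) {
    const heading = HEADING_RE.exec(line);
    if (heading) {
      current = { title: heading[2], depth: heading[1].length, items: [] };
      sections.push(current);
      continue;
    }
    const bullet = BULLET_LINK_RE.exec(line);
    // Bullets that appear before the first section heading are ignored in this sketch.
    if (bullet && current) {
      current.items.push({ text: bullet[1], url: bullet[2] });
    }
  }
  return sections;
}

// Usage: ### and #### headings each get their own section instead of one mega-section.
const demo = [
  "# Shopify",
  "## Apps",
  "### Build",
  "* [Intro](https://shopify.dev/a)",
  "+ [More](https://shopify.dev/b)",
  "#### Deep",
  "- [Deep dive](https://shopify.dev/c)",
].join("\n");

// 3 sections: ## Apps (0 items), ### Build (2 items), #### Deep (1 item)
console.log(sectionize(demo).map((s) => `${"#".repeat(s.depth)} ${s.title}: ${s.items.length}`));
```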
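The inline-link sweep, code-fence awareness, and URL dedupe can be sketched together. This assumes fences are simple toggles on lines starting with three backticks and that links match a plain [text](url) regex; harvestInlineLinks and its optional seen parameter are hypothetical names, with the shared Set standing in for the cross-parse dedupe.

```typescript
// Hypothetical sketch: harvestInlineLinks() is an illustrative name, not the PR's code.

interface LinkItem {
  text: string;
  url: string;
}

// Three backticks at line start open or close a fence (~~~ fences are not handled here).
const FENCE_RE = /^\s*`{3}/;
const BULLET_RE = /^\s*[-*+]\s+/;

function harvestInlineLinks(markdown: string, seen: Set<string> = new Set<string>()): LinkItem[] {
  const items: LinkItem[] = [];
  let inFence = false;

  for (const line of markdown.split(/\r?\n/)) {
    // A fence line flips the state and is itself skipped.
    if (FENCE_RE.test(line)) {
      inFence = !inFence;
      continue;
    }
    // Skip code-block contents and bullets (bullets belong to the list pass).
    if (inFence || BULLET_RE.test(line)) continue;

    // [text](url) anywhere in the line; assumes no nested brackets and no spaces in URLs.
    const linkRe = /\[([^\]]+)\]\(([^)\s]+)\)/g;
    let m: RegExpExecArray | null;
    while ((m = linkRe.exec(line)) !== null) {
      const url = m[2];
      if (seen.has(url)) continue; // dedupe by URL across the whole parse
      seen.add(url);
      items.push({ text: m[1], url });
    }
  }
  return items;
}

// Usage: two prose links survive; the fenced link, the bullet, and the repeat do not.
const FENCE = "`".repeat(3);
const prose = [
  "See the [Admin API](https://shopify.dev/api/admin) and [Liquid](https://shopify.dev/liquid).",
  FENCE,
  "[not a ref](https://example.com/in-code)",
  FENCE,
  "* [Bullet](https://shopify.dev/bullet)",
  "Again: [Admin API](https://shopify.dev/api/admin).",
].join("\n");

console.log(harvestInlineLinks(prose).map((l) => l.url));
```

Passing the same Set into both the bullet pass and this sweep is one way to get dedupe "across the parse" without a separate merge step.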
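A sketch of the opt-in same-origin filter, comparing the WHATWG URL origin (scheme, host, and port) of each item against sourceUrl. filterSameOrigin is a hypothetical name, and resolving relative links against sourceUrl is an assumption the PR text does not state.

```typescript
// Hypothetical sketch: filterSameOrigin() is an illustrative name, not the PR's code.

interface LinkItem {
  text: string;
  url: string;
}

function filterSameOrigin(items: LinkItem[], sourceUrl?: string): LinkItem[] {
  // Opt-in: without a known source URL, nothing is filtered.
  if (!sourceUrl) return items;
  const origin = new URL(sourceUrl).origin; // scheme + host + port

  return items.filter((item) => {
    try {
      // Assumption: relative links resolve against sourceUrl, so they stay same-origin.
      return new URL(item.url, sourceUrl).origin === origin;
    } catch {
      return false; // unparseable URLs are dropped
    }
  });
}

// Usage: importing shopify.dev keeps only shopify.dev links.
const links: LinkItem[] = [
  { text: "Apps", url: "https://shopify.dev/docs/apps" },
  { text: "CLI repo", url: "https://github.com/Shopify/cli" },
  { text: "Deploy to Fly.io", url: "https://fly.io/docs/" },
  { text: "Help Center", url: "https://help.shopify.com/en" },
  { text: "Relative", url: "/docs/api" },
];

console.log(filterSameOrigin(links, "https://shopify.dev/docs").map((l) => l.text));
```

Comparing URL.origin keeps the check strict: help.shopify.com is a different origin from shopify.dev, as would be a subdomain such as www.shopify.dev, which matches the no-subdomain-coalescing rule.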
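Matched-emphasis stripping can be approximated with one regex applied repeatedly. The pattern and the stripEmphasis name are hypothetical; the sketch only unwraps a marker pair that encloses the whole string, which is how it leaves asymmetric input such as **foo alone.

```typescript
// Hypothetical sketch: stripEmphasis() is an illustrative name, not the PR's code.

// One fully wrapping emphasis pair: the opening marker (captured as \1) must also close
// the string, and the inner text may not contain that same marker, so "*a* and *b*"
// is not treated as one wrapped span.
const EMPHASIS_RE = /^(\*\*|__|\*|_)((?:(?!\1).)+)\1$/;

function stripEmphasis(text: string): string {
  let out = text.trim();
  let m: RegExpExecArray | null;
  // Peel one matched layer per pass: "**_x_**" -> "_x_" -> "x".
  while ((m = EMPHASIS_RE.exec(out)) !== null) {
    out = m[2];
  }
  return out;
}

// Usage: "**foo" has no closing marker, so it comes back unchanged.
for (const s of ["**Extension targets**", "*italic*", "__bold__", "**_x_**", "**foo"]) {
  console.log(`${s} -> ${stripEmphasis(s)}`);
}
```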

🧬 QA & Testing

Provide as much information as you can on how to test what you've done.

Loom

https://www.loom.com/share/4bacf2d0195040f4a10bd2efaaf1dff4
