Express backend for a web-clipping workflow that saves arbitrary product/page data into user-selected Coda tables.
The core trick is that the Coda table schema is not known ahead of time. The browser extension sends a URL plus a Coda doc/table target, and the backend fetches that table's columns at runtime, builds a matching Zod schema dynamically, asks Fireworks AI to extract structured data from Firecrawl output, then writes the row to Coda.
Browser extension
POST /api/save-bookmark
{ url, docId, tableId }
Authorization: Bearer <user Coda token>
x-api-key: <backend API key>
Express publisher
validates payload
triggers Upstash Workflow
returns immediately
Upstash Workflow worker
scrape -> Firecrawl
fetch-coda-schema -> Coda columns and relation rows
extract-data -> Fireworks via Vercel AI SDK
save-to-coda -> Coda rows API
- Node.js + Express
- TypeScript
- Upstash Workflow / QStash for durable execution
- Firecrawl for scraping
- Vercel AI SDK with the Fireworks provider
- Fireworks model: accounts/fireworks/models/kimi-k2p6
- Coda API for schema lookup and row insertion
- Zod for dynamic runtime extraction schemas
Publisher endpoint used by the browser extension.
Headers:
Authorization: Bearer <user-coda-api-token>
x-api-key: <backend-api-key>
Content-Type: application/json
Body:
{
"url": "https://www.amazon.com/dp/B0G6YDKYM8",
"docId": "abc123",
"tableId": "grid-xyz"
}
Compatibility fallback: codaToken may still be provided in the JSON body, but the preferred path is the Authorization header.
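The header-first, body-fallback rule can be sketched as a small helper. This is an illustration only: resolveCodaToken and the SaveBookmarkBody type are hypothetical names, not taken from server.ts, and the real validation may differ.

```typescript
// Hypothetical request body shape; fields are unknown until validated.
interface SaveBookmarkBody {
  url?: unknown;
  docId?: unknown;
  tableId?: unknown;
  codaToken?: unknown;
}

// Prefer "Authorization: Bearer <token>"; fall back to body.codaToken.
// Returns null when neither source carries a usable token.
function resolveCodaToken(
  authorizationHeader: string | undefined,
  body: SaveBookmarkBody,
): string | null {
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader?.trim() ?? "");
  if (match) return match[1].trim();

  if (typeof body.codaToken === "string" && body.codaToken.trim() !== "") {
    return body.codaToken.trim();
  }
  return null;
}
```

A handler built on this would presumably reject the request with a 4xx status when the helper returns null.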
Response:
{
"ok": true,
"workflowRunId": "wfr_..."
}
Upstash Workflow worker endpoint. Do not call this directly from the client.
See .env.example.
PORT=3000
API_KEY=
FIREWORKS_API_KEY=
FIRECRAWL_API_KEY=
QSTASH_URL=
QSTASH_TOKEN=
QSTASH_CURRENT_SIGNING_KEY=
QSTASH_NEXT_SIGNING_KEY=
API_KEY: shared secret required by /api/save-bookmark in the x-api-key header.
FIREWORKS_API_KEY: Fireworks API key used by the AI SDK Fireworks provider.
FIRECRAWL_API_KEY: Firecrawl API key used for page scraping.
QSTASH_URL, QSTASH_TOKEN, QSTASH_CURRENT_SIGNING_KEY, QSTASH_NEXT_SIGNING_KEY: Upstash Workflow/QStash config. The signing keys are used by @upstash/workflow/express to verify workflow requests.
Install dependencies:
pnpm install
Create a local env file:
cp .env.example .env
Run the server:
pnpm run dev
Type-check:
pnpm run build
The local server listens on PORT, defaulting to 3000.
This repo includes vercel.json so Vercel uses server.ts as the single serverless entrypoint instead of the legacy Express generator app.js.
Required Vercel env vars:
- API_KEY
- FIREWORKS_API_KEY
- FIRECRAWL_API_KEY
- QSTASH_URL
- QSTASH_TOKEN
- QSTASH_CURRENT_SIGNING_KEY
- QSTASH_NEXT_SIGNING_KEY
After changing backend code, redeploy before testing new workflow runs. Existing Upstash workflow runs can still reflect old deployed code.
The matching browser extension lives here:
rheactdev/coda-express-extension
The extension should:
- Let the user configure this backend's deployed base URL.
- Let the user configure or retrieve a Coda API token securely.
- Send the Coda token as Authorization: Bearer ....
- Send the backend shared secret as x-api-key.
- Send only url, docId, and tableId in the JSON body.
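On the extension side, the request described above could be assembled roughly as follows. This is a sketch: buildSaveBookmarkRequest and its types are hypothetical, and how the extension stores the token and base URL is not covered here.

```typescript
interface SaveBookmarkPayload {
  url: string;
  docId: string;
  tableId: string;
}

interface PreparedRequest {
  endpoint: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the endpoint and fetch options without performing any network call.
function buildSaveBookmarkRequest(
  baseUrl: string,
  codaToken: string,
  backendApiKey: string,
  payload: SaveBookmarkPayload,
): PreparedRequest {
  // An absolute path replaces any path already present on baseUrl.
  const endpoint = new URL("/api/save-bookmark", baseUrl).toString();
  // Copy only the three allowed fields so nothing else leaks into the body.
  const body = JSON.stringify({
    url: payload.url,
    docId: payload.docId,
    tableId: payload.tableId,
  });
  return {
    endpoint,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${codaToken}`,
        "x-api-key": backendApiKey,
        "Content-Type": "application/json",
      },
      body,
    },
  };
}
```

The result can be passed straight to fetch(endpoint, init); the JSON response should carry ok and workflowRunId as shown in the Response example.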
The workflow calls:
GET /v1/docs/{docId}/tables/{tableId}
GET /v1/docs/{docId}/tables/{tableId}/columns
It builds a target column list with:
- name
- type
- description
- existingOptions
- relation table metadata when available
For relation columns, it fetches existing rows from the related table so the model can map page context to existing tags before inventing new ones.
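The two lookups can be sketched with plain fetch. The base URL https://coda.io/apis/v1 is Coda's public API root; the helper names are hypothetical, and the sketch assumes the list response carries its rows under items and ignores pagination.

```typescript
const CODA_API_BASE = "https://coda.io/apis/v1";

// URL for GET /v1/docs/{docId}/tables/{tableId}
function codaTableUrl(docId: string, tableId: string): string {
  return `${CODA_API_BASE}/docs/${encodeURIComponent(docId)}/tables/${encodeURIComponent(tableId)}`;
}

// URL for GET /v1/docs/{docId}/tables/{tableId}/columns
function codaColumnsUrl(docId: string, tableId: string): string {
  return `${codaTableUrl(docId, tableId)}/columns`;
}

// Fetches the first page of columns using the user's Coda token.
async function fetchCodaColumns(
  docId: string,
  tableId: string,
  codaToken: string,
): Promise<unknown[]> {
  const res = await fetch(codaColumnsUrl(docId, tableId), {
    headers: { Authorization: `Bearer ${codaToken}` },
  });
  if (!res.ok) {
    throw new Error(`Coda columns request failed with status ${res.status}`);
  }
  const data = (await res.json()) as { items?: unknown[] };
  return data.items ?? [];
}
```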
The backend builds a Zod object schema dynamically from the selected Coda columns.
Examples:
- Text-like columns -> string | null
- Numeric columns -> number | null
- Boolean columns -> boolean | null
- Date/time columns -> ISO-ish string | null
- Relation/select/person-like columns -> string | string[] | null
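A minimal sketch of that mapping with Zod is shown below. The TargetColumn shape, the function names, and the sets of Coda type strings are assumptions for illustration; the actual grouping in server.ts may differ. Date/time columns fall through to the string case.

```typescript
import { z } from "zod";

// Hypothetical reduced column shape.
interface TargetColumn {
  name: string;
  type: string;
}

// Assumed groupings of Coda column format types.
const NUMERIC_TYPES = new Set(["number", "currency", "percent", "slider", "scale"]);
const BOOLEAN_TYPES = new Set(["checkbox"]);
const MULTI_VALUE_TYPES = new Set(["lookup", "select", "person"]);

function zodTypeForColumn(column: TargetColumn): z.ZodTypeAny {
  if (NUMERIC_TYPES.has(column.type)) return z.number().nullable();
  if (BOOLEAN_TYPES.has(column.type)) return z.boolean().nullable();
  if (MULTI_VALUE_TYPES.has(column.type)) {
    return z.union([z.string(), z.array(z.string())]).nullable();
  }
  // Text-like and date/time columns are extracted as strings.
  return z.string().nullable();
}

// One schema key per column, keyed by the exact column name.
function buildExtractionSchema(columns: TargetColumn[]) {
  const shape: Record<string, z.ZodTypeAny> = {};
  for (const column of columns) {
    shape[column.name] = zodTypeForColumn(column);
  }
  return z.object(shape);
}
```

Because every field is nullable but still required, the model has to return null explicitly for values it cannot find, which matches the prompt rules.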
The AI prompt instructs the model:
- Do not hallucinate.
- Return null if a value is not present.
- Use exact schema keys.
- For relation columns, prefer existing relation values.
The model is hardcoded in code:
const FIREWORKS_MODEL = "accounts/fireworks/models/kimi-k2p6";
The extraction prompt is intentionally compact:
- JSON blocks are minified instead of pretty-printed.
- Full Coda column objects are reduced to only name, type, description, multi-value support, and capped existing options.
- Relation/select options are capped with MAX_EXISTING_OPTIONS_IN_PROMPT.
- Markdown is truncated with MAX_MARKDOWN_CHARS.
- When Firecrawl structured product data is available, markdown is truncated more aggressively with MAX_MARKDOWN_CHARS_WITH_STRUCTURED_DATA.
- Metadata and structured data are separately bounded.
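The compaction rules above might look like this in code. The constant names come from this README, but the numeric values, the PromptColumn shape, and the section labels are placeholders chosen for the sketch.

```typescript
// Placeholder values; the real limits are defined in server.ts.
const MAX_MARKDOWN_CHARS = 12_000;
const MAX_MARKDOWN_CHARS_WITH_STRUCTURED_DATA = 4_000;
const MAX_EXISTING_OPTIONS_IN_PROMPT = 50;

// Hypothetical reduced column shape used in the prompt.
interface PromptColumn {
  name: string;
  type: string;
  description?: string;
  existingOptions?: string[];
}

function truncate(text: string, maxChars: number): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

// Keep only prompt-relevant fields and cap the option list.
function compactColumn(column: PromptColumn): PromptColumn {
  return {
    name: column.name,
    type: column.type,
    description: column.description,
    existingOptions: column.existingOptions?.slice(0, MAX_EXISTING_OPTIONS_IN_PROMPT),
  };
}

function buildPromptSections(
  columns: PromptColumn[],
  markdown: string,
  structuredData: unknown,
): string {
  const hasStructured = structuredData !== null && structuredData !== undefined;
  const markdownLimit = hasStructured
    ? MAX_MARKDOWN_CHARS_WITH_STRUCTURED_DATA
    : MAX_MARKDOWN_CHARS;
  // JSON.stringify without an indent argument yields minified JSON.
  const parts = [`COLUMNS:${JSON.stringify(columns.map(compactColumn))}`];
  if (hasStructured) parts.push(`STRUCTURED:${JSON.stringify(structuredData)}`);
  parts.push(`MARKDOWN:${truncate(markdown, markdownLimit)}`);
  return parts.join("\n");
}
```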
The current constants live near the top of server.ts:
MAX_MARKDOWN_CHARS
MAX_MARKDOWN_CHARS_WITH_STRUCTURED_DATA
MAX_METADATA_CHARS
MAX_STRUCTURED_DATA_CHARS
MAX_COLUMN_DESCRIPTION_CHARS
MAX_EXISTING_OPTIONS_IN_PROMPT
MAX_EXISTING_OPTION_CHARS
Shopify product pages use a fast path before Firecrawl. If the URL looks like:
https://store.com/products/product-handle
the backend first tries:
https://store.com/products/product-handle.js
When that public Shopify product JSON endpoint works, the workflow uses it directly as structured product data and avoids Firecrawl for that page.
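The URL rewrite behind this fast path can be sketched as follows. The function names are hypothetical, and the sketch only matches paths of the exact form /products/<handle>; the real detection may be broader.

```typescript
// Returns the ".js" product URL for a Shopify-style product page, or null.
function shopifyProductJsonUrl(pageUrl: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(pageUrl);
  } catch {
    return null; // not a valid absolute URL
  }
  const match = /^\/products\/([^/]+)\/?$/.exec(parsed.pathname);
  if (!match) return null;
  // Query string and hash are dropped on purpose.
  return `${parsed.origin}/products/${match[1]}.js`;
}

// Tries the public product JSON; returns null so the caller can fall back to Firecrawl.
async function tryShopifyFastPath(pageUrl: string): Promise<unknown | null> {
  const jsonUrl = shopifyProductJsonUrl(pageUrl);
  if (!jsonUrl) return null;
  try {
    const res = await fetch(jsonUrl, { headers: { Accept: "application/json" } });
    if (!res.ok) return null;
    return await res.json();
  } catch {
    return null; // network error or non-JSON body
  }
}
```

Returning null on any failure keeps the fast path optional: a store that is not Shopify, or that blocks the endpoint, simply goes through the normal scrape.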
Generic non-Shopify sites use Markdown.
Amazon and Etsy use Firecrawl's JSON format plus Markdown:
formats: [
"markdown",
{
type: "json",
prompt: "Extract structured product listing data from this page.",
schema: structuredScrapeSchema,
},
]
Amazon fields include:
- title
- price
- currency
- availability
- rating
- review count
- brand
- ASIN
- model number
- item model number
- product image
- image URLs
- features
Etsy fields include:
- title
- price
- currency
- shop name
- shop URL
- rating
- review count
- availability
- listing ID
- product image
- image URLs
- description
- variations
- materials
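The per-site choice between Markdown only and Markdown plus JSON could be expressed as a small selector. This is a sketch: scrapeFormatsFor and isHostOrSubdomain are hypothetical names, and it only recognizes amazon.com and etsy.com hosts, which may be narrower than the real check.

```typescript
type ScrapeFormat =
  | "markdown"
  | { type: "json"; prompt: string; schema: unknown };

function isHostOrSubdomain(hostname: string, domain: string): boolean {
  return hostname === domain || hostname.endsWith(`.${domain}`);
}

// Amazon and Etsy get the object-form JSON format in addition to markdown.
function scrapeFormatsFor(pageUrl: string, structuredScrapeSchema: unknown): ScrapeFormat[] {
  const { hostname } = new URL(pageUrl);
  const wantsStructured =
    isHostOrSubdomain(hostname, "amazon.com") || isHostOrSubdomain(hostname, "etsy.com");
  if (!wantsStructured) return ["markdown"];
  return [
    "markdown",
    {
      type: "json",
      prompt: "Extract structured product listing data from this page.",
      schema: structuredScrapeSchema,
    },
  ];
}
```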
Coda row cell values must be:
boolean | number | string | Array<boolean | number | string>
The backend normalizes extracted values before saving:
- undefined and null cells are omitted.
- Objects are reduced to common scalar fields like name, display, value, label, url, href, or text.
- Remaining objects are JSON-stringified.
- Relation/tag arrays are allowed.
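A minimal sketch of these normalization rules follows. The function names and the exact key order are assumptions; only the rules themselves come from the list above.

```typescript
type CodaScalar = boolean | number | string;
type CodaCellValue = CodaScalar | CodaScalar[];

// Keys tried, in order, when an object has to be reduced to a scalar.
const SCALAR_KEYS = ["name", "display", "value", "label", "url", "href", "text"] as const;

function isScalar(value: unknown): value is CodaScalar {
  return typeof value === "string" || typeof value === "number" || typeof value === "boolean";
}

function toScalar(value: unknown): CodaScalar | undefined {
  if (value === null || value === undefined) return undefined;
  if (isScalar(value)) return value;
  if (typeof value === "object" && !Array.isArray(value)) {
    const record = value as Record<string, unknown>;
    for (const key of SCALAR_KEYS) {
      const candidate = record[key];
      if (isScalar(candidate)) return candidate;
    }
  }
  // No usable scalar field (or a nested array): fall back to a JSON string.
  return JSON.stringify(value);
}

function normalizeCell(value: unknown): CodaCellValue | undefined {
  if (Array.isArray(value)) {
    return value.map(toScalar).filter((item): item is CodaScalar => item !== undefined);
  }
  return toScalar(value);
}

// Drops null/undefined cells and coerces everything else to a Coda-safe value.
function normalizeRow(extracted: Record<string, unknown>): Record<string, CodaCellValue> {
  const row: Record<string, CodaCellValue> = {};
  for (const [column, value] of Object.entries(extracted)) {
    const cell = normalizeCell(value);
    if (cell !== undefined) row[column] = cell;
  }
  return row;
}
```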
Amazon URL columns are shortened before saving:
https://www.amazon.com/.../dp/B0G6YDKYM8/ref=...
becomes:
https://www.amazon.com/dp/B0G6YDKYM8
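One way to implement this shortening is a pathname match on the /dp/<ASIN> segment. The sketch below is an assumption about how it could work, not the code in server.ts: shortenAmazonUrl is a hypothetical name, and URLs without a recognizable /dp/ segment are returned unchanged.

```typescript
function shortenAmazonUrl(rawUrl: string): string {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return rawUrl; // leave non-URL values untouched
  }
  // Only touch amazon.<tld> hosts and their subdomains.
  if (!/(^|\.)amazon\.[a-z.]+$/i.test(parsed.hostname)) return rawUrl;

  // ASINs are 10 alphanumeric characters following "/dp/".
  const match = /\/dp\/([A-Z0-9]{10})(?:\/|$)/i.exec(parsed.pathname);
  if (!match) return rawUrl;

  return `${parsed.origin}/dp/${match[1].toUpperCase()}`;
}
```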
Workflow errors are logged with detailed serialized error objects for debugging.
Only current process.env values are redacted from logs. This means page content, AI output, Coda API response details, and outgoing row payloads may appear in logs if an error occurs.
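Value-based redaction of this kind can be sketched as a string replacement over the serialized log text. The function name, the [REDACTED] marker, and the minimum-length cutoff (added so short values such as PORT=3000 do not blank out unrelated text) are assumptions for this sketch.

```typescript
// Replaces every occurrence of an env value in the text with a marker.
function redactEnvValues(
  text: string,
  env: Record<string, string | undefined>,
  minLength = 6,
): string {
  const secrets = Object.values(env)
    .filter((value): value is string => typeof value === "string" && value.length >= minLength)
    // Longest first, so a value containing another value is replaced whole.
    .sort((a, b) => b.length - a.length);

  let redacted = text;
  for (const secret of secrets) {
    redacted = redacted.split(secret).join("[REDACTED]");
  }
  return redacted;
}
```

Calling redactEnvValues(serializedError, process.env) covers configured secrets only; anything that never lived in the environment, such as scraped page text or the user's Coda token from the request, passes through untouched.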
Make sure vercel.json is deployed. Without it, Vercel may detect multiple entrypoints and pick the old Express generator app.
Firecrawl SDK 4.25.0 expects object-form JSON formats:
{ type: "json", prompt, schema }
not plain "json".
The Fireworks provider is OpenAI-compatible but the model needs structured outputs. The backend sets:
Object.defineProperty(fireworksExtractionModel, "supportsStructuredOutputs", {
value: true,
configurable: true,
});
Coda does not accept null cell values or object cell values. The backend now omits nulls and coerces objects before posting rows.
The prompt includes SKU-like aliases:
- SKU
- Model Number
- Item Model Number
- Product Code
- Part Number
- Style Number
- ASIN
For Amazon, it prefers Model Number / Item Model Number when present and falls back to ASIN.
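In the backend this preference is expressed through the prompt. The same ordering written as a deterministic helper would look roughly like this; pickAmazonSku and the AmazonIdentifiers field names are hypothetical and only illustrate the fallback order.

```typescript
// Hypothetical shape for identifiers pulled from an Amazon listing.
interface AmazonIdentifiers {
  modelNumber?: string | null;
  itemModelNumber?: string | null;
  asin?: string | null;
}

// Model Number first, then Item Model Number, then ASIN as the last resort.
function pickAmazonSku(ids: AmazonIdentifiers): string | null {
  for (const candidate of [ids.modelNumber, ids.itemModelNumber, ids.asin]) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed;
  }
  return null;
}
```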
This project started from an Express generator template, so legacy files like app.js, bin/www, routes/, and views/ still exist. The active backend is server.ts.
pnpm run dev
pnpm run build
pnpm start