Coda Express Web Clipper Backend

Express backend for a web-clipping workflow that saves arbitrary product/page data into user-selected Coda tables.

The core trick is that the Coda table schema is not known ahead of time. The browser extension sends a URL plus a Coda doc/table target, and the backend fetches that table's columns at runtime, builds a matching Zod schema dynamically, asks Fireworks AI to extract structured data from Firecrawl output, then writes the row to Coda.

Architecture

Browser extension
  POST /api/save-bookmark
    { url, docId, tableId }
    Authorization: Bearer <user Coda token>
    x-api-key: <backend API key>

Express publisher
  validates payload
  triggers Upstash Workflow
  returns immediately

Upstash Workflow worker
  scrape -> Firecrawl
  fetch-coda-schema -> Coda columns and relation rows
  extract-data -> Fireworks via Vercel AI SDK
  save-to-coda -> Coda rows API

Stack

Node.js + Express
TypeScript
Upstash Workflow / QStash for durable execution
Firecrawl for scraping
Vercel AI SDK with the Fireworks provider
Fireworks model: accounts/fireworks/models/kimi-k2p6
Coda API for schema lookup and row insertion
Zod for dynamic runtime extraction schemas

Endpoints

`POST /api/save-bookmark`

Publisher endpoint used by the browser extension.

Headers:

Authorization: Bearer <user-coda-api-token>
x-api-key: <backend-api-key>
Content-Type: application/json

Body:

{
  "url": "https://www.amazon.com/dp/B0G6YDKYM8",
  "docId": "abc123",
  "tableId": "grid-xyz"
}

Compatibility fallback: codaToken may still be provided in the JSON body, but the preferred path is the Authorization header.

Response:

{
  "ok": true,
  "workflowRunId": "wfr_..."
}

`POST /api/workflow/save-bookmark`

Upstash Workflow worker endpoint. Do not call this directly from the client.

Environment Variables

See .env.example.

PORT=3000
API_KEY=
FIREWORKS_API_KEY=
FIRECRAWL_API_KEY=
QSTASH_URL=
QSTASH_TOKEN=
QSTASH_CURRENT_SIGNING_KEY=
QSTASH_NEXT_SIGNING_KEY=

`API_KEY`

Shared secret required by /api/save-bookmark in the x-api-key header.

`FIREWORKS_API_KEY`

Fireworks API key used by the AI SDK Fireworks provider.

`FIRECRAWL_API_KEY`

Firecrawl API key used for page scraping.

`QSTASH_*`

Upstash Workflow/QStash config. The signing keys are used by @upstash/workflow/express to verify workflow requests.

Local Development

Install dependencies:

pnpm install

Create a local env file:

cp .env.example .env

Run the server:

pnpm run dev

Type-check:

pnpm run build

The local server listens on PORT, defaulting to 3000.

Vercel Deployment

This repo includes vercel.json so Vercel uses server.ts as the single serverless entrypoint instead of the legacy Express generator app.js.

Required Vercel env vars:

API_KEY
FIREWORKS_API_KEY
FIRECRAWL_API_KEY
QSTASH_URL
QSTASH_TOKEN
QSTASH_CURRENT_SIGNING_KEY
QSTASH_NEXT_SIGNING_KEY

After changing backend code, redeploy before testing new workflow runs. Existing Upstash workflow runs can still reflect old deployed code.

Browser Extension

The matching browser extension lives here:

rheactdev/coda-express-extension

The extension should:

Let the user configure this backend's deployed base URL.
Let the user configure or retrieve a Coda API token securely.
Send the Coda token as Authorization: Bearer ....
Send the backend shared secret as x-api-key.
Send only url, docId, and tableId in the JSON body.

Dynamic Coda Schema Handling

The workflow calls:

GET /v1/docs/{docId}/tables/{tableId}
GET /v1/docs/{docId}/tables/{tableId}/columns

It builds a target column list with:

name
type
description
existingOptions
relation table metadata when available

For relation columns, it fetches existing rows from the related table so the model can map page context to existing tags before inventing new ones.

AI Extraction

The backend builds a Zod object schema dynamically from the selected Coda columns.

Examples:

Text-like columns -> string | null
Numeric columns -> number | null
Boolean columns -> boolean | null
Date/time columns -> ISO-ish string | null
Relation/select/person-like columns -> string | string[] | null

The AI prompt instructs the model:

Do not hallucinate.
Return null if a value is not present.
Use exact schema keys.
For relation columns, prefer existing relation values.

The model is hardcoded in code:

const FIREWORKS_MODEL = "accounts/fireworks/models/kimi-k2p6";

Token Budget Controls

The extraction prompt is intentionally compact:

JSON blocks are minified instead of pretty-printed.
Full Coda column objects are reduced to only name, type, description, multi-value support, and capped existing options.
Relation/select options are capped with MAX_EXISTING_OPTIONS_IN_PROMPT.
Markdown is truncated with MAX_MARKDOWN_CHARS.
When Firecrawl structured product data is available, markdown is truncated more aggressively with MAX_MARKDOWN_CHARS_WITH_STRUCTURED_DATA.
Metadata and structured data are separately bounded.

The current constants live near the top of server.ts:

MAX_MARKDOWN_CHARS
MAX_MARKDOWN_CHARS_WITH_STRUCTURED_DATA
MAX_METADATA_CHARS
MAX_STRUCTURED_DATA_CHARS
MAX_COLUMN_DESCRIPTION_CHARS
MAX_EXISTING_OPTIONS_IN_PROMPT
MAX_EXISTING_OPTION_CHARS

Firecrawl Scraping

Shopify product pages use a fast path before Firecrawl. If the URL looks like:

https://store.com/products/product-handle

the backend first tries:

https://store.com/products/product-handle.js

When that public Shopify product JSON endpoint works, the workflow uses it directly as structured product data and avoids Firecrawl for that page.

Generic non-Shopify sites use Markdown.

Amazon and Etsy use Firecrawl's JSON format plus Markdown:

formats: [
  "markdown",
  {
    type: "json",
    prompt: "Extract structured product listing data from this page.",
    schema: structuredScrapeSchema,
  },
]

Amazon fields include:

title
price
currency
availability
rating
review count
brand
ASIN
model number
item model number
product image
image URLs
features

Etsy fields include:

title
price
currency
shop name
shop URL
rating
review count
availability
listing ID
product image
image URLs
description
variations
materials

Save-To-Coda Normalization

Coda row cell values must be:

boolean | number | string | Array<boolean | number | string>

The backend normalizes extracted values before saving:

undefined and null cells are omitted.
Objects are reduced to common scalar fields like name, display, value, label, url, href, or text.
Remaining objects are JSON-stringified.
Relation/tag arrays are allowed.

Amazon URL columns are shortened before saving:

https://www.amazon.com/.../dp/B0G6YDKYM8/ref=...

becomes:

https://www.amazon.com/dp/B0G6YDKYM8

Logging

Workflow errors are logged with detailed serialized error objects for debugging.

Only current process.env values are redacted from logs. This means page content, AI output, Coda API response details, and outgoing row payloads may appear in logs if an error occurs.

Troubleshooting

Vercel picks `app.js` instead of `server.ts`

Make sure vercel.json is deployed. Without it, Vercel may detect multiple entrypoints and pick the old Express generator app.

Firecrawl says `json format must be an object`

Firecrawl SDK 4.25.0 expects object-form JSON formats:

{ type: "json", prompt, schema }

not plain "json".

AI SDK says `responseFormat` is unsupported

The Fireworks provider is OpenAI-compatible but the model needs structured outputs. The backend sets:

Object.defineProperty(fireworksExtractionModel, "supportsStructuredOutputs", {
  value: true,
  configurable: true,
});

Coda rejects `Invalid "value" field in a cell`

Coda does not accept null cell values or object cell values. The backend now omits nulls and coerces objects before posting rows.

SKU is missing on Amazon

The prompt includes SKU-like aliases:

SKU
Model Number
Item Model Number
Product Code
Part Number
Style Number
ASIN

For Amazon, it prefers Model Number / Item Model Number when present and falls back to ASIN.

Repository Notes

This project started from an Express generator template, so legacy files like app.js, bin/www, routes/, and views/ still exist. The active backend is server.ts.

Useful Commands

pnpm run dev
pnpm run build
pnpm start

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
bin		bin
public/stylesheets		public/stylesheets
routes		routes
views		views
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
app.js		app.js
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
server.ts		server.ts
tsconfig.json		tsconfig.json
vercel.json		vercel.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Coda Express Web Clipper Backend

Architecture

Stack

Endpoints

`POST /api/save-bookmark`

`POST /api/workflow/save-bookmark`

Environment Variables

`API_KEY`

`FIREWORKS_API_KEY`

`FIRECRAWL_API_KEY`

`QSTASH_*`

Local Development

Vercel Deployment

Browser Extension

Dynamic Coda Schema Handling

AI Extraction

Token Budget Controls

Firecrawl Scraping

Save-To-Coda Normalization

Logging

Troubleshooting

Vercel picks `app.js` instead of `server.ts`

Firecrawl says `json format must be an object`

AI SDK says `responseFormat` is unsupported

Coda rejects `Invalid "value" field in a cell`

SKU is missing on Amazon

Repository Notes

Useful Commands

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Coda Express Web Clipper Backend

Architecture

Stack

Endpoints

POST /api/save-bookmark

POST /api/workflow/save-bookmark

Environment Variables

API_KEY

FIREWORKS_API_KEY

FIRECRAWL_API_KEY

QSTASH_*

Local Development

Vercel Deployment

Browser Extension

Dynamic Coda Schema Handling

AI Extraction

Token Budget Controls

Firecrawl Scraping

Save-To-Coda Normalization

Logging

Troubleshooting

Vercel picks app.js instead of server.ts

Firecrawl says json format must be an object

AI SDK says responseFormat is unsupported

Coda rejects Invalid "value" field in a cell

SKU is missing on Amazon

Repository Notes

Useful Commands

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`POST /api/save-bookmark`

`POST /api/workflow/save-bookmark`

`API_KEY`

`FIREWORKS_API_KEY`

`FIRECRAWL_API_KEY`

`QSTASH_*`

Vercel picks `app.js` instead of `server.ts`

Firecrawl says `json format must be an object`

AI SDK says `responseFormat` is unsupported

Coda rejects `Invalid "value" field in a cell`

Packages