B2BWeb

AI-native B2B wholesale platform for mobile accessories. Built on Cloudflare Workers, D1, R2, Vectorize, and Hono.

Architecture

Obsidian Vault (content/)
  -> git push
  -> GitHub Actions CI/CD
    -> validate (type check, entity ID linting)
    -> generate (graph, link-index, llms.txt, JSON-LD)
    -> process images (WebP conversion, R2 upload)
    -> chunked D1 sync (idempotent, failure-strict)
    -> deploy Worker
  -> Cloudflare Edge
    -> SSR HTML (Google + humans)
    -> Markdown/plain text (AI agents)
    -> D1 (products, graph, runtime state)
    -> R2 (images)
    -> Vectorize (semantic search, optional)

Repository & Deployment Architecture

Design Principles

This project follows a shared-repository operating model with three hard rules:

Single local source of truth — Obsidian, Codex, Claude Code, and Git all work inside the same repository root.
Deployment isolation — the Cloudflare Worker runtime is only redeployed when code paths change. Markdown/content updates never redeploy the Worker by default.
Minimal architecture — no monorepo tooling, no turborepo, no pnpm workspaces. Static-first, edge-native, operationally simple.

Shared Repository

All local work happens in:

/Users/alexkou/Documents/github/b2bweb

This repository is both:

the code workspace for Claude Code
the content vault for Obsidian and Codex

Responsibilities are separated by path ownership and workflow rules, not by maintaining two live local clones.

See:

Single Shared Repo Operating Rules

CI/CD Workflow Split

Two separate GitHub Actions workflows enforce deployment isolation:

`deploy.yml` — triggered on code changes only

Fires when any of the following paths change:

src/**
scripts/**
wrangler.toml
package.json
package-lock.json
tsconfig.json
.github/workflows/deploy.yml

Pipeline stages:

validate (type check + entity ID lint)
  -> build (graph, llms.txt, JSON-LD)
    -> sync (D1 chunked batch)
      -> embeddings (Vectorize upsert)
    -> images (WebP conversion, R2 upload)
      -> deploy (wrangler deploy)

The Worker runtime is only redeployed at the end of this pipeline — never from content changes.

`content-sync.yml` — triggered on content changes only

Fires when any of the following paths change:

content/**
public/**
attachments/**

Pipeline stages mirror deploy.yml through embeddings and images, with two differences: validation runs only the entity ID lint (no type check), and there is no deploy job:

validate (entity ID lint)
  -> build (graph, llms.txt, JSON-LD)
    -> sync (D1 chunked batch)
      -> embeddings (Vectorize upsert)
    -> images (WebP conversion, R2 upload)

Content editors (Obsidian, Codex) push markdown changes and get full D1 sync, vector indexing, and image processing — without ever touching the Worker runtime.

Why This Matters

Without path filtering, every Obsidian markdown push would trigger a full Worker redeploy. This wastes CI minutes, risks deploying an unintended Worker state, and conflates content authoring with infrastructure operations.

With path filtering:

A product markdown edit → D1 sync, embeddings, and image processing. No deploy.
A source code change → full pipeline including deploy.
Both change in the same commit → both workflows fire in parallel, each handling its domain.
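The routing described above can be sketched in a few lines of Python. This is only an illustration of the decision, not how GitHub Actions evaluates path filters: the `workflows_for` helper is hypothetical, and `fnmatchcase` is used as an approximation of the workflow glob syntax.

```python
from fnmatch import fnmatchcase

# Path lists copied from the two workflow descriptions in this README.
CODE_PATHS = [
    "src/**", "scripts/**", "wrangler.toml", "package.json",
    "package-lock.json", "tsconfig.json", ".github/workflows/deploy.yml",
]
CONTENT_PATHS = ["content/**", "public/**", "attachments/**"]


def workflows_for(changed_files):
    """Return the set of workflow files a commit touching these paths would trigger."""
    fired = set()
    for path in changed_files:
        if any(fnmatchcase(path, pattern) for pattern in CODE_PATHS):
            fired.add("deploy.yml")
        if any(fnmatchcase(path, pattern) for pattern in CONTENT_PATHS):
            fired.add("content-sync.yml")
    return fired


# A content-only commit triggers only the content workflow.
print(workflows_for(["content/products/usb-c-cable.md"]))  # {'content-sync.yml'}
```

A commit that touches neither list (for example, only README.md) returns an empty set, meaning neither workflow fires.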

Known Limitation (Current State)

The repository contains both code and content, so daily operations must keep them separated by commit intent and path scope.

The long-term resolution is to split into two GitHub repositories:

Repo	Contains
`b2bweb`	Source code, Workers, D1 schema, deployment logic
`b2bweb-content`	Products, markdown, Obsidian vault

In the future architecture, the content repo pushes to GitHub Actions that pull content into the code repo, sync D1, and update Vectorize — without the content repo ever having access to deployment credentials or Worker configuration.

Source of Truth

Data	Owner	Where
Product content, SEO text, metadata	Markdown	`content/products/*.md`
Live price, stock, price_locked	D1 runtime	Admin API (`PUT /api/products/:id/live`)
Knowledge graph	Generated	`generated/graph.json` -> D1 snapshot
Embeddings	Vectorize	`b2bweb-products` index (bge-m3, 1024d)
Product images	R2	`products/{slug}/{hash}.webp`

Markdown controls product content and SEO. D1 controls live runtime state (stock quantities, locked prices). When both sources overlap on pricing fields, D1 wins if price_locked = 1.
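The precedence rule can be sketched as a small function. This is a minimal illustration, not the Worker's actual code: the `effective_price` helper and the `price` key on the D1 row are hypothetical names, and falling back to the Markdown price when the lock is off is an assumption inferred from the rule above.

```python
def effective_price(markdown_price, d1_row):
    """Pick the price to show for one product.

    markdown_price: price_usd taken from the Markdown frontmatter.
    d1_row: dict with hypothetical keys 'price' and 'price_locked', or None
            when the product has no runtime row in D1.
    """
    # D1 wins only when the price is explicitly locked.
    if d1_row is not None and d1_row.get("price_locked") == 1:
        return d1_row["price"]
    # Assumption: otherwise the Markdown value is used.
    return markdown_price


print(effective_price(5.50, {"price": 7.95, "price_locked": 1}))  # 7.95
print(effective_price(5.50, {"price": 7.95, "price_locked": 0}))  # 5.5
```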

Bindings

Name	Type	Purpose
`DB`	D1	Products, customers, orders, quotes, graph
`ASSETS`	R2	Product images (WebP)
`VECTORIZE`	Vectorize	Semantic search (bge-m3, 1024 dims)
`AI`	Workers AI	Embeddings, vision, inference
`ALLOWED_ORIGINS`	Var	CORS allowlist (comma-separated)
`ENVIRONMENT`	Var	`production` or `development`

Quick Start

git clone https://github.com/alexmorerich/b2bweb.git
cd b2bweb
npm install

# Initialize database
npm run db:init
npm run db:seed

# Run full content pipeline (validate -> graph -> llms -> jsonld -> sync)
npm run pipeline

# Start dev server
npm run dev
# -> http://localhost:8787

Content Pipeline

Pipeline order is critical. Graph and link-index must be generated before sync:

validate:ids -> content:graph -> content:llms -> content:jsonld -> content:sync
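As an illustration of why the order matters, here is a sketch that runs the steps strictly in sequence and stops at the first failure, so sync never runs against missing artifacts. It is not how package.json wires the `pipeline` script; the `run_pipeline` helper and its injectable `run` parameter are hypothetical.

```python
import subprocess

# Script names taken from the pipeline order above.
PIPELINE = [
    "validate:ids",
    "content:graph",
    "content:llms",
    "content:jsonld",
    "content:sync",
]


def run_pipeline(steps=PIPELINE, run=None):
    """Run each npm script in order and stop at the first non-zero exit code.

    Returns (completed_steps, failed_step); failed_step is None on success.
    """
    if run is None:
        # Default runner shells out to npm; pass a fake runner for dry runs.
        run = lambda script: subprocess.run(["npm", "run", script]).returncode
    completed = []
    for script in steps:
        if run(script) != 0:
            return completed, script
        completed.append(script)
    return completed, None
```

Calling `run_pipeline()` with no arguments executes the real npm scripts; passing a stub as `run` lets you check the ordering without touching D1.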

Pipeline Commands

# Validate content (linter-only, no mutations)
npm run validate:ids

# Generate artifacts
npm run content:graph       # Knowledge graph + link-index.json
npm run content:llms        # llms.txt hierarchy (3 layers)
npm run content:jsonld      # JSON-LD structured data
npm run content:embeddings  # Vector embeddings (bge-m3, 1024d)
npm run content:vision      # AI image descriptions
npm run content:images      # Process images (WebP, R2 upload)

# Sync to D1 (chunked, failure-strict)
npm run content:sync        # Local D1
npm run content:sync:remote # Remote D1

# Full pipeline
npm run pipeline            # validate -> graph -> llms -> jsonld -> sync
npm run pipeline:remote     # Same, remote D1

# Backup
npm run backup:d1           # Export all D1 tables
npm run backup:r2           # Download all R2 objects
npm run backup              # Both

Product Upload CLI

Scrape a product URL, extract structured product data, and generate a Markdown draft with atomic image download. The recommended mode is deterministic --no-llm; LLM mode is optional. This is a draft upload tool — generated products are active: false and won't appear on the live site until you review and publish them.

Setup

cd /Users/alexkou/Documents/github/b2bweb
npm run product:copy:setup
source .venv/bin/activate

API Key

The recommended path is --no-llm, which does not require OpenAI quota or a local model. If LLM mode is used, provide a real OpenAI Platform API key with --api_key or $OPENAI_API_KEY. Codex OAuth is not an OpenAI Platform API key and is not used by this script.

Usage

python3 scripts/copy_product.py \
  --url "https://m.gadgetfix.com/white-charging-port-dock-usb-c-connector-flex-cable-for-iphone-16e-iphone-17e-11052.html" \
  --require-image \
  --strict \
  --no-llm

Use the non-mobile https://gadgetfix.com/... URL if the mobile URL has SSL issues.

Usage with Ollama (Local LLM)

export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"

python3 scripts/copy_product.py \
  --url "https://m.gadgetfix.com/..." \
  --model "qwen2.5" \
  --require-image --strict

Review in Obsidian

Generated products are drafts (draft: true, needs_review: true, active: false). Review title, SKU, MOQ, pricing, compatible models, materials, image, and source accuracy. Publish by changing:

draft: false
needs_review: false
active: true

Push Automatically

python3 scripts/copy_product.py \
  --url "https://m.gadgetfix.com/..." \
  --require-image --strict --no-llm --push \
  --commit-message "content: add iphone 16e 17e charging port flex"

--push commits and pushes to the current branch (sets upstream automatically). It does not merge to main or create a PR. The Cloudflare deploy pipeline only triggers when changes reach main.

CLI Flags

Flag	Description
`--url`	(Required) Product page URL to scrape
`--model`	LLM model name (default: `gpt-4o-mini`)
`--api_key`	OpenAI Platform API key for LLM mode
`--base_url`	API base URL (default: `$OPENAI_BASE_URL` or OpenAI)
`--no-llm`	Use deterministic page parsing and skip OpenAI/Ollama
`--force`	Overwrite existing Markdown file
`--force-image`	Overwrite existing image file
`--require-image`	Fail if image cannot be saved
`--strict`	Fail if category/compatible_models/materials are empty
`--push`	Validate, commit, and push to current branch
`--commit-message`	Custom commit message (default: `"content: add copied product"`)

See docs/PRODUCT_UPLOAD_CLI.md for full details and safety notes.

Batch Product Upload CLI v10 — Beginner Tutorial

This section is a complete walkthrough for importing products in bulk. No prior experience with the batch CLI is required. If you can run commands in a terminal, you can use this.

What This Does

The Batch Product Upload CLI lets you point at a supplier's category page, pick which products you want by keyword, and automatically generate draft Markdown files for each matching product — complete with images, metadata, and frontmatter. You review the drafts in Obsidian, approve the ones you want, and promote them into the live product catalog.

Nothing is published automatically. Every product goes through human review before it reaches the live site.

Design Concept

Why It Works This Way

The system is designed around three principles:

Local scripts do the work, not AI. The CLI fetches pages, parses HTML, and downloads images using plain Python and curl. No LLM is called during batch intake (--no-llm is always on). This keeps the process fast, deterministic, and free from API costs.
Sandbox first, publish later. Generated drafts land in a staging area (content/_incoming/), never directly in production (content/products/). You review and approve each file before it goes live. This prevents bad data from ever reaching customers.
Deduplication is automatic. Every imported URL is hashed and recorded in a local registry. If you run the same intake twice, already-imported products are silently skipped. You never get duplicates.
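The deduplication idea can be sketched as follows. This is a simplified illustration, not the code in batch_copy_products.sh: it assumes the registry stores one SHA-256 hex digest per line, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def url_hash(url):
    """SHA-256 hex digest of a URL (whitespace trimmed)."""
    return hashlib.sha256(url.strip().encode("utf-8")).hexdigest()


def filter_new_urls(urls, registry_path):
    """Return the URLs whose hash is not yet in the registry, preserving order."""
    path = Path(registry_path)
    seen = set(path.read_text().split()) if path.exists() else set()
    new_urls = []
    for url in urls:
        digest = url_hash(url)
        if digest not in seen:
            seen.add(digest)  # also drops repeats within the same batch
            new_urls.append(url)
    return new_urls


def commit_registry(urls, registry_path):
    """Append the hashes of successfully imported URLs to the registry."""
    path = Path(registry_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as registry:
        for url in urls:
            registry.write(url_hash(url) + "\n")
```

Recording hashes only after a successful import means a failed run can be retried without clearing the registry.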

Pipeline Architecture

                          YOU ARE HERE
                               |
                               v
Step 1   Supplier category page (e.g. gadgetfix.com/apple-parts.html)
           |
           v
Step 2   extract_candidates.py ---- fetches the page, collects all product URLs
           |
           v
Step 3   filter_candidates.py ----- keeps only URLs matching your keywords,
           |                         removes URLs matching your exclude terms
           v
Step 4   batch_copy_products.sh --- for each new URL:
           |                          - runs copy_product.py (fetch + parse + image download)
           |                          - deduplicates against the import registry
           |                          - validates generated drafts
           |                          - commits the registry atomically
           v
Step 5   content/_incoming/products/{run_id}/ --- your draft sandbox
           |
           v
Step 6   YOU review in Obsidian --- check titles, images, pricing, models
           |                        set review_status: approved on good ones
           v
Step 7   promote_incoming.py ------ copies approved drafts to content/products/
           |                         updates frontmatter (draft:false, active:true)
           v
Step 8   validate, commit, PR, merge --- standard content workflow
           |
           v
Step 9   content-sync.yml syncs D1, embeddings, and images (no Worker redeploy)

File Layout After a Run

b2bweb/
  content/
    _incoming/
      products/
        20260517-143022/          <-- your run
          some-product.md         <-- draft (review_status: pending)
          another-product.md
          assets/
            some-product-main.jpg
            another-product-main.jpg
          batch.log               <-- what happened during import
          product-urls.txt        <-- which URLs were processed
    products/                     <-- production (after promotion)
      some-product.md             <-- promoted (active: true)
  .local/
    intake/
      import-registry.txt         <-- URL hash registry (dedup)

Prerequisites

You need:

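macOS with curl (ships with macOS) and Python 3.10+ (check with python3 --version; the Python bundled with Apple's developer tools may be older, so install a newer one if needed)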
Node.js 18+ and npm
Git
A clone of this repository

One-Time Setup

# 1. Clone the repo (skip if you already have it)
git clone https://github.com/alexmorerich/b2bweb.git
cd b2bweb

# 2. Install Node dependencies
npm install

# 3. Create a Python virtual environment
python3 -m venv .venv

# 4. Activate the virtual environment
source .venv/bin/activate

# 5. Install Python dependencies for the product scraper
pip install -r scripts/requirements-copy-product.txt

You only do this once. In future sessions, just run source .venv/bin/activate.

Step-by-Step Guide

Step 1: Choose Your Target

Pick a supplier category page. For example, to import iPhone repair parts from GadgetFix:

https://gadgetfix.com/apple-parts.html

Open the page in your browser to see what's there. Note the kinds of products you want (and don't want).

Step 2: Set Environment Variables

Open a terminal in the b2bweb directory and set these three required variables:

cd /path/to/b2bweb
source .venv/bin/activate

# Which domains are allowed (comma-separated if multiple)
export B2BWEB_ALLOWED_DOMAINS="gadgetfix.com"

# The category/search page to scrape
export B2BWEB_SOURCE_URL="https://gadgetfix.com/apple-parts.html"

# Keywords to match (one per line, using $'...' syntax)
export B2BWEB_KEYWORDS=$'iphone 17 pro max
bluetooth
flex cable
charging port
usb-c'

Optionally, exclude products you don't want:

export B2BWEB_EXCLUDE_TERMS=$'case
protector
tempered glass'

How keywords work: The filter converts each product URL to readable text (hyphens become spaces) and checks it for your keywords. A URL like replacement-bluetooth-flex-cable-for-iphone-17-pro-max-11014.html would match bluetooth, flex cable, and iphone 17 pro max. A URL is kept if any keyword matches and dropped if it contains any exclude term.

Step 3: Run the Intake

bash scripts/intake_run.sh

This single command runs the full pipeline (extract -> filter -> batch copy). You'll see output like:

=== B2BWeb Batch Intake v10 ===
run_id: 20260517-143022
source: https://gadgetfix.com/apple-parts.html

--- Extracting candidates...
--- Filtering candidates...
--- Running batch copy...

=== INTAKE SUMMARY ===
pipeline_version: v10
schema_version: 1
batch_status: success
registry_committed: true
blocked: false
candidate_count: 87
selected_count: 12
duplicate_skipped_count: 0
new_url_count: 12
success_generated: 12
failed_generated: 0
stage_directory: content/_incoming/products/20260517-143022

Reading the summary:

Field	Meaning
`candidate_count`	Total product URLs found on the page
`selected_count`	URLs that matched your keywords (after excludes)
`duplicate_skipped_count`	URLs already imported in a previous run
`new_url_count`	URLs actually processed this run
`success_generated`	Draft files successfully created
`failed_generated`	Products that failed to import
`batch_status`	`success` or `no_new_urls` = healthy. Anything else = check logs
`stage_directory`	Where your drafts live
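If you want to check a run from a script rather than by eye, the summary can be parsed as simple `key: value` lines. The sketch below assumes the output format shown above; `parse_summary` and `is_healthy` are hypothetical helpers, not part of the CLI.

```python
def parse_summary(text):
    """Parse 'key: value' lines from the intake output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # banner lines such as '=== INTAKE SUMMARY ===' have no colon
            fields[key.strip()] = value.strip()
    return fields


def is_healthy(fields):
    """A run is healthy when batch_status is 'success' or 'no_new_urls'."""
    return fields.get("batch_status") in {"success", "no_new_urls"}


sample = """=== INTAKE SUMMARY ===
batch_status: success
candidate_count: 87
selected_count: 12
stage_directory: content/_incoming/products/20260517-143022
"""
summary = parse_summary(sample)
print(is_healthy(summary), summary["stage_directory"])
```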

Step 4: Review Drafts in Obsidian

Open your b2bweb vault in Obsidian. Navigate to:

content/_incoming/products/20260517-143022/

Each .md file is a product draft. Open one and check:

Title — Is it accurate and readable?
SKU — Does it look right?
Compatible models — Are the right devices listed?
Materials — Reasonable for the product type?
Image — Scroll down. Does the product image show correctly?
Price — If extracted, is it in the right range?
Source URL — At the bottom. Click it to verify against the original page.

The frontmatter will look like this:

---
id: 01JXYZ...
entity_type: product
slug: replacement-bluetooth-flex-cable-for-iphone-17-pro-max
sku: "IP17PM-BT-FLEX-11014"
title: "Replacement Bluetooth Flex Cable for iPhone 17 Pro Max"
draft: true
needs_review: true
active: false
source_url: "https://gadgetfix.com/replacement-bluetooth-flex-cable-..."
review_status: pending        # <-- change this to approve
category: ["accessories"]
compatible_models: ["iPhone 17 Pro Max"]
materials: ["flexible pcb"]
moq: 1
unit: "pcs"
price_usd: 0.0
---

Step 5: Approve Good Drafts

For each product you want to publish, change one line in the frontmatter:

# Before
review_status: pending

# After
review_status: approved

Leave products you don't want as pending, or delete the file entirely.

You can also fix any metadata while reviewing — edit the title, adjust the price, add compatible models, etc.

Step 6: Promote Approved Drafts

Once you've marked your approved products, run:

python3 scripts/promote_incoming.py content/_incoming/products/20260517-143022

Replace 20260517-143022 with your actual run ID.

This command:

Copies files with review_status: approved into content/products/
Copies their images into content/products/assets/
Updates frontmatter: draft: false, active: true, review_status: promoted
Skips everything still marked pending
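The frontmatter rewrite in the list above can be sketched like this. It is a simplified stand-in for promote_incoming.py, not its actual code: the `promote_frontmatter` helper is hypothetical, and it handles only flat `key: value` lines between the two `---` markers.

```python
def promote_frontmatter(text):
    """Return the Markdown text with promoted frontmatter, or None if not approved."""
    lines = text.split("\n")
    if not lines or lines[0] != "---":
        return None
    try:
        end = lines.index("---", 1)  # closing frontmatter marker
    except ValueError:
        return None

    updates = {"draft": "false", "active": "true", "review_status": "promoted"}
    approved = False
    for i in range(1, end):
        key, sep, value = lines[i].partition(":")
        if not sep:
            continue
        # Exact, case-sensitive match; a trailing '# comment' is ignored.
        if key == "review_status" and value.split("#")[0].strip() == "approved":
            approved = True
        if key in updates:
            lines[i] = f"{key}: {updates[key]}"
    return "\n".join(lines) if approved else None
```

Drafts still marked pending return None, which corresponds to the skipped count in the output.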

Output:

{
  "promoted_count": 8,
  "skipped_count": 4,
  "error_count": 0
}

Step 7: Validate

Run the content validator to make sure everything is consistent:

npm run validate:ids

You should see:

=== Content Validation ===

All content is valid. No issues found.

If there are issues (missing IDs, duplicate slugs), the validator will tell you exactly what to fix.

Step 8: Commit and Push

Create a branch, commit the promoted products, and push:

# Make sure you're on a clean branch
git checkout main
git pull origin main
git checkout -b content/batch-products-20260517-143022

# Stage only the promoted production files
git add content/products/ content/products/assets/
git commit -m "content: add batch products from 20260517-143022"

# Push and create PR
git push -u origin HEAD

Open the PR link printed by git, review it on GitHub, and merge to main. Because only content paths changed, the content-sync.yml workflow validates the content and syncs D1, embeddings, and images; the Worker itself is not redeployed.

Optional: Advanced Configuration

Custom URL Pattern

By default, the extractor looks for URLs ending in -{number}.html. If your supplier uses a different URL format:

# Match URLs ending in /product/{number}
export B2BWEB_PRODUCT_URL_REGEX='.*\/product\/[0-9]+$'

bash scripts/intake_run.sh
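Before running the intake, it can help to test a pattern against a few sample URLs. The sketch below is an illustration only: `DEFAULT_PATTERN` is an assumed equivalent of the built-in "ends in -{number}.html" rule, and how extract_candidates.py actually applies the regex is not specified here.

```python
import re

DEFAULT_PATTERN = r".*-[0-9]+\.html$"      # assumed equivalent of the default rule
CUSTOM_PATTERN = r".*\/product\/[0-9]+$"   # value from the export above


def matching_urls(urls, pattern):
    """Return the URLs that match the given product URL pattern."""
    compiled = re.compile(pattern)
    return [url for url in urls if compiled.search(url)]


samples = [
    "https://gadgetfix.com/flex-cable-11014.html",
    "https://gadgetfix.com/apple-parts.html",
    "https://shop.example/product/42",
]
print(matching_urls(samples, DEFAULT_PATTERN))
print(matching_urls(samples, CUSTOM_PATTERN))
```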

Detail-Mode Filtering

By default, keywords are matched against the URL text only. For more accurate filtering, enable detail mode — this fetches each candidate page's title and meta description:

export B2BWEB_FILTER_FETCH_DETAILS=1

bash scripts/intake_run.sh

This is slower (one HTTP request per candidate) but catches products whose URLs don't contain descriptive text.
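To show what detail mode has to extract from each candidate page, here is a standard-library sketch that pulls the title and meta description out of an HTML string. It is not the filter's implementation; `TitleMetaParser` and `detail_text` are hypothetical names, and fetching the page is left out.

```python
from html.parser import HTMLParser


class TitleMetaParser(HTMLParser):
    """Collect the <title> text and the content of <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def detail_text(html):
    """Return lowercased 'title description' text to match keywords against."""
    parser = TitleMetaParser()
    parser.feed(html)
    return f"{parser.title.strip()} {parser.description.strip()}".lower()
```

The returned text can then be checked for keywords in the same way as the URL text.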

Overwriting Existing Products

If you need to re-import a product that's already in content/products/:

python3 scripts/promote_incoming.py content/_incoming/products/{run_id} --overwrite --overwrite-assets

Verbose Output

To see which URLs are being processed:

export B2BWEB_VERBOSE=1
bash scripts/intake_run.sh

Troubleshooting

"lock_held" Error

Another intake process is running, or a previous one crashed and left its lock behind. First confirm that no intake is still running, then remove the stale lock:

rmdir .local/intake/.lock

Zero Candidates Found

Check that B2BWEB_SOURCE_URL actually contains product links
Check that B2BWEB_ALLOWED_DOMAINS matches the domain in the URLs
Try adjusting B2BWEB_PRODUCT_URL_REGEX if the site uses non-standard URLs

All Products Skipped as Duplicates

The registry remembers every URL you've imported. If you need to re-import:

# View the registry
cat .local/intake/import-registry.txt

# Clear it to allow re-importing everything
# (the leading ':' keeps zsh from waiting for input on a bare redirect)
: > .local/intake/import-registry.txt

Validation Failures

Check the batch log for details:

tail -n 30 content/_incoming/products/{run_id}/batch.log

Common issues:

Product page returned 403/429 (blocked by the site)
No product images found on the page
Title couldn't be extracted (empty page or JavaScript-rendered content)

Products Missing After Promotion

Make sure you changed review_status to approved. The value is case-sensitive and must be exactly approved, not approve or Approved.

Quick Reference

# Full intake (one command)
export B2BWEB_ALLOWED_DOMAINS="gadgetfix.com"
export B2BWEB_SOURCE_URL="https://gadgetfix.com/apple-parts.html"
export B2BWEB_KEYWORDS=$'bluetooth\nflex cable\ncharging port'
bash scripts/intake_run.sh

# Review in Obsidian, then promote
python3 scripts/promote_incoming.py content/_incoming/products/{run_id}

# Validate and ship
npm run validate:ids
git add content/products/ && git commit -m "content: add batch products" && git push

For the full technical reference, see docs/BATCH_PRODUCT_UPLOAD_CLI.md.

Single Product Upload — Terminal-Only Tutorial (CLI Manual)

Upload one product at a time using only macOS Terminal. This uses copy_product.py directly — no batch pipeline, no keyword filtering. Best for adding a specific product you already found on a supplier site.

How It Works

You find a product URL on a supplier site
    |
    v
copy_product.py
    |  curl (fetches the page)
    |  parses title, SKU, images, price, models, materials
    |  curl (downloads product image)
    |  writes Markdown draft to content/products/
    v
content/products/{slug}.md     <-- draft (active: false)
    |
    |  you review and publish
    v
git commit + push  -->  CI  -->  Cloudflare (live)

Two extraction modes:

Mode	Flag	How it works	Needs API key?
LLM	(default)	Sends page text to GPT/Ollama, gets structured JSON back	Yes
Deterministic	`--no-llm`	Parses HTML directly with Python — no AI, no cost	No

One-Time Setup

cd ~/Documents/github/b2bweb

# Install Python dependencies
npm run product:copy:setup

# Activate the virtual environment
source .venv/bin/activate

For future sessions, just run source .venv/bin/activate.

Step-by-Step Commands

Step 1 — Find a product URL

Browse the supplier site and copy the product page URL. Example:

https://gadgetfix.com/white-charging-port-dock-usb-c-connector-flex-cable-for-iphone-16e-iphone-17e-11052.html

Step 2 — Run the scraper

Option A: Without AI (deterministic, free, no API key)

cd ~/Documents/github/b2bweb
source .venv/bin/activate

python3 scripts/copy_product.py \
  --url "https://gadgetfix.com/white-charging-port-dock-usb-c-connector-flex-cable-for-iphone-16e-iphone-17e-11052.html" \
  --require-image \
  --strict \
  --no-llm

Option B: With OpenAI

export OPENAI_API_KEY="sk-..."

python3 scripts/copy_product.py \
  --url "https://gadgetfix.com/white-charging-port-dock-usb-c-connector-flex-cable-for-iphone-16e-iphone-17e-11052.html" \
  --require-image \
  --strict

Option C: With Ollama (local LLM, free)

export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"

python3 scripts/copy_product.py \
  --url "https://gadgetfix.com/..." \
  --model "qwen2.5" \
  --require-image \
  --strict

You'll see output like:

🚀 Scraping target: https://gadgetfix.com/white-charging-port-...
🧩 Extracting structured JSON with deterministic parser (--no-llm)...
📸 Image downloaded safely via curl: assets/white-charging-port-...-main.jpg (45832 bytes)

🎉 Clean Draft Successfully Created: content/products/white-charging-port-dock-usb-c-connector-flex-cable-for-iphone-16e-iphone-17e.md

Step 3 — Review the generated file

# See the frontmatter
head -30 content/products/white-charging-port-dock-usb-c-connector-flex-cable-for-iphone-16e-iphone-17e.md

Output:

---
id: 01JXYZ...
entity_type: product
slug: white-charging-port-dock-usb-c-connector-flex-cable-for-iphone-16e-iphone-17e
sku: "IP16E-IP17E-USBC-CHG-FLEX-WHT-11052"
title: "White Charging Port Dock USB-C Connector Flex Cable For iPhone 16e iPhone 17e"
draft: true
needs_review: true
active: false
source_url: "https://gadgetfix.com/white-charging-port-..."
category: ["accessories"]
compatible_models: ["iPhone 16e", "iPhone 17e"]
materials: ["flexible pcb", "usb-c connector"]
moq: 1
price_usd: 5.50
---

Check the image was downloaded:

ls -la content/products/assets/ | grep white-charging

Step 4 — Edit metadata if needed

# Fix the MOQ
sed -i '' 's/^moq: 1$/moq: 20/' \
  content/products/white-charging-port-dock-usb-c-connector-flex-cable-for-iphone-16e-iphone-17e.md

# Fix a price
sed -i '' 's/^price_usd: 5.50$/price_usd: 7.95/' \
  content/products/white-charging-port-dock-usb-c-connector-flex-cable-for-iphone-16e-iphone-17e.md

# Or open in nano for bigger edits
nano content/products/white-charging-port-dock-usb-c-connector-flex-cable-for-iphone-16e-iphone-17e.md

Step 5 — Publish the draft

Change three fields in the frontmatter to make it live:

FILE="content/products/white-charging-port-dock-usb-c-connector-flex-cable-for-iphone-16e-iphone-17e.md"

sed -i '' 's/^draft: true$/draft: false/' "$FILE"
sed -i '' 's/^needs_review: true$/needs_review: false/' "$FILE"
sed -i '' 's/^active: false$/active: true/' "$FILE"

Verify:

grep -E '^(draft|needs_review|active):' "$FILE"

Expected:

draft: false
needs_review: false
active: true

Step 6 — Validate

npm run validate:ids

Expected: All content is valid. No issues found.

Step 7 — Commit and push

git add content/products/
git commit -m "content: add iphone 16e 17e charging port flex cable"
git push origin main

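Or use --push to generate, validate, commit, and push in one shot (Step 2 + Step 7 combined). This skips the review and publish steps, so the pushed product is still a draft (active: false), and --push pushes to the current branch rather than merging to main: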

python3 scripts/copy_product.py \
  --url "https://gadgetfix.com/..." \
  --require-image --strict --no-llm --push \
  --commit-message "content: add iphone 16e 17e charging port flex cable"

CLI Flag Reference

Flag	Description
`--url`	(Required) Product page URL
`--no-llm`	Deterministic parsing — no AI, no API key needed
`--model`	LLM model name (default: `gpt-4o-mini`)
`--api_key`	OpenAI API key (default: `$OPENAI_API_KEY` env)
`--base_url`	API base URL (for Ollama: `http://localhost:11434/v1`)
`--require-image`	Fail if no product image can be downloaded
`--strict`	Fail if category/compatible_models/materials are empty
`--force`	Overwrite existing Markdown file
`--force-image`	Overwrite existing image asset
`--output-dir`	Write to a custom directory instead of `content/products/`
`--push`	Auto-validate, commit, and push after generating
`--commit-message`	Custom git commit message (used with `--push`)

Batch Product Upload — Terminal-Only Tutorial

Upload many products at once from a supplier's category page using only macOS Terminal. The batch CLI scans a page for product links, filters by your keywords, and runs the single-product scraper on each match. No Obsidian, no GUI required.

How It Works

 Supplier website          Your Mac (Terminal)              GitHub / Cloudflare
 ================          ===================              ===================

 Category page
      |
      |  curl (fetch HTML)
      v
 extract_candidates.py --- collects all product URLs
      |
      v
 filter_candidates.py ---- keeps only keyword matches
      |
      v
 batch_copy_products.sh -- for each URL:
      |                      copy_product.py --no-llm
      |                      (fetch + parse + download image)
      v
 content/_incoming/        SANDBOX (not live)
      |
      |  you review with: ls, head, grep, sed
      |  you approve with: sed (change one line)
      v
 promote_incoming.py ----- copies approved drafts
      |                     to content/products/
      v
 git add, commit, push  ---------------------->  GitHub Actions CI
                                                      |
                                                      v
                                                 validate + sync
                                                      |
                                                      v
                                                 Cloudflare Edge
                                                 (live on the web)

Design Principles

Principle	What it means
No AI during batch import	Products are parsed with Python + curl, not LLMs. `--no-llm` is always on. Fast, free, deterministic.
Sandbox first	Drafts land in `content/_incoming/` staging area. Nothing reaches `content/products/` until you explicitly promote.
Auto-deduplication	Every URL is SHA-256 hashed into a local registry. Run the same import twice and duplicates are silently skipped.

Directory Layout

b2bweb/
  content/
    _incoming/products/{run_id}/     <-- sandbox (your drafts land here)
      bluetooth-flex-cable.md
      charging-port-flex.md
      assets/
        bluetooth-flex-cable-main.jpg
      batch.log
    products/                        <-- production (after you promote)
      bluetooth-flex-cable.md
      assets/
  .local/intake/
    import-registry.txt              <-- URL dedup registry

One-Time Setup

cd ~/Documents/github/b2bweb
npm install
python3 -m venv .venv
source .venv/bin/activate
pip install -r scripts/requirements-copy-product.txt

For future sessions, just: cd ~/Documents/github/b2bweb && source .venv/bin/activate

Step-by-Step Commands

Step 1 — Set your target and keywords

cd ~/Documents/github/b2bweb
source .venv/bin/activate

# REQUIRED: which domains to allow
export B2BWEB_ALLOWED_DOMAINS="gadgetfix.com"

# REQUIRED: the category page to scan
export B2BWEB_SOURCE_URL="https://gadgetfix.com/apple-parts.html"

# REQUIRED: keywords to include (one per line)
export B2BWEB_KEYWORDS=$'iphone 17 pro max
bluetooth
flex cable
charging port
usb-c'

# OPTIONAL: terms to exclude
export B2BWEB_EXCLUDE_TERMS=$'case
protector
tempered glass'

How keywords work: Each product URL is converted to readable text (replacement-bluetooth-flex-cable becomes replacement bluetooth flex cable). If any keyword appears, it's included. If any exclude term appears, it's dropped.
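The matching rule can be sketched as follows. This is an approximation for illustration, not the code in filter_candidates.py: the helper names are hypothetical, and normalizing hyphens inside keywords (so that usb-c matches the text usb c) is an assumption.

```python
from urllib.parse import urlparse


def normalize(text):
    """Lowercase and turn hyphens into spaces."""
    return text.lower().replace("-", " ")


def parse_terms(value):
    """Split a newline-separated variable such as B2BWEB_KEYWORDS into terms."""
    return [line.strip() for line in value.splitlines() if line.strip()]


def url_to_text(url):
    """Readable text for the last path segment of a product URL."""
    slug = urlparse(url).path.rsplit("/", 1)[-1]
    return normalize(slug)


def keep(url, keywords, exclude_terms=()):
    """Keep a URL if any keyword appears and no exclude term appears."""
    text = url_to_text(url)
    if any(normalize(term) in text for term in exclude_terms):
        return False
    return any(normalize(keyword) in text for keyword in keywords)


url = "https://gadgetfix.com/replacement-bluetooth-flex-cable-for-iphone-17-pro-max-11014.html"
print(keep(url, parse_terms("bluetooth\nflex cable")))  # True
print(keep(url, ["bluetooth"], ["flex cable"]))         # False
```

Note that this is plain substring matching, so a short exclude term such as case would also drop a URL containing showcase.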

Step 2 — Run the intake

bash scripts/intake_run.sh

Output:

=== INTAKE SUMMARY ===
batch_status: success
registry_committed: true
candidate_count: 87
selected_count: 12
new_url_count: 12
success_generated: 12
failed_generated: 0
stage_directory: content/_incoming/products/20260518-143022

Save the run ID:

RUN_ID=$(ls -t content/_incoming/products/ | head -1)
echo "Run ID: $RUN_ID"

Step 3 — See what was generated

# List drafts
ls content/_incoming/products/$RUN_ID/*.md

# Quick-scan all titles
grep '^title:' content/_incoming/products/$RUN_ID/*.md

# Check images
ls content/_incoming/products/$RUN_ID/assets/

Step 4 — Review a specific draft

head -35 content/_incoming/products/$RUN_ID/replacement-bluetooth-flex-cable-for-iphone-17-pro-max.md

Check these fields:

Field	What to check
`title`	Readable, accurate product name?
`sku`	Sensible abbreviation?
`compatible_models`	Correct device(s)?
`materials`	Reasonable for this product?
`price_usd`	In the right range? (`0.0` = not detected, edit manually)

Step 5 — Approve drafts

Approve a single file:

sed -i '' 's/^review_status: pending$/review_status: approved/' \
  content/_incoming/products/$RUN_ID/replacement-bluetooth-flex-cable-for-iphone-17-pro-max.md

Approve all at once:

sed -i '' 's/^review_status: pending$/review_status: approved/' \
  content/_incoming/products/$RUN_ID/*.md

Approve all, then un-approve rejects:

# Approve everything first
sed -i '' 's/^review_status: pending$/review_status: approved/' \
  content/_incoming/products/$RUN_ID/*.md

# Un-approve specific files you don't want
sed -i '' 's/^review_status: approved$/review_status: pending/' \
  content/_incoming/products/$RUN_ID/front-camera-flex-cable-for-iphone-17-pro-max.md

Verify:

grep '^review_status:' content/_incoming/products/$RUN_ID/*.md

Step 5b — (Optional) Edit metadata before promoting

# Fix a price
sed -i '' 's/^price_usd: 0.0$/price_usd: 4.50/' \
  content/_incoming/products/$RUN_ID/replacement-bluetooth-flex-cable-for-iphone-17-pro-max.md

# Fix MOQ
sed -i '' 's/^moq: 1$/moq: 20/' \
  content/_incoming/products/$RUN_ID/replacement-bluetooth-flex-cable-for-iphone-17-pro-max.md

# Or use nano for bigger changes
nano content/_incoming/products/$RUN_ID/replacement-bluetooth-flex-cable-for-iphone-17-pro-max.md

Step 6 — Promote approved drafts

python3 scripts/promote_incoming.py content/_incoming/products/$RUN_ID

Output:

{
  "promoted_count": 10,
  "skipped_count": 2,
  "error_count": 0
}
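
If the promotion step is wrapped in a larger script, the summary can be checked before moving on. This is a minimal sketch that assumes `promote_incoming.py` prints exactly the JSON object shown above to stdout; `promotion_ok` is a hypothetical name.

```python
import json


def promotion_ok(summary_json: str) -> bool:
    """True when at least one draft was promoted and no errors were reported."""
    summary = json.loads(summary_json)
    return summary.get("error_count", 0) == 0 and summary.get("promoted_count", 0) > 0
```

A non-zero `skipped_count` is not treated as a failure here, since drafts left at `review_status: pending` are expected to be skipped.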

Verify files landed in production:

ls -t content/products/*.md | head -5   # newest files first

Step 7 — Validate

npm run validate:ids

Expected: All content is valid. No issues found.

Step 8 — Commit and push

git add content/products/
git commit -m "content: add batch products from $RUN_ID"
git push origin main

GitHub Actions CI automatically validates, syncs to D1, and deploys to Cloudflare.

Complete Copy-Paste Script

Replace the values at the top and paste into Terminal:

# --- Configuration (EDIT THESE) ---
cd ~/Documents/github/b2bweb
source .venv/bin/activate

export B2BWEB_ALLOWED_DOMAINS="gadgetfix.com"
export B2BWEB_SOURCE_URL="https://gadgetfix.com/apple-parts.html"
export B2BWEB_KEYWORDS=$'iphone 17 pro max
bluetooth
flex cable
charging port
usb-c'
export B2BWEB_EXCLUDE_TERMS=$'case
protector
tempered glass'

# --- Run intake ---
bash scripts/intake_run.sh

# --- Identify run ---
RUN_ID=$(ls -t content/_incoming/products/ | head -1)
echo "Run: $RUN_ID"

# --- Quick review ---
echo "=== TITLES ==="
grep '^title:' content/_incoming/products/$RUN_ID/*.md

# --- Approve all ---
sed -i '' 's/^review_status: pending$/review_status: approved/' \
  content/_incoming/products/$RUN_ID/*.md

# --- Promote ---
python3 scripts/promote_incoming.py content/_incoming/products/$RUN_ID

# --- Validate and push ---
npm run validate:ids
git add content/products/
git commit -m "content: add batch products from $RUN_ID"
git push origin main

Xxxweb Category Example

Xxxweb category pages include product-card links and extra recommendation links. Use B2BWEB_LINK_CLASS="product-main-img" to import only the visible category products and avoid bottom/related product blocks.

cd ~/Documents/github/b2bweb
source .venv/bin/activate

export B2BWEB_ALLOWED_DOMAINS="www.xxxweb-online.com,xxxweb-online.com"
export B2BWEB_SOURCE_URL="https://www.xxxweb-online.com/product/default!search.do?keyword=&categoryId=111855&priceRange=&brandIds=&colorIds=&certs=&brandModelIds=&propOptions=&closedFilters=&orderBy=rank&desc=true"
export B2BWEB_PRODUCT_URL_REGEX="^/p/[^/]+/.*\.htm$"
export B2BWEB_LINK_CLASS="product-main-img"
export B2BWEB_KEYWORDS=$'iphone'
export B2BWEB_MAX_FAILURES=100

bash scripts/intake_run.sh

Environment Variable Reference

Variable	Required	Default	Description
`B2BWEB_ALLOWED_DOMAINS`	Yes	—	Comma-separated allowed domains
`B2BWEB_SOURCE_URL`	Yes	—	Category/search page URL to scan
`B2BWEB_KEYWORDS`	Yes	—	Newline-separated include keywords
`B2BWEB_EXCLUDE_TERMS`	No	empty	Newline-separated exclude terms
`B2BWEB_RUN_ID`	No	`YYYYMMDD-HHMMSS`	Custom run identifier
`B2BWEB_RUN_DIR`	No	`/tmp/b2bweb-intake/{run_id}`	Custom runtime directory
`B2BWEB_PRODUCT_URL_REGEX`	No	`.*-[0-9]+\.html$`	Regex for product URL matching
`B2BWEB_LINK_CLASS`	No	unset	Only keep product links whose anchor has this CSS class, useful for excluding recommendation blocks
`B2BWEB_MAX_FAILURES`	No	`3`	Stop batch after N failures
`B2BWEB_FILTER_FETCH_DETAILS`	No	`0`	Set `1` to fetch page titles for filtering
`B2BWEB_VERBOSE`	No	`0`	Set `1` for verbose output
`B2BWEB_PROXY`	No	unset	HTTP proxy for curl
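
The table can be read as a small configuration contract. The sketch below shows one way to load it, with the required variables and defaults taken from the table; the function name and the returned dictionary keys are illustrative assumptions, and the real parsing lives in `scripts/intake_run.sh`.

```python
import os
from datetime import datetime
from typing import Dict, List, Mapping, Optional

REQUIRED = ("B2BWEB_ALLOWED_DOMAINS", "B2BWEB_SOURCE_URL", "B2BWEB_KEYWORDS")


def split_lines(value: str) -> List[str]:
    """Split a newline-separated variable, dropping blank entries."""
    return [item.strip() for item in value.splitlines() if item.strip()]


def load_intake_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Validate required variables and apply the documented defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    run_id = env.get("B2BWEB_RUN_ID") or datetime.now().strftime("%Y%m%d-%H%M%S")
    domains = [d.strip() for d in env["B2BWEB_ALLOWED_DOMAINS"].split(",") if d.strip()]
    return {
        "allowed_domains": domains,
        "source_url": env["B2BWEB_SOURCE_URL"],
        "keywords": split_lines(env["B2BWEB_KEYWORDS"]),
        "exclude_terms": split_lines(env.get("B2BWEB_EXCLUDE_TERMS", "")),
        "run_id": run_id,
        "run_dir": env.get("B2BWEB_RUN_DIR") or f"/tmp/b2bweb-intake/{run_id}",
        "product_url_regex": env.get("B2BWEB_PRODUCT_URL_REGEX") or r".*-[0-9]+\.html$",
        "link_class": env.get("B2BWEB_LINK_CLASS") or None,
        "max_failures": int(env.get("B2BWEB_MAX_FAILURES", "3")),
        "filter_fetch_details": env.get("B2BWEB_FILTER_FETCH_DETAILS", "0") == "1",
        "verbose": env.get("B2BWEB_VERBOSE", "0") == "1",
        "proxy": env.get("B2BWEB_PROXY") or None,
    }
```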

Troubleshooting

"lock_held" error — Previous run crashed. Fix: rmdir .local/intake/.lock

Zero candidates — URL regex doesn't match the site. Debug:

curl -sL "$B2BWEB_SOURCE_URL" | grep -oE 'href="[^"]*"' | head -20
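
Once a few hrefs are in hand, the pattern can be tested offline before re-running the intake. This sketch assumes the intake applies `B2BWEB_PRODUCT_URL_REGEX` with search semantics and that the pattern behaves the same in Python's `re` as in the intake's own regex engine, which holds for the two patterns shown in this document but is not guaranteed in general. The sample hrefs are made up.

```python
import re
from typing import Iterable, List

# Default from the environment variable reference.
DEFAULT_PRODUCT_URL_REGEX = r".*-[0-9]+\.html$"


def matching_links(hrefs: Iterable[str], pattern: str = DEFAULT_PRODUCT_URL_REGEX) -> List[str]:
    """Return the hrefs the pattern would keep as product URLs."""
    compiled = re.compile(pattern)
    return [href for href in hrefs if compiled.search(href)]
```

If `matching_links` returns nothing for hrefs that are clearly product pages, adjust `B2BWEB_PRODUCT_URL_REGEX` until it does.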

All duplicates — Registry already has them. Reset: > .local/intake/import-registry.txt

Validation fails — Auto-fix: npm run validate:ids -- --fix

price_usd: 0.0 -- Scraper couldn't extract the price. Set the correct value per file with sed:

sed -i '' 's/^price_usd: 0\.0$/price_usd: 5.00/' content/products/PRODUCT-SLUG.md

Undo a promotion — Delete the file: rm content/products/PRODUCT-SLUG.md

View import log — cat content/_incoming/products/$RUN_ID/batch.log

Watermark Removal

Remove supplier watermarks from product images using LaMa AI inpainting. Scripts: scripts/run_lama.py (single image) and scripts/remove-watermarks.py (batch).

Setup

pip3 install -r scripts/requirements-watermark.txt
# LaMa TorchScript model (~206 MB) — high-frequency texture-preserving inpainting.
# Runs directly on torch (CPU/MPS); no IOPaint needed.
curl -L -o /tmp/big-lama-model.pt \
  "https://github.com/enesmsahin/simple-lama-inpainting/releases/download/v0.1.0/big-lama.pt"

LaMa vs OpenCV inpainting. OpenCV's Telea and Navier-Stokes methods leave blur and artifacts on dense industrial textures (PCB traces, laser etching, gold pins). The LaMa model reconstructs high-frequency texture, giving commercial-grade results (sharpness ratio ≈ 1.0). The clean command uses LaMa automatically when the model file exists. In pilot/process it is off by default, since CPU inference takes ~6 s/image; pass --use-lama to enable it.

IOPaint (the usual LaMa wrapper) does not build on Python 3.13 — it pins an old Pillow that fails to compile. We use the underlying TorchScript model directly via _inpaint_lama, which forces CPU to avoid an Apple-MPS compiler crash seen with repeated GPU loads.

Step 0 — QA pilot (recommended first)

Before a full run, sanity-check detection + mask quality on a small batch. The pilot command scans a capped sample, attempts a clean on each, and emits an original / mask / cleaned preview HTML for visual sign-off. Source images are never modified; all output goes to a folder outside the repo.

python3 scripts/remove-watermarks.py pilot \
  --assets /Users/alexkou/Documents/github/b2bweb/content/products/assets \
  --max-total 50 \
  --preset review \
  --rights-confirmed

Flag	Meaning
`--assets`	Source image directory (read-only)
`--max-total`	Max watermarked images to include in the pilot (default 50)
`--preset`	Detection preset (default `review` — lenient discovery + MSER + verification)
`--rights-confirmed`	Required. Confirms you hold the rights to remove the watermark. Without it the command refuses.
`--out`	Output folder (default `~/Downloads/sunsky-watermark-pilot-<timestamp>`; must be outside the repo)
`--use-lama`	Use LaMA inpainting if `/tmp/big-lama-model.pt` exists (slower, higher quality)

Output: review.html (side-by-side original/mask/cleaned with per-image status), cleaned/, masks/, manifest.json. Open with open ~/Downloads/sunsky-watermark-pilot-*/review.html.

Step 1 — Scan for watermarks

⚠️ iPhone 14 and newer are EXCLUDED from scanning. Supplier (sunsky-online.com) stopped watermarking iPhone 14/15/16/17 series images — those product photos are already clean. The scanner skips any filename whose only iPhone token is iphone-14 or higher (e.g. for-iphone-14-pro-..., for-iphone-15-pro-max-..., for-iphone-16-..., for-iphone-17-...). Rule lives in should_scan_file() in scripts/remove-watermarks.py; change SKIP_IPHONE_MIN if the supplier's policy changes.

python3 scripts/remove-watermarks.py scan                    # medium preset (default)
python3 scripts/remove-watermarks.py scan --preset high      # most accurate, slowest
python3 scripts/remove-watermarks.py scan --preset fast      # quickest, may miss faint watermarks
python3 scripts/remove-watermarks.py scan --assets /path/to/content/products/assets  # custom vault

Speed vs accuracy presets (rate is per worker; default uses 10 workers):

Preset	Downscale	Scales	Expected rate	17K images	Notes
`high`	1200px	11	~3 img/s	~95 min	Most accurate — final production pass
`medium` (default)	600px	6	~10 img/s	~28 min	Balanced — recommended for day-to-day
`fast`	300px	3	~25 img/s	~11 min	Quick triage
`super`	200px	2	~60 img/s	~5 min	Very fast — for re-scans
`lightning`	100px	2	~120 img/s	~2 min	Extreme — significant recall loss
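
The 17K column is consistent with dividing the image count by the listed rate, so the same arithmetic gives a rough estimate for other library sizes. This sketch treats the listed rate as the effective throughput of a whole run and assumes it stays constant; real timings will vary with hardware and image size.

```python
# Expected rates (images per second) copied from the table above.
PRESET_RATES_IMG_PER_S = {
    "high": 3,
    "medium": 10,
    "fast": 25,
    "super": 60,
    "lightning": 120,
}


def estimated_minutes(image_count: int, preset: str = "medium") -> float:
    """Rough scan duration in minutes for a given preset."""
    return image_count / PRESET_RATES_IMG_PER_S[preset] / 60
```

For example, `estimated_minutes(17000, "medium")` comes out a little over 28 minutes, in line with the table.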

All five presets above disable MSER (it caused false positives on clutter) and run a full-resolution verification on every candidate, rejecting scores below 0.42 at native scale.

Multi-template matching: in addition to the synthesized "Helvetica" template, the scanner loads any scripts/watermark-templates/real-*.png crops (real watermark patches extracted from product images). Best score across all templates wins, which improves recognition of fonts/styles the synthesized version misses.

Outputs watermarked-images.json with detected files and coordinates.

Full scan filter — which files are checked:

Category	Scanned?	Examples
iPhone 13 / 12 / 11 / XS / XR / X / SE	Yes	`for-iphone-13-pro-...`, `for-iphone-x-...`, `for-iphone-se-...`
iPhone 14 / 15 / 16 / 17	Skipped (excluded)	`for-iphone-14-pro-...`, `for-iphone-15-pro-max-...`, `for-iphone-16-...`, `for-iphone-17-...`
Combo products (e.g. fits 12 + 13 + 14)	Yes	Any filename containing an iPhone 13 or earlier token
iPad / Mac / Apple Watch / accessories	Yes	`for-ipad-mini-...`, `for-macbook-pro-...`, `for-apple-watch-...`
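
The filter rule can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code in scripts/remove-watermarks.py: it assumes iPhone models appear in filenames as `iphone-<token>` and that non-numeric tokens (x, xs, xr, se) count as older than the cutoff. Only the names `should_scan_file` and `SKIP_IPHONE_MIN` come from the text above.

```python
import re

SKIP_IPHONE_MIN = 14  # first generation whose images are assumed clean

_IPHONE_TOKEN = re.compile(r"iphone-([a-z0-9]+)")


def should_scan_file(filename: str) -> bool:
    """Return False only when every iPhone token is generation 14 or newer."""
    tokens = _IPHONE_TOKEN.findall(filename.lower())
    if not tokens:
        return True  # iPad, Mac, Apple Watch, accessories
    for token in tokens:
        # Any older or non-numeric model (x, xs, xr, se) keeps the file in scope.
        if not token.isdigit() or int(token) < SKIP_IPHONE_MIN:
            return True
    return False
```

A combo filename that names both an iPhone 13 and an iPhone 14 is still scanned, because one token falls below the cutoff.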

Step 2 — Remove watermarks

Three cleaning modes available:

Mode	Speed	Quality	Model needed	Best for
LaMA + GPU (MPS)	~5-10 img/s	Best	Yes (196 MB)	Final production cleanup
LaMA + CPU	~1 img/s	Best	Yes (196 MB)	Machines without GPU
OpenCV	~50+ img/s	Good	No	Quick batch runs

python3 scripts/remove-watermarks.py clean -f watermarked-images.json                # LaMA (auto-detects MPS GPU, falls back to CPU)
python3 scripts/remove-watermarks.py clean -f watermarked-images.json --method opencv # fastest, no model needed
python3 scripts/remove-watermarks.py clean --dry-run -f watermarked-images.json       # preview only
python3 scripts/remove-watermarks.py clean --assets /path/to/content/products/assets -f watermarked-images.json  # custom vault

LaMA automatically uses Apple Silicon GPU (MPS) when available, falling back to CPU. Originals are backed up to /tmp/watermark-backup/ before modification.

For a single image with a custom mask:

python3 scripts/run_lama.py --image input.jpg --mask mask.png --output clean.jpg

Step 3 — Commit and push

git add content/products/assets/
git commit -m "fix: remove watermarks from product images"
git push origin main

Step 4 — CI handles the rest

git push triggers content-sync.yml which runs automatically:

validate (entity ID lint)
  -> images (WebP conversion, R2 upload)
  -> build (graph, llms.txt, JSON-LD)
    -> sync (D1 chunked batch)
      -> embeddings (Vectorize upsert)

No manual R2/D1 steps needed — CI converts cleaned images to WebP, uploads to R2, and syncs product data to D1.

Content Sync Rules

Markdown sync (content:sync) respects D1 runtime state:

Field	Rule
`stock_qty`	Never overwritten by sync
`price_usd`, `bulk_price_usd`, `bulk_qty`	Preserved if `price_locked = 1`; updated from markdown if `price_locked = 0`
`active`	Preserved unless markdown explicitly sets `active: true/false`
`vision_description`	Preserved if not null; overwritten only when new value exists

Flags:

--force-price -- Override price_locked and update prices from markdown
--deactivate-missing -- Set active = 0 for products not in markdown
--remote -- Execute against remote D1
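
The field rules and the --force-price flag can be summarized as a merge function. This is a Python sketch of the logic only; the actual sync is implemented in scripts/sync-content.ts, and the function name, the dictionary representation of rows, and the handling of unlisted fields (taken from markdown) are assumptions.

```python
from typing import Any, Dict

PRICE_FIELDS = ("price_usd", "bulk_price_usd", "bulk_qty")
GUARDED = {"stock_qty", "active", "vision_description", *PRICE_FIELDS}


def merge_product(d1_row: Dict[str, Any], markdown: Dict[str, Any],
                  force_price: bool = False) -> Dict[str, Any]:
    """Combine the existing D1 row with markdown frontmatter."""
    merged = dict(d1_row)
    # Unguarded fields (title, category, ...) follow markdown.
    for key, value in markdown.items():
        if key not in GUARDED:
            merged[key] = value
    # stock_qty is runtime state and is never copied from markdown.
    # Prices follow markdown only when unlocked or explicitly forced.
    if force_price or d1_row.get("price_locked", 0) == 0:
        for field in PRICE_FIELDS:
            if field in markdown:
                merged[field] = markdown[field]
    # active changes only when markdown sets it explicitly.
    if isinstance(markdown.get("active"), bool):
        merged["active"] = 1 if markdown["active"] else 0
    # vision_description is replaced only by a new non-empty value.
    if markdown.get("vision_description"):
        merged["vision_description"] = markdown["vision_description"]
    return merged
```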

Chunking

D1 sync splits statements into chunks of 50 to avoid Wrangler payload limits. Each chunk is executed separately. If any chunk fails, the script exits non-zero immediately.
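
The chunk-and-fail-fast behavior looks roughly like this. It is a Python sketch of the idea rather than the real implementation in scripts/sync-content.ts; `execute` is a hypothetical callable standing in for whatever sends one chunk to D1 and raises on failure.

```python
from typing import Callable, Iterator, List, Sequence

CHUNK_SIZE = 50  # chunk size stated above


def chunked(statements: Sequence[str], size: int = CHUNK_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` statements."""
    for start in range(0, len(statements), size):
        yield list(statements[start:start + size])


def sync_in_chunks(statements: Sequence[str],
                   execute: Callable[[List[str]], None]) -> int:
    """Run each chunk in order and return how many chunks were executed."""
    count = 0
    for chunk in chunked(statements):
        execute(chunk)  # an exception here stops the run; later chunks never execute
        count += 1
    return count
```

Because the sync is described as idempotent, a run that stops partway can be repeated from the start after the cause is fixed.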

Product Delete Commands

Soft-delete sets active: false in frontmatter and D1. The .md file stays on disk (visible in Obsidian). Add --purge to also remove the file.

Delete a single product

# Soft-delete (keeps .md file, sets active: false)
npm run product:delete -- --slug PRODUCT-SLUG --remote

# Hard-delete (removes .md file + deactivates in D1)
npm run product:delete -- --slug PRODUCT-SLUG --remote --purge

# With optional reason
npm run product:delete -- --slug PRODUCT-SLUG --remote --purge --reason "Discontinued"

Delete multiple products

# Comma-separated slugs
npm run product:batch-delete -- --slugs slug-one,slug-two,slug-three --remote

# With purge (removes all .md files)
npm run product:batch-delete -- --slugs slug-one,slug-two --remote --purge

# From a text file (one slug per line, # lines ignored)
npm run product:batch-delete -- --file scripts/to-delete.txt --remote --purge

to-delete.txt format:

# Products to remove - May 2026
samsung-galaxy-s25-ultra-clear-case
samsung-galaxy-s25-ultra-screen-protector

Both commands issue a single targeted D1 UPDATE — no full catalog sync. Safe to run at any time without risking auth timeouts.

Obsidian Authoring

The repository root is the Obsidian vault. Content lives under content/. Obsidian is the CMS.

Publishing Flow

Obsidian edit -> git commit -> git push -> CI validates and syncs

CI does NOT auto-commit. Missing IDs fail CI with a clear error message. Generate and fix IDs locally, either with Obsidian Templater or with npm run validate:ids -- --fix.

Shared Repo Safety Rules

Open Obsidian at the repository root, not at content/ alone.
Obsidian Git must not auto-commit or auto-push on a timer or file change.
Pull before starting work: git pull --ff-only
Review git status before every commit.
Content commits should normally include only content/, attachments/, or selected public/ assets.

Required Product Fields

Every product markdown must have: id, title, slug, sku, category, moq, price_usd.

WikiLinks

[[TPU]], [[CE]], [[iPhone 16 Pro Max]] -- resolved during sync using:

Generated link index (generated/link-index.json) from graph build
Hardcoded material/certification slug sets (fallback)
Device prefix detection
Default: /wholesale/{slug}

Image Embeds

Obsidian image embeds are processed by CI:

![[ip16-housing-ga.jpg]]
![[ iPhone 16 Glass .JPG ]]
![[product photo 01.jpeg]]

Parser supports: global matching, case-insensitive extensions, spaces in filenames, .jpg/.jpeg/.png/.webp.
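
A pattern with those properties might look like the sketch below. It is an illustration in Python, not the parser used by the CI image step; in particular, how the real parser treats inner whitespace and Obsidian size suffixes such as `|200` is not covered here.

```python
import re
from typing import List

# Matches ![[name.ext]] with optional padding, spaces in the name,
# and case-insensitive .jpg/.jpeg/.png/.webp extensions.
EMBED = re.compile(r"!\[\[\s*([^\]|]+?\.(?:jpe?g|png|webp))\s*\]\]", re.IGNORECASE)


def find_image_embeds(markdown: str) -> List[str]:
    """Return every embedded image filename found in the text."""
    return [match.group(1) for match in EMBED.finditer(markdown)]
```

Using `finditer` gives the global matching mentioned above: every embed in the file is returned, not only the first.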

Images are converted to WebP and uploaded to R2 with deterministic keys: products/{slug}/{hash}.webp.
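
Deterministic keys mean the same image always maps to the same R2 object, so re-running the pipeline does not create duplicates. The sketch below shows one way to build such a key; the hash algorithm (SHA-256 over the image bytes) and the 16-character truncation are assumptions, since the text only specifies the `products/{slug}/{hash}.webp` shape.

```python
import hashlib


def r2_key(slug: str, image_bytes: bytes, hash_len: int = 16) -> str:
    """Build a content-addressed key in the products/{slug}/{hash}.webp shape."""
    digest = hashlib.sha256(image_bytes).hexdigest()[:hash_len]
    return f"products/{slug}/{digest}.webp"
```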

Watermark Removal

pip3 install -r scripts/requirements-watermark.txt

Scan all product images for watermark candidates:

python3 scripts/remove-watermarks.py scan

Review the report, then clean:

python3 scripts/remove-watermarks.py clean -f watermarked-images.json
python3 scripts/remove-watermarks.py clean --dry-run -f watermarked-images.json  # preview only

Uses LaMa inpainting if the model is available at /tmp/big-lama-model.pt and falls back to OpenCV Navier-Stokes otherwise. Originals are backed up to /tmp/watermark-backup/. After cleaning, git push triggers CI (validate → images → R2 → D1 sync).

Security

Network Defense

CORS allowlist (configurable via ALLOWED_ORIGINS)
CSRF Origin/Referer validation on mutations
Rate limiting on login, register, search, quotes, orders, catalog
Security headers (HSTS, CSP, X-Frame-Options, X-Content-Type-Options)
Private wholesale catalog: Bearer auth only (no query-token)

Content Safety

HTML sanitization on all rendered markdown content
Blocks: script, iframe, object, event handlers, javascript: URLs, dangerous HTMX attributes
Sanitization applied after markdown-to-HTML conversion and before rendering

Authentication

JWT (HS256) with server-side session tracking in D1
PBKDF2 (100k iterations, SHA-256) password hashing
HttpOnly, Secure, SameSite=Lax cookies
Session revocation support
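
For readers unfamiliar with PBKDF2, the hashing parameters above correspond to the following Python sketch. The Worker itself is TypeScript, so this only illustrates the algorithm; the 16-byte random salt and the separate storage of salt and digest are assumptions not stated in this document.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # iteration count stated above


def hash_password(password: str, salt: bytes = b"") -> Tuple[bytes, bytes]:
    """Derive a PBKDF2-HMAC-SHA256 digest; generates a salt when none is given."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```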

Crawler & AI Endpoints

llms.txt Hierarchy

3-layer retrieval for AI agents:

/llms.txt -- Index map (noindex)
/llms-full.txt -- Compressed global catalog (noindex)
/llms-ctx-{slug}.txt -- Entity-level deep context (noindex)

Other Endpoints

/ai-index.json -- Machine-readable product index (noindex)
/api/knowledge/{slug} -- Markdown export (noindex)
/api/graph -- Knowledge graph (nodes, edges, summary)
/robots.txt -- Crawl rules with AI retrieval comments
/sitemap.xml -- Canonical URLs only (no AI/noindex routes)

All AI-only endpoints return X-Robots-Tag: noindex to prevent Google from indexing duplicate content.

API Reference

Auth

POST /api/auth/register -- Customer registration (rate limited)
POST /api/auth/login -- JWT login (rate limited)
POST /api/auth/logout -- Logout

Products

GET /api/products -- List (query: category, search, page, limit)
GET /api/products/:id -- Detail
POST /api/products -- Create (admin)
PUT /api/products/:id -- Update (admin)
DELETE /api/products/:id -- Soft delete (admin)
PUT /api/products/:id/live -- Update live price/stock (admin)

Search

GET /api/search?q=&mode=hybrid -- Hybrid search (rate limited)
- Priority: exact SKU -> alias -> FTS5 -> vector similarity
- Modes: hybrid (default), semantic, keyword

Knowledge

GET /api/knowledge/:slug -- Markdown export
GET /api/graph -- Full knowledge graph JSON
GET /api/llms/wholesale-catalog -- Bearer-auth private catalog (rate limited)

Orders & Quotes

POST /api/orders -- Create order (auth, rate limited)
GET /api/orders -- List orders (auth)
POST /api/quotes -- Request quote (auth, rate limited)
GET /api/quotes -- List quotes (auth)

Cart

GET /api/cart -- Get cart (cookie-based)
POST /api/cart/add -- Add to cart
PUT /api/cart/update -- Update quantity
DELETE /api/cart/clear -- Clear cart

Disaster Recovery

npm run backup       # Export D1 + R2 to backups/

Backups fail hard on any error. Recovery target: 0-24 hours.

Fresh Local Validation

npm install
npx tsc --noEmit
npm run validate:ids
npm run db:init
npm run db:seed

Do not run npm run db:migrate:role after db:init; fresh schema already includes customers.role.

Existing Database Migration

For databases created before customers.role existed:

npm run backup:d1
npm run db:migrate:role

See docs/LOCAL_ADMIN_SMOKE_TEST.md for creating a local admin user.

Development

npm run dev          # Start local dev server
npx tsc --noEmit    # Type check
npm run validate:ids # Validate content
npm run deploy       # Deploy to Cloudflare Workers

Project Structure

b2bweb/
  content/               # Obsidian vault (source of truth)
    products/            # Product markdown + images
    categories/
    materials/
    certifications/
    devices/
    entities/            # Suppliers, workflows, tools
  generated/             # Build artifacts (gitignored)
  scripts/               # Content pipeline
    process-images.ts    # Image WebP/R2 pipeline
    sync-content.ts      # Chunked D1 sync
    validate-ids.ts      # Content linter
    generate-graph.ts    # Knowledge graph
    backup-d1.ts         # D1 backup
    backup-r2.ts         # R2 backup
    remove-watermarks.py # Batch watermark scan + clean
    run_lama.py          # LaMA inpainting (single image)
  src/
    index.ts             # App entry + middleware stack
    types.ts             # Type definitions
    routes/
      api.ts             # REST API (rate limited)
      pages.tsx          # SSR pages + SEO
    middleware/
      auth.ts            # JWT + sessions + PBKDF2
      csrf.ts            # CSRF protection
      rate-limit.ts      # Rate limiting
      sanitize.ts        # HTML sanitization
      security-headers.ts
      ai-crawler.ts      # Bot detection
    lib/
      wikilink.ts        # WikiLink resolver
      graph/             # Knowledge graph
      resolver/          # pSEO page resolver
      seo/schema.ts      # JSON-LD builders
    pages/               # JSX page components
    components/          # Shared UI components
    db/
      schema.sql         # Complete schema
      migrate-*.sql      # Migrations
      seed.sql           # Sample data

Hide / Show Products by Category

Temporarily hide an entire product category from the live site without deleting anything. Products stay in the Obsidian vault and can be restored with one command.

How It Works

toggle-category.sh hide "iPhone Parts"
        |
        v
  Obsidian markdown files:  active: true → active: false
        |
        v
  git add + commit + push ──────> GitHub Actions CI
                                       |
                                       v
                                  D1 sync + deploy
                                       |
                                       v
                                  Products hidden on hyranger.com

To restore, run show — same flow in reverse.

Commands

cd ~/Documents/github/b2bweb

# List all categories and product counts
./scripts/toggle-category.sh list

# Check status (single or multiple)
./scripts/toggle-category.sh status "iPhone Parts"
./scripts/toggle-category.sh status "iPhone Parts" "iPad Parts" "MacBook Parts"

# Hide one category
./scripts/toggle-category.sh hide "iPhone Parts"

# Hide multiple categories in one command
./scripts/toggle-category.sh hide "Apple Watch Parts" "iPad Parts" "MacBook Parts"

# Restore one or multiple categories
./scripts/toggle-category.sh show "iPad Parts"
./scripts/toggle-category.sh show "Apple Watch Parts" "iPad Parts" "MacBook Parts"

Available Categories

Category	Products
iPhone Parts	2,056
Apple Watch Parts	377
iPad Parts	315
MacBook Parts	189

After Hiding or Showing

The script edits the markdown files but does not commit or push. You decide when to propagate:

# Review what changed
git diff --stat content/products/

# Propagate to live site
git add content/products/
git commit -m "content: hide Apple Watch, iPad, MacBook parts for watermark cleanup"
git push

CI handles the rest — D1 sync updates the database, Worker redeploys, changes go live.

Reverting

To undo a hide, just run show with the same categories:

./scripts/toggle-category.sh show "Apple Watch Parts" "iPad Parts" "MacBook Parts"
git add content/products/
git commit -m "content: restore Apple Watch, iPad, MacBook parts"
git push
