“this shouldn’t need to exist… but reality currently requires it”
A local-first Chrome extension for extracting and exporting ChatGPT conversations into stable canonical snapshots.
Built from the simple desire to “just save my chats quickly,” then progressively forced into becoming a hydration-aware semantic extraction system after repeated encounters with a shape-shifting frontend runtime.
Rather than cloning the ChatGPT UI directly, the extension extracts and normalizes semantic conversation content into durable canonical representations that can later be exported deterministically as Markdown, HTML, JSON, and future formats.
Current version: 1.0.2
Most existing ChatGPT export approaches fall into one of two categories:
- raw DOM dumping
- screenshot/PDF capture
Both approaches tend to break over time, produce unstable output, or tightly couple exports to transient frontend implementation details.
This project instead focuses on:
- semantic preservation
- deterministic exports
- incremental persistence
- local-first storage
- resilience against frontend mutation
- reproducible archival output
The goal is not to reproduce the ChatGPT interface pixel-for-pixel.
The goal is to preserve conversations as durable structured documents.
Features:
- Incremental conversation extraction
- Hydration-aware extraction pipeline
- Stable canonical local snapshots
- Deterministic HTML export
- Deterministic Markdown export
- JSON export
- Full database dump export
- Canonical image and gallery reconstruction
- Incremental IndexedDB persistence
- Safe re-extraction and update handling
- Local-first architecture
To install, clone the repository with `git clone https://github.com/world-wide-dev/chatgpt-export.git`, or download and extract the project ZIP locally. Then:
- Open `chrome://extensions`
- Enable Developer mode
- Click Load unpacked
- Select the `chatgpt-export` directory
To export a conversation:
- Open a ChatGPT conversation
- Click Extract Conversation
- Wait for extraction to complete
- Export using one of the available formats:
  - Markdown
  - HTML
  - JSON
  - Full database dump
Exports operate exclusively on the latest successfully extracted local snapshot of the conversation.
Re-running extraction incrementally updates the locally stored canonical snapshot.
Extraction progressively traverses a hydrated DOM snapshot and converts the conversation into stable canonical objects stored locally in the extension database.
Exporters operate exclusively on the latest successfully extracted snapshot, making exports resilient against frontend runtime and hydration changes.
The architecture intentionally separates:
Extraction → Persistence → Export
This separation allows exports to remain deterministic and reproducible without depending directly on the live ChatGPT runtime during export generation.
ChatGPT Runtime
↓
Hydration-Aware Extraction
↓
Semantic Normalization
↓
IndexedDB Persistence
↓
Deterministic Export Layer
↓
Markdown / HTML / JSON Output
| Layer | Responsibility |
|---|---|
| Extraction | Read and normalize semantic content from the runtime DOM |
| Persistence | Store canonical conversation/message representations |
| Transformation | Generate deterministic export formats |
| UI | Browse and manage archived conversations |
| Export | Produce Markdown, HTML, JSON, and DB snapshots |
No backend. No server dependency. No external storage.
All extraction, persistence, and export generation happen locally inside the browser.
The ChatGPT UI is treated as a transient rendering layer.
The extension extracts semantic content:
- paragraphs
- lists
- code blocks
- tables
- images
- blockquotes
- headings
while intentionally removing:
- layout wrappers
- action bars
- runtime-specific UI containers
- styling artifacts
- interaction controls
The resulting output is stable, portable, and independent from the original frontend structure.
The system stores normalized canonical conversation snapshots locally in IndexedDB.
Exporters never scrape the live DOM directly.
Instead:
extract → normalize → persist → export
This architecture makes exports:
- reproducible
- deterministic
- resilient against runtime mutation
- independent from hydration timing
- safe to regenerate later
Markdown, HTML, and future export formats are therefore treated as derived representations generated from canonical stored data.
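One way to make a derived export reproducible is to fix both the message order and the key order before serializing. The following is a minimal sketch under that assumption, not the extension's actual exporter; `stableStringify`, `exportConversationJson`, and the `StoredMessage` subset of fields are hypothetical names chosen for illustration.

```typescript
// Hypothetical subset of a stored message; only what this sketch needs.
interface StoredMessage {
  id: string;
  index: number;
  role: string;
  content_html: string;
}

type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Serialize with object keys in sorted order so the output does not depend
// on the order in which properties happened to be inserted.
function stableStringify(value: JsonValue): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  const parts = Object.keys(value)
    .sort()
    .map((key) => JSON.stringify(key) + ":" + stableStringify(value[key]));
  return "{" + parts.join(",") + "}";
}

// Derive a JSON export from stored records only; nothing here reads the page.
function exportConversationJson(title: string, messages: StoredMessage[]): string {
  // Copy before sorting so the caller's array is left untouched.
  const ordered = messages.slice().sort((a, b) => a.index - b.index);
  return stableStringify({
    title,
    messages: ordered.map((m) => ({
      id: m.id,
      index: m.index,
      role: m.role,
      content_html: m.content_html,
    })),
  });
}
```

With this approach, two runs over the same stored records yield identical strings regardless of insertion order, which is the property the export layer relies on.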
Extraction is intentionally safe to re-run.
Messages are persisted individually using stable platform-native identifiers extracted directly from the ChatGPT runtime.
This enables:
- incremental updates
- partial recovery
- safe refresh handling
- deterministic ordering
- duplicate prevention
The system favors correctness and resilience over aggressive optimization.
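The idempotency described above can be illustrated with an in-memory stand-in for the messages store. This is a sketch only: the real extension persists to IndexedDB, and `MessageStore`, `upsert`, and `list` are hypothetical names, not its API.

```typescript
// Hypothetical subset of a stored message.
interface StoredMessage {
  id: string; // stable identifier taken from the runtime
  conversation_id: string;
  index: number;
  content_html: string;
}

// In-memory stand-in for the messages store, keyed by message id.
class MessageStore {
  private readonly byId = new Map<string, StoredMessage>();

  // Insert or overwrite by id. Returns true only if stored data changed,
  // so re-running extraction on unchanged content is a no-op.
  upsert(record: StoredMessage): boolean {
    const existing = this.byId.get(record.id);
    if (
      existing !== undefined &&
      existing.conversation_id === record.conversation_id &&
      existing.index === record.index &&
      existing.content_html === record.content_html
    ) {
      return false;
    }
    this.byId.set(record.id, { ...record });
    return true;
  }

  // Messages of one conversation, ordered by index with id as a tie-breaker.
  list(conversationId: string): StoredMessage[] {
    return Array.from(this.byId.values())
      .filter((m) => m.conversation_id === conversationId)
      .sort((a, b) => a.index - b.index || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  }
}
```

Keying on the platform's own message id is what prevents duplicates: a second extraction of the same message overwrites the same slot instead of appending a new one.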
Modern frontend runtimes increasingly virtualize large conversations.
This means visible UI content is not always guaranteed to exist persistently in the DOM.
To remain reliable under virtualization, the exporter performs localized hydration-aware extraction by:
- progressively traversing conversation regions
- waiting for runtime hydration
- extracting semantic content immediately
- persisting normalized output incrementally
The resulting system behaves more like a streaming archival pipeline than a static DOM scraper.
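The traversal steps above can be sketched as a loop that reveals one region at a time, polls until it has hydrated, and persists the result before moving on. Everything here is an assumption made for illustration: `ConversationRegion`, `waitForHydration`, `extractProgressively`, and the polling parameters do not come from the project, and in the extension the region interface would wrap real DOM access.

```typescript
// Hypothetical abstraction over one region of the conversation.
interface ConversationRegion {
  id: string;
  // Bring the region into view so the runtime renders it.
  reveal(): void;
  // Returns the region's HTML, or null while it has not hydrated yet.
  read(): string | null;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the region yields content or the attempt budget is exhausted.
async function waitForHydration(
  region: ConversationRegion,
  attempts: number,
  delayMs: number,
): Promise<string | null> {
  for (let i = 0; i < attempts; i++) {
    const html = region.read();
    if (html !== null) return html;
    await sleep(delayMs);
  }
  return null;
}

// Visit regions in order and persist each one as soon as it is available.
// Returns the ids of regions that never hydrated within the budget.
async function extractProgressively(
  regions: ConversationRegion[],
  persist: (id: string, html: string) => Promise<void>,
  attempts = 20,
  delayMs = 50,
): Promise<string[]> {
  const skipped: string[] = [];
  for (const region of regions) {
    region.reveal();
    const html = await waitForHydration(region, attempts, delayMs);
    if (html === null) {
      skipped.push(region.id); // left for a later re-extraction
      continue;
    }
    await persist(region.id, html);
  }
  return skipped;
}
```

Persisting inside the loop, rather than collecting everything first, means that content already extracted survives even if a later region fails to hydrate.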
Preserved elements: `p`, `ul`, `ol`, `li`, `pre`, `code`, `table`, `blockquote`, `strong`, `em`, `a`, `img`, `h1`–`h4`, `hr`
Removed:
- runtime wrappers
- action bars
- copy/share controls
- layout containers
- transient UI elements
- styling-specific classes
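A tag allowlist is one plausible way to implement this split between preserved and removed markup. The sketch below works on a plain object tree so that it runs without a browser; `ContentNode`, `normalizeNode`, and the contents of `DROPPED_TAGS` are assumptions, and the table child tags are added only because a table cannot survive without them.

```typescript
// Minimal stand-in for a DOM tree.
type ContentNode =
  | { kind: "text"; text: string }
  | { kind: "element"; tag: string; children: ContentNode[] };

// Tags kept in canonical output, following the preserved elements named in
// this section. Assumption: table children are kept alongside `table`.
const PRESERVED_TAGS = new Set([
  "p", "ul", "ol", "li", "pre", "code", "table", "blockquote",
  "strong", "em", "a", "img", "h1", "h2", "h3", "h4", "hr",
  "thead", "tbody", "tr", "th", "td",
]);

// Assumption: controls are dropped together with their contents.
const DROPPED_TAGS = new Set(["button", "svg", "nav", "script", "style"]);

// Returns zero or more canonical nodes for one input node.
function normalizeNode(node: ContentNode): ContentNode[] {
  if (node.kind === "text") return [node];
  if (DROPPED_TAGS.has(node.tag)) return [];
  const children: ContentNode[] = [];
  for (const child of node.children) {
    children.push(...normalizeNode(child));
  }
  if (PRESERVED_TAGS.has(node.tag)) {
    // Rebuild the element without attributes, which discards styling classes.
    return [{ kind: "element", tag: node.tag, children }];
  }
  return children; // layout wrapper: keep the content, drop the container
}
```

Unwrapping unknown containers instead of deleting them is the conservative choice here: if the frontend introduces a new wrapper, the content inside it is still captured.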
All code blocks are normalized into deterministic semantic structures.
Language extraction is preserved whenever available.
The export layer then derives fenced blocks of the form
```language
code
```
from canonical semantic representations.
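Deriving a fence from canonical data has one subtlety: the code itself may contain backtick runs. A minimal sketch of a renderer that handles this, assuming the canonical form provides the code text and an optional language (`renderFencedCode` is a hypothetical name):

```typescript
// Render a canonical code block as a Markdown fence. The fence is made longer
// than any backtick run inside the code so the block cannot close early.
function renderFencedCode(code: string, language?: string): string {
  let longestRun = 0;
  for (const run of code.match(/`+/g) ?? []) {
    longestRun = Math.max(longestRun, run.length);
  }
  const fence = "`".repeat(Math.max(3, longestRun + 1));
  // The language tag is emitted only when extraction found one.
  const info = (language ?? "").trim();
  const body = code.endsWith("\n") ? code : code + "\n";
  return fence + info + "\n" + body + fence + "\n";
}
```

Because the fence length depends only on the stored code, the same canonical block always renders to the same Markdown.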
Images are extracted independently from runtime wrappers and reinjected into canonical semantic structures during normalization.
This avoids runtime-specific layout instability while preserving:
- image ordering
- gallery grouping
- deterministic rendering
- portable export structure
The extraction layer intentionally avoids relying on transient frontend layout wrappers whenever possible.
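As an illustration of how a gallery might be rebuilt from stored references, the sketch below resolves a message's image ids against an image store and emits them in stored order. The `ImageRecord` fields, the `gallery` class name, and both function names are assumptions; the source only states that images have their own store and that messages reference them by id.

```typescript
// Hypothetical image record.
interface ImageRecord {
  id: string;
  src: string;
  alt: string;
}

function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Rebuild images in the order given by imageIds. Unknown ids are skipped so a
// partially extracted conversation still renders.
function renderGallery(imageIds: string[], images: Map<string, ImageRecord>): string {
  const items: string[] = [];
  for (const id of imageIds) {
    const image = images.get(id);
    if (image === undefined) continue;
    items.push(
      '<img src="' + escapeAttribute(image.src) + '" alt="' + escapeAttribute(image.alt) + '">',
    );
  }
  const joined = items.join("");
  // Only groups of two or more images are wrapped as a gallery.
  return items.length > 1 ? '<div class="gallery">' + joined + "</div>" : joined;
}
```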
The extension uses IndexedDB with dedicated stores for:
- conversations
- messages
- images
Messages are stored individually rather than as conversation-sized blobs.
This enables:
- incremental persistence
- partial recovery
- deterministic updates
- scalable handling of long conversations
{
"id": "69f79886-ac48-8328-9d3a-98fa285bce9f",
"title": "conversation-title",
"model": "gpt-5-3",
"first_seen_at": 1778791573306,
"updated_at": 1778791575269,
"last_message_id": "eac90e6a-6def-4251-9697-aef09507cd3a",
"extracting": false
}
{
"id": "eac90e6a-6def-4251-9697-aef09507cd3a",
"conversation_id": "69f79886-ac48-8328-9d3a-98fa285bce9f",
"index": 10,
"role": "assistant",
"model": "gpt-5-3",
"content_html": "<p>...</p>",
"image_ids": []
}
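The two sample records above suggest the following TypeScript shapes. The field names are copied from the samples; the types, the reading of the timestamps as epoch milliseconds, and the assumption that `image_ids` holds strings are inferred, not taken from a published schema. `isMessageRecord` is a hypothetical helper.

```typescript
interface ConversationRecord {
  id: string;
  title: string;
  model: string;
  first_seen_at: number; // looks like epoch milliseconds in the sample
  updated_at: number;
  last_message_id: string;
  extracting: boolean;
}

interface MessageRecord {
  id: string;
  conversation_id: string;
  index: number;
  role: string;
  model: string;
  content_html: string;
  image_ids: string[]; // assumed to reference records in the images store
}

// Runtime check that an unknown value has the shape of a message record,
// useful when reading back data whose origin is not trusted.
function isMessageRecord(value: unknown): value is MessageRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const imageIds = v["image_ids"];
  return (
    typeof v["id"] === "string" &&
    typeof v["conversation_id"] === "string" &&
    typeof v["index"] === "number" &&
    typeof v["role"] === "string" &&
    typeof v["model"] === "string" &&
    typeof v["content_html"] === "string" &&
    Array.isArray(imageIds) &&
    imageIds.every((item) => typeof item === "string")
  );
}
```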
Exports are generated on demand from canonical stored representations.
The export process:
User triggers extraction
↓
Conversation hydrates progressively
↓
Semantic content is normalized
↓
IndexedDB updates incrementally
↓
Deterministic export generated
↓
Markdown / HTML / JSON snapshot downloaded
The resulting exports include:
- metadata manifests
- semantic code blocks
- preserved image galleries
- canonical ordering
- stable formatting
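A Markdown export with a metadata manifest and canonical ordering could be assembled roughly as follows. This is a sketch under stated assumptions: the front-matter layout, the `ExportConversation` and `ExportMessage` shapes, and `renderMarkdownExport` are hypothetical, and the HTML-to-Markdown conversion is passed in by the caller because it is out of scope here.

```typescript
interface ExportConversation {
  id: string;
  title: string;
  model: string;
  updated_at: number; // assumed epoch milliseconds
}

interface ExportMessage {
  index: number;
  role: string;
  content_html: string;
}

function renderMarkdownExport(
  conversation: ExportConversation,
  messages: ExportMessage[],
  htmlToMarkdown: (html: string) => string,
): string {
  // Canonical ordering: sort a copy by stored index.
  const ordered = messages.slice().sort((a, b) => a.index - b.index);
  const lines: string[] = [
    "---",
    "id: " + JSON.stringify(conversation.id),
    "title: " + JSON.stringify(conversation.title),
    "model: " + JSON.stringify(conversation.model),
    // toISOString always emits UTC, so the manifest does not vary by time zone.
    "updated_at: " + new Date(conversation.updated_at).toISOString(),
    "message_count: " + ordered.length,
    "---",
    "",
    "# " + conversation.title,
  ];
  for (const message of ordered) {
    lines.push("", "## " + message.role, "", htmlToMarkdown(message.content_html).trim());
  }
  return lines.join("\n") + "\n";
}
```

The manifest deliberately uses the stored `updated_at` rather than the time of export, so regenerating the file later produces the same bytes.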
The HTML export intentionally behaves like a portable archival document rather than a UI replay.
Features include:
- deterministic structure
- bounded image rendering
- semantic typography
- portable CSS
- export metadata
- stable printability
The export prioritizes readability and durability over visual fidelity to the live ChatGPT interface.
The project prefers:
- semantic HTML over DOM snapshots
- local-first persistence over backend infrastructure
- deterministic exports over runtime replay
- explicit extraction flow over hidden background automation
- incremental persistence over full-conversation rewrites
- canonical representation over duplicated transformation layers
The project deliberately avoids:
- PDF-first architecture
- screenshot-based capture
- visual DOM cloning
- backend synchronization complexity
- state-heavy frontend orchestration
- fragile UI-coupled rendering assumptions
Currently implemented:
- semantic conversation extraction
- hydration-aware traversal
- incremental IndexedDB persistence
- deterministic Markdown export
- deterministic HTML export
- JSON export
- full database dump export
- code block normalization
- language-aware fenced code blocks
- image preservation and gallery reconstruction
- conversation metadata manifests
- idempotent extraction pipeline
Potential future additions:
- syntax highlighting during HTML rendering
- full-text search
- bulk export
- optional external sync
- additional export renderers
- extraction interruption / resume support
- per-conversation extraction status indicators
The project intentionally avoids uncontrolled feature growth in favor of maintaining a stable archival core.
Built while actively co-engineering extraction logic with ChatGPT itself - which, in retrospect, feels appropriately recursive.
Special thanks to the OpenAI frontend team for repeatedly evolving the ChatGPT runtime during development, transforming a straightforward exporter into a hydration-aware semantic archival system.
This project is not affiliated with OpenAI.
Frontend runtime structures may change over time, requiring extractor updates.
This project focuses on long-term conversation durability rather than short-term UI mirroring.
The resulting system behaves less like a browser scraper and more like a semantic archival pipeline:
hydrate
→ normalize
→ persist
→ regenerate deterministically
The architecture intentionally prioritizes:
- resilience
- reproducibility
- semantic clarity
- portability
- maintainability
over visual cloning or frontend-specific assumptions.
Ultimately, the project exists because reliable conversation preservation should not depend on transient frontend state.
MIT License. See the LICENSE file for details.
Copyright © 2026 Peter Karpati (world-wide-dev)