Paste your HTML. Visualize the content. Download structured data.
Htmlsift is an interactive web interface designed to extract structured data from HTML documents without requiring any programming knowledge. It visualizes the DOM, allowing you to select elements for extraction and export structured data.
Visit htmlsift to start using it immediately.
- Input: Paste your HTML document into the text area.
- Explore: View the paths through the document.
- Select: Choose the path containing content for extraction.
- Export: Download your data as JSON or CSV.
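
The four steps above can be approximated in plain Python. This is only a minimal sketch of the idea, not Htmlsift's actual implementation: the `PathCollector` class, the `explore`, `export_json`, and `export_csv` helpers, and the slash-separated path format are all hypothetical names chosen for illustration, and only the standard library is used.

```python
import csv
import io
import json
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Hypothetical helper: group text nodes by the tag path leading to them."""

    def __init__(self):
        super().__init__()
        self.stack = []                 # currently open tags, outermost first
        self.texts = defaultdict(list)  # path string -> list of text values

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts["/".join(self.stack)].append(text)


def explore(html):
    """Explore: map every path in the document to the text found under it."""
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return dict(parser.texts)


def export_json(values):
    """Export: serialize the selected values as a JSON array."""
    return json.dumps(values)


def export_csv(values, column="value"):
    """Export: serialize the selected values as a one-column CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    writer.writerows([value] for value in values)
    return buffer.getvalue()


# Input: the pasted HTML document.
html = "<ul><li>Apple</li><li>Pear</li></ul><p>Fruit list</p>"
# Explore: list the paths through the document.
paths = explore(html)
# Select: choose the path that holds the content of interest.
selected = paths["ul/li"]
# Export: produce JSON or CSV.
print(export_json(selected))
print(export_csv(selected))
```

Grouping text by path is what makes a single selection return every repeated item (here, both `li` entries) rather than one element at a time.
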
Prerequisites: R and Python.
Key scaffolding decisions are documented in docs/architecture.md.
| Command | Description |
|---|---|
| `shiny::runApp("src/shiny")` | Start the Shiny development server |
| `uv run python -m pytest` | Run Python unit tests |
- Dependency management: R dependencies are managed via `renv`; Python dependencies via `uv`.
- Testing: Python unit tests use `doctest`; coverage reports are generated using `pytest`. Frontend (R Shiny) testing is currently out of scope.
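
For readers unfamiliar with `doctest`, the sketch below shows what a doctest-style unit test looks like. The `path_depth` function is a hypothetical example and is not part of this repository. It also assumes that doctests are collected through pytest's `--doctest-modules` option, which this README does not confirm.

```python
import doctest


def path_depth(path):
    """Return the number of tags in a slash-separated path (hypothetical helper).

    >>> path_depth("html/body/ul/li")
    4
    >>> path_depth("")
    0
    """
    # Ignore empty segments so that "" and "a//b" are handled gracefully.
    return len([part for part in path.split("/") if part])


if __name__ == "__main__":
    # Run the examples embedded in the docstrings of this module.
    doctest.testmod()
```

Under that assumption, `uv run python -m pytest --doctest-modules` would pick up the examples in the docstring alongside any other tests.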