Script that:
- Reads a list of sources from a file (each line: `type url`, where `type` is `rss` or `html`)
- Parses the latest N items: RSS/Atom with feedparser, HTML blog listing pages with BeautifulSoup + readability
- Calls an LLM to generate 10 post ideas from the parsed content (structured: source links, source insight, post idea, description, format, how to use)
Uses OpenAI for LLM calls and LangFuse for tracing (via langfuse.openai).
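
As a rough illustration of the "latest N items" step, the sketch below flattens per-source batches and keeps the newest entries. The `FeedEntry` fields and the `latest_entries` helper are assumptions made for this example; the actual dataclass and fetch logic in the project may look different.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FeedEntry:
    # Hypothetical fields; the project's real FeedEntry may differ.
    title: str
    link: str
    published: Optional[datetime] = None  # assumed timezone-aware when set


def latest_entries(batches: list[list[FeedEntry]], limit: int) -> list[FeedEntry]:
    """Flatten per-source batches and keep the `limit` newest entries."""
    merged = [entry for batch in batches for entry in batch]
    # Undated entries sort last instead of raising on a None comparison.
    oldest = datetime.min.replace(tzinfo=timezone.utc)
    merged.sort(key=lambda e: e.published or oldest, reverse=True)
    return merged[:limit]
```

Sorting on a fallback date keeps undated HTML entries in the result without letting them crowd out dated ones. Mixing naive and timezone-aware datetimes would raise a `TypeError`, so this sketch assumes every parser normalizes to aware values.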
See docs/RUNNING.md for full setup, environment variables, pipeline steps, and troubleshooting.
uv sync
cp .env.example .env # set OPENAI_API_KEY
python run.py # writes ideas.md

Configure sources in urls.txt (rss or html per line). For new HTML sources, use docs/ADDING_SOURCES.md.
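
To make the urls.txt format concrete, here is a minimal sketch of parsing `type url` lines. The `Source` shape and the `parse_sources` helper are hypothetical, and skipping blank lines and `#` comments is an assumption; the project's own `load_sources()` may behave differently.

```python
from dataclasses import dataclass

SOURCE_TYPES = ("rss", "html")  # the two source types named in this README


@dataclass(frozen=True)
class Source:
    # Hypothetical shape; the real Source dataclass lives in models.py.
    type: str
    url: str


def parse_sources(text: str) -> list[Source]:
    """Parse `type url` lines; blank lines and # comments are skipped (assumed)."""
    sources = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2 or parts[0].lower() not in SOURCE_TYPES:
            raise ValueError(f"line {lineno}: expected '<rss|html> <url>', got {raw!r}")
        sources.append(Source(type=parts[0].lower(), url=parts[1]))
    return sources
```

Under these assumptions, a file containing `rss https://example.com/feed.xml` and `html https://example.com/blog` on separate lines would yield two `Source` objects, while an unknown type fails early with the offending line number.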
| Doc | Purpose |
|---|---|
| docs/RUNNING.md | Install, env, run pipeline, output, troubleshooting |
| docs/ADDING_SOURCES.md | Adding and validating new RSS/HTML sources |
content-engine/
├── run.py # Entry point (loads .env, calls main)
├── main.py # Pipeline: load sources → fetch → LLM → output
├── config.py # Constants (DEFAULT_*, USER_AGENT, SOURCE_TYPES)
├── models.py # FeedEntry, Source dataclasses
├── sources.py # load_sources() from urls file
├── fetcher.py # fetch_entries() — RSS + HTML, merge by date
├── parsers/
│ ├── __init__.py
│ ├── rss.py # fetch_entries_rss()
│ └── html.py # fetch_entries_html()
├── prompt_loader.py # load_prompt(name, **variables)
├── llm.py # generate_post_ideas()
├── prompts/ # Prompt templates ({{placeholder}})
│ ├── post_ideas_system.txt
│ └── post_ideas_user.txt # {{count}}, {{content}}, {{sources}}
├── urls.txt
└── ...
Run from the project root so that imports resolve (python run.py).
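
The prompt templates use `{{placeholder}}` markers. The sketch below shows one way such substitution could work on an already-loaded template string. `render_prompt` is a hypothetical helper for illustration only; the real `load_prompt(name, **variables)` presumably also reads the file from prompts/, and its handling of missing variables is not documented here.

```python
import re

# Matches {{name}} and tolerates inner whitespace such as {{ name }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, **variables: object) -> str:
    """Replace each {{name}} with str(variables[name]); fail on missing names."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing prompt variable: {name}")
        return str(variables[name])

    # A callable replacement avoids backslash-escape surprises in the values.
    return PLACEHOLDER.sub(substitute, template)
```

Raising on a missing variable is a design choice in this sketch: it surfaces a template/caller mismatch before an incomplete prompt reaches the LLM.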