Intelligent SRE incident intake and triage agent for e-commerce applications, featuring automated classification, codebase-aware root cause analysis, ticket management, team notifications, and full observability.
Built for the AgentX Hackathon
[Reporter]
|
v (1) Submits incident report (text + optional screenshot)
+---------------------------------------------------+
| Frontend - Next.js |
| Client-side validation (Zod + vard) |
+---------------------------------------------------+
|
v (2) Automated triage (background, 10-20 seconds)
+---------------------------------------------------+
| SRE Triage Agent - FastAPI + LangGraph |
| . Prompt injection check (LLM security gate) |
| . Classifies: category, priority, severity, team |
| . Searches Reaction Commerce codebase |
| . Generates technical triage report |
+---------------------------------------------------+
|
|---> (3) Creates ticket in Peppermint
|
|---> (4) Notifies engineering team (Discord + Email via Apprise)
|
v (5) Ticket resolved -> notifies original reporter
[Reporter] <-----------------------------------------
+-----------------+ +---------------------------+ +---------------+
| Frontend |---->| Backend |---->| PostgreSQL |
| Next.js 15 | | FastAPI + LangGraph | +---------------+
| React 19 | | LangChain |
| Tailwind CSS | +---------------------------+
| shadcn/ui | |
+-----------------+ +----------+-----------+
v v v
+----------+ +----------+ +-----------+
|Peppermint| | Apprise | | Phoenix |
| (tickets)| | (Discord/| | (OTEL |
| :3001 | | Email) | | traces) |
+----------+ +----------+ | :6006 |
+-----------+
| Layer | Technology |
|---|---|
| Frontend | Next.js 15, React 19, TypeScript, Tailwind CSS, shadcn/ui, Zod |
| Backend | Python 3.12, FastAPI, LangGraph, LangChain, SQLAlchemy (async) |
| Database | PostgreSQL 16 |
| LLM | OpenAI, Anthropic, Google Gemini, OpenRouter, AI Gateway (configurable) |
| Observability | Arize Phoenix (OpenTelemetry), OpenInference LangChain instrumentation |
| Ticketing | Peppermint (open source) |
| Notifications | Apprise (Discord + email) |
| Security | Dual-layer prompt injection defense (vard client-side + LLM classifier server-side) |
| E-commerce repo | Reaction Commerce (Node.js, indexed at Docker build time) |
| Infrastructure | Docker Compose |
- Docker & Docker Compose
- An API key for at least one LLM provider
git clone https://github.com/andresscode/agentx-hackathon-incident-agent.git
cd agentx-hackathon-incident-agent
cp .env.example .env
Edit .env and set your LLM provider and API key:
LLM_PROVIDER=openrouter # openai | anthropic | google | openrouter | aigateway
OPENROUTER_API_KEY=sk-or-... # only the key for your chosen provider is required
docker compose up --build
First build takes ~2 minutes (clones and indexes the Reaction Commerce codebase).
| Service | URL | Credentials |
|---|---|---|
| Incident Form | http://localhost:3000 | -- |
| Phoenix Traces | http://localhost:6006 | -- |
| Peppermint Tickets | http://localhost:3001 | admin@admin.com / 1234 |
See QUICKGUIDE.md for a step-by-step evaluator walkthrough.
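As a rough illustration of how the LLM_PROVIDER setting from the quick start could be checked at startup, the sketch below maps each provider name to the key it needs and fails early with a readable message. It is not the project's actual llm_provider.py: only OPENROUTER_API_KEY appears in this README, so the other key names and the `resolve_provider` helper are assumptions made for the example.

```python
from typing import Mapping, Tuple

# Provider names come from the .env comment in the quick start.
# Every key name except OPENROUTER_API_KEY is an assumption for this sketch.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "aigateway": "AI_GATEWAY_API_KEY",
}


def resolve_provider(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (provider, api_key), or raise ValueError if the config is incomplete."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in API_KEY_VARS:
        raise ValueError(
            f"LLM_PROVIDER must be one of {sorted(API_KEY_VARS)}, got {provider!r}"
        )
    key_var = API_KEY_VARS[provider]
    api_key = env.get(key_var, "").strip()
    if not api_key:
        raise ValueError(f"{key_var} must be set when LLM_PROVIDER={provider}")
    return provider, api_key
```

Calling `resolve_provider(os.environ)` once at startup would surface a missing key before the first triage request rather than in the middle of one.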
cd backend
cp .env.example .env # configure LLM provider + API key
docker compose up -d # start PostgreSQL + Phoenix
uv sync # install dependencies
make dev # start FastAPI with hot reload (localhost:8000)
Make targets: make dev | make lint | make typecheck
cd frontend
cp .env.example .env # set BACKEND_URL=http://localhost:8000
pnpm install
pnpm dev # start Next.js dev server (localhost:3000)

| Method | Path | Description |
|---|---|---|
| POST | /api/incidents | Create incident (multipart: name, email, description, image?) |
| GET | /api/incidents/{id} | Get incident with triage results |
| GET | /api/incidents | List all incidents (newest first) |
| GET | /api/health | Health check |
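For scripted testing, the POST endpoint above can be exercised without the frontend. The sketch below builds (but does not send) a multipart request using only the Python standard library. The form field names come from the table; the `build_incident_request` helper and the shape of its `image` tuple are hypothetical, and nothing is assumed about the response body.

```python
import urllib.request
import uuid


def build_incident_request(base_url, name, email, description, image=None):
    """Build a multipart POST for /api/incidents without sending it.

    `image` is an optional (filename, content_type, data_bytes) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields listed in the endpoint table.
    for field, value in (("name", name), ("email", email), ("description", description)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{field}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    # Optional screenshot, sent as a file part.
    if image is not None:
        filename, content_type, data = image
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="image"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        f"{base_url}/api/incidents",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

With the backend running on localhost:8000, passing the returned object to `urllib.request.urlopen` submits the incident. Triage runs in the background, so results would then be read from GET /api/incidents/{id}.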
- Multimodal input -- text descriptions + image/screenshot uploads (PNG, JPEG, GIF, WebP)
- AI-powered triage -- automatic classification by category, priority, severity (1-10), and team routing
- Codebase-aware -- searches Reaction Commerce source code for relevant files, includes actual code snippets in reports
- Structured LLM output -- Pydantic models with constrained enums ensure consistent, parseable results
- Prompt injection defense -- dual-layer: client-side heuristic (vard) + server-side LLM classifier
- Pluggable hooks -- add integrations without modifying the triage pipeline
- Multi-provider LLM -- switch between 5 providers via a single environment variable
- Full observability -- Arize Phoenix traces every LLM call with tokens, latency, and prompt content
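To make the "pluggable hooks" idea concrete, here is a minimal sketch of a hook registry in plain Python. It is not the project's implementation under app/workflows: the `TriageResult` fields, `register_hook`, and `run_hooks` are hypothetical names chosen for illustration. The point it shows is that a new integration only registers a function, and a failure in one hook does not block the others.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TriageResult:
    # Illustrative fields only; the real schema lives in the backend.
    incident_id: str
    category: str
    priority: str
    severity: int
    team: str


Hook = Callable[[TriageResult], None]
_hooks: List[Tuple[str, Hook]] = []


def register_hook(name: str) -> Callable[[Hook], Hook]:
    """Decorator that adds a function to the post-triage hook list."""
    def decorator(fn: Hook) -> Hook:
        _hooks.append((name, fn))
        return fn
    return decorator


def run_hooks(result: TriageResult) -> Dict[str, bool]:
    """Run every registered hook and record success or failure per hook."""
    outcomes: Dict[str, bool] = {}
    for name, hook in _hooks:
        try:
            hook(result)
            outcomes[name] = True
        except Exception:
            # A failing integration is recorded but does not stop the rest.
            outcomes[name] = False
    return outcomes
```

Under this design, ticket creation and notifications would each be one decorated function, and the per-hook outcome map is the kind of success/failure signal that could be attached to a trace.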
.
├── frontend/ # Next.js 15 application
│ ├── src/app/ # Pages (incident form, success)
│ ├── src/components/ # UI components (incident-form, shadcn)
│ └── src/lib/ # Schemas, services, hooks, types
├── backend/ # FastAPI application
│ ├── app/
│ │ ├── routes/ # API endpoints
│ │ ├── services/ # Business logic
│ │ ├── workflows/ # LangGraph triage pipeline + hooks
│ │ ├── schemas/ # Pydantic models for LLM structured output
│ │ ├── tools/ # Codebase search index
│ │ ├── models.py # SQLAlchemy Incident model
│ │ ├── prompts.py # All LLM system prompts
│ │ └── llm_provider.py # Multi-provider LLM abstraction
│ └── scripts/ # Codebase indexing script
├── infrastructure/ # Standalone docker-compose files
├── docker-compose.yml # Full-stack deployment
├── AGENTS_USE.md # Agent documentation (hackathon template)
├── SCALING.md # Scalability analysis
└── QUICKGUIDE.md # Evaluator quick guide
- Prompt injection protection: dual-layer defense -- client-side heuristic detection (vard) + server-side LLM classification with structured output
- Safe input handling: Zod + Pydantic validation at every boundary, file type/size restrictions
- Tool safety: agent has read-only access to codebase index, write access limited to ticket creation and notifications
- Data handling: API keys in environment variables wrapped in SecretStr, unsafe input never logged or exposed in responses
- Transparency: every agent decision is traceable via Arize Phoenix
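The file type/size restriction mentioned under safe input handling could look roughly like the sketch below, which checks an upload's leading bytes against the four accepted formats (PNG, JPEG, GIF, WebP) instead of trusting the client-supplied content type. The 5 MiB limit and the function names are assumptions for the example, not values taken from the project.

```python
from typing import Optional

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumed limit, not the project's setting


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a MIME type based on magic bytes, or None if unrecognized."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_upload(data: bytes) -> str:
    """Return the detected MIME type, or raise ValueError for a rejected upload."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds size limit")
    mime = sniff_image_type(data)
    if mime is None:
        raise ValueError("unsupported image type")
    return mime
```

Checking content rather than the declared type means a renamed or mislabeled file is rejected before it reaches the multimodal LLM call.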
| Stage | What is traced |
|---|---|
| Ingestion | Input received, prompt injection verdict, incident ID |
| Classification | Category, priority, severity, team, reasoning, tokens, latency |
| Codebase Search | Keywords, matched files, LLM file selection |
| Summary | Full triage report, tokens, latency |
| Hooks | Ticket creation, notification dispatch, success/failure |
All LLM calls are auto-instrumented via OpenInference + Phoenix OTEL.
- main -- stable branch
- develop -- feature integration
- feature/<name> -- one branch per task, created from develop
| Handle | Role |
|---|---|
| @joedoe6179 | Team Lead / Full Stack |
| @ouroboroz333 | Backend / AI / Infrastructure |
| @_kindalikedeus | Architecture / Scaling |
| @happier_helmut | Documentation / Backend |
- AGENTS_USE.md -- Agent implementation documentation (9-section hackathon template)
- SCALING.md -- Scalability analysis and production roadmap
- QUICKGUIDE.md -- Quick start for evaluators