Autonomous AI agents that batch analyze legal contracts for missing provisions, unusual terms, and risk flags - powered by AWS Strands Agents SDK and Amazon Bedrock AgentCore.
Status: Core agents stable · Dashboard live · AgentCore deployment ready
M&A and financing due diligence requires reviewing 10–100+ contracts to identify missing clauses, non-standard terms, and cross-document inconsistencies. Manual review is slow, expensive, and error-prone.
Factor AI deploys a system of autonomous AI agents that collaboratively analyze batches of legal documents:
- Ingest PDFs and DOCX files, extracting and chunking provisions
- Detect provision types using pattern matching and AI classification
- Score risk levels against configurable rubrics
- Identify missing critical clauses via gap analysis
- Compare provisions across documents for inconsistencies
- Generate structured risk reports with Excel and HTML export
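
The detection and gap-analysis steps above can be illustrated with a minimal sketch. The anchor patterns, `NDA_CHECKLIST`, and both function names are hypothetical and are not taken from Factor's codebase; the AI classification step is omitted.

```python
import re

# Hypothetical anchor patterns, for illustration only (not Factor's real definitions).
ANCHOR_PATTERNS = {
    "governing_law": re.compile(r"\bgoverned by the laws? of\b", re.IGNORECASE),
    "confidentiality": re.compile(r"\bconfidential information\b", re.IGNORECASE),
    "termination": re.compile(r"\bterminat(e|ion)\b", re.IGNORECASE),
    "limitation_of_liability": re.compile(r"\blimitation of liability\b", re.IGNORECASE),
}

# Hypothetical checklist of provisions an NDA is expected to contain.
NDA_CHECKLIST = ["confidentiality", "governing_law", "termination"]


def detect_provisions(text: str) -> set[str]:
    """Return the provision types whose anchor pattern appears in the text."""
    return {name for name, pattern in ANCHOR_PATTERNS.items() if pattern.search(text)}


def find_missing(text: str, checklist: list[str]) -> list[str]:
    """Return checklist items with no matching provision, in checklist order."""
    found = detect_provisions(text)
    return [name for name in checklist if name not in found]


sample = (
    "Each party shall protect the other's Confidential Information. "
    "This Agreement is governed by the laws of Delaware."
)
print(sorted(detect_provisions(sample)))    # ['confidentiality', 'governing_law']
print(find_missing(sample, NDA_CHECKLIST))  # ['termination']
```

Keeping detection and the checklist separate means the same detector could, in principle, serve every agreement type, with only the checklist changing.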
┌─────────────────────────────────────────────────────────┐
│                   FACTOR AGENT SYSTEM                   │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │                 Coordinator Agent                 │  │
│  │ Receives document batch, plans analysis strategy, │  │
│  │ delegates to specialist agents, assembles report  │  │
│  └──────────┬──────────┬──────────┬──────────────────┘  │
│             │          │          │                     │
│    ┌────────▼──┐ ┌─────▼─────┐ ┌──▼─────────┐           │
│    │ Ingestion │ │ Analysis  │ │ Knowledge  │           │
│    │ Agent     │ │ Agent     │ │ Agent      │           │
│    │ • Parse   │ │ • Detect  │ │ • RAG      │           │
│    │ • Chunk   │ │ • Score   │ │ • Classify │           │
│    │ • Extract │ │ • Gaps    │ │ • Citations│           │
│    └───────────┘ │ • Compare │ └────────────┘           │
│                  └───────────┘                          │
│                 ┌─────────────┐                         │
│                 │ Reporting   │                         │
│                 │ Agent       │                         │
│                 │ • Reports   │                         │
│                 │ • Excel     │                         │
│                 │ • HTML      │                         │
│                 └─────────────┘                         │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Bedrock AgentCore Runtime │ Memory │ Gateway      │  │
│  │ Policy │ Observability │ Identity                 │  │
│  │ Amazon Bedrock (Foundation Models)                │  │
│  │ S3 │ DynamoDB │ CloudWatch │ Cognito              │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
| Agent | Role | Tools | Status |
|---|---|---|---|
| Coordinator | Orchestrates pipeline, delegates tasks, assembles results | ingest_documents, analyze_provisions, search_knowledge, generate_report | Stable |
| Ingestion | Parses PDF/DOCX, chunks into provisions | parse_pdf, parse_docx, chunk_provisions | Stable |
| Analysis | Detects types, scores risk, finds gaps, compares | detect_provision_type, score_risk, find_gaps, compare_across_documents | Stable |
| Knowledge | Searches synthetic KB, classifies domains, extracts citations | search_synthetic_knowledge, classify_domain, extract_citations | Stable |
| Reporting | Builds reports, exports Excel/HTML | build_risk_report, export_excel, export_html | Stable |
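
As an illustration of what a tool like `compare_across_documents` might do, here is a minimal sketch that groups documents by the value each one gives for a field and reports fields with more than one distinct value. The function name `find_inconsistencies`, the input shape, and the sample batch are assumptions; the actual tool's signature is not documented here.

```python
from collections import defaultdict


def find_inconsistencies(
    documents: dict[str, dict[str, str]],
) -> dict[str, dict[str, list[str]]]:
    """Map each inconsistent field to {normalized value: [documents using it]}."""
    by_field: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for doc_name, fields in documents.items():
        for field, value in fields.items():
            # Normalize case and whitespace so "Delaware" and "delaware" agree.
            by_field[field][value.strip().lower()].append(doc_name)
    # Keep only the fields where the documents disagree.
    return {
        field: {value: sorted(docs) for value, docs in values.items()}
        for field, values in by_field.items()
        if len(values) > 1
    }


# Hypothetical extracted terms for a three-document batch.
batch = {
    "nda.pdf": {"governing_law": "Delaware", "liability_cap": "$1,000,000"},
    "lease.docx": {"governing_law": "New York", "liability_cap": "$1,000,000"},
    "loan.pdf": {"governing_law": "delaware"},
}
print(find_inconsistencies(batch))
```

In this batch only `governing_law` is reported, because every document that states a liability cap states the same one.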
- Provision Detection - 14 provision types identified via anchor patterns
- Risk Scoring - Configurable rubrics with weighted signals (0–10 scale)
- Gap Analysis - Standard checklists for NDAs, leases, loans, mergers, employment, license, and supply agreements
- Cross-Document Comparison - Inconsistency detection across governing law, liability caps, termination terms
- RAG Knowledge Search - Synthetic legal knowledge base (Taylor658/synthetic-legal)
- Structured Reports - Executive summary, risk assessment, gap analysis, comparison results
- Export - Excel (with disclaimer tab) and HTML (with disclaimers on every page)
- SSE Streaming - Real-time analysis progress via Server-Sent Events
- Session Isolation - Cedar policies enforce per-user data access
- Upload Validation - File type enforcement (PDF, DOCX, DOC, TXT) with size limits
- Production CORS - Configurable origin restrictions for production deployments
- Automatic Cleanup - Uploaded files are removed after analysis completes
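
To make the risk-scoring feature concrete, here is a minimal sketch of a weighted-signal rubric on a 0–10 scale. The signal names, the weights, and the clamped-sum combination rule are all assumptions for illustration; how Factor actually combines weights is not specified here.

```python
# Hypothetical rubric: signal name -> weight (not Factor's actual rubric).
RUBRIC = {
    "unlimited_liability": 5.0,
    "auto_renewal": 2.0,
    "unilateral_termination": 3.0,
    "missing_governing_law": 4.0,
}
MAX_SCORE = 10.0


def score_risk(signals: set[str], rubric: dict[str, float] = RUBRIC) -> float:
    """Sum the weights of the triggered signals and clamp to the 0-10 scale."""
    raw = sum((weight for name, weight in rubric.items() if name in signals), 0.0)
    return max(0.0, min(MAX_SCORE, raw))


print(score_risk({"auto_renewal"}))  # 2.0
# 5.0 + 3.0 + 4.0 = 12.0, which is clamped to the top of the scale.
print(score_risk({"unlimited_liability", "unilateral_termination", "missing_governing_law"}))  # 10.0
```

Passing the rubric as a parameter is one simple way to keep it configurable: a caller can supply a different weight table per agreement type without touching the scoring function.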
factor/
├── src/factor/             # Python backend
│   ├── agents/             # Strands Agent definitions
│   ├── tools/              # @tool decorated functions
│   ├── knowledge/          # ChromaDB vector store + dataset loader
│   ├── models/             # Pydantic data models
│   ├── aws/                # Bedrock, AgentCore, S3, Cognito
│   ├── reporting/          # HTML report templates
│   ├── db/                 # Session store (thread-safe)
│   ├── app.py              # FastAPI + SSE streaming
│   └── config.py           # pydantic-settings
├── src/frontend/           # React 18 + TypeScript + Vite
│   └── src/
│       ├── components/     # Upload, Analysis, Report, shared
│       ├── hooks/          # useUpload, useAnalysis, useAgentStream
│       ├── api/            # API client
│       └── types/          # TypeScript types
├── tests/                  # pytest test suite
├── scripts/                # Seed KB, generate samples, deploy, benchmark
├── policies/               # Cedar policy files
├── data/                   # Provision definitions, risk rubric, samples
├── infra/                  # AWS CDK stacks
└── docker/                 # API + Frontend Dockerfiles, docker-compose
- Python 3.11+
- Node.js 20+
- AWS account with Bedrock access
- AWS CLI configured (`aws configure`)
# Clone the repository
git clone https://github.com/ATaylorAerospace/Factor-AI.git
cd Factor-AI
# Create virtual environment
python -m venv .venv && source .venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt
# Seed the knowledge base
python scripts/seed_knowledge_base.py
# Start the API server
uvicorn src.factor.app:app --reload --port 8000

# Install and start the frontend
cd src/frontend
npm install
npm run dev

# Run the test suite (from the repository root)
pytest tests/ -v --cov=src/factor

| Layer | Technology | Purpose |
|---|---|---|
| Agent Framework | Strands Agents SDK | Model-driven agents with @tool |
| Foundation Model | Amazon Bedrock (Anthropic Sonnet) | Reasoning + tool-use |
| Agent Runtime | Bedrock AgentCore Runtime | Serverless execution |
| Agent Memory | Bedrock AgentCore Memory | Persistent context |
| Agent Gateway | Bedrock AgentCore Gateway | MCP tool access |
| Agent Policy | Bedrock AgentCore Policy (Cedar) | Action boundaries |
| Observability | Bedrock AgentCore + OTEL | Tracing + dashboards |
| Identity | Bedrock AgentCore Identity / Cognito | Authentication |
| Embeddings | sentence-transformers | Vector embeddings |
| Vector Store | ChromaDB (local) / Bedrock KB (prod) | Dataset indexing |
| Storage | Amazon S3 | Document storage |
| Metadata | Amazon DynamoDB | Session + results |
| Doc Parsing | PyMuPDF + python-docx + pdfplumber | Text extraction |
| Frontend | React 18 + TypeScript + Vite + Tailwind | Dashboard |
| Export | openpyxl + Jinja2 | Reports |
| IaC | AWS CDK (Python) | Infrastructure |
| CI/CD | GitHub Actions | Quality + deploy |
⚠️ CRITICAL: THIS IS A SYNTHETIC DATASET - ALL CONTENT IS ARTIFICIALLY GENERATED

Factor's knowledge base is powered by the Taylor658/synthetic-legal dataset on Hugging Face (140,000 rows, MIT License).
ALL text in this dataset is synthetically generated and IS NOT legally accurate. All citations, statutes, case references, legal problems, verified solutions, and pairings are synthetic constructs created through template-based randomization. No citations, statutes, or case references in this dataset are real.
This dataset exists for research, experimentation, and model training only.
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/v1/analyze` | Upload documents (PDF, DOCX, DOC, TXT) + stream agentic analysis |
| `GET` | `/api/v1/sessions/{id}` | Session status + results |
| `GET` | `/api/v1/sessions/{id}/trace` | Agent reasoning trace |
| `GET` | `/api/v1/reports/{session_id}` | Structured report |
| `GET` | `/api/v1/reports/{session_id}/export` | Download Excel/HTML |
| `GET` | `/api/v1/knowledge/search` | Search synthetic KB |
| `GET` | `/api/v1/knowledge/domains` | List legal domains |
| `GET` | `/api/v1/health` | Health check |
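
Because the analyze endpoint streams progress as Server-Sent Events, a client has to split the response into events. The sketch below parses SSE text lines using only the standard library. The event name `progress` and the JSON payload fields are hypothetical, since the actual event schema is not documented here, and the sample stream stands in for a real HTTP response.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event, with 'event' and 'data' keys."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event; skip events with no data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # Comment line, often used as a keep-alive.
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # A single leading space after the colon is dropped.
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream contents; the real payload shape may differ.
stream = [
    ": keep-alive\n",
    "event: progress\n",
    'data: {"agent": "ingestion", "status": "done"}\n',
    "\n",
    "event: progress\n",
    'data: {"agent": "analysis", "status": "running"}\n',
    "\n",
]
for evt in parse_sse(stream):
    payload = json.loads(evt["data"])
    print(evt["event"], payload["agent"], payload["status"])
```

In practice the `lines` argument would be the decoded line iterator of a streaming HTTP response rather than an in-memory list.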
# Run all tests with coverage
pytest tests/ -v --cov=src/factor
# Run specific test modules
pytest tests/test_tools/ -v # Tool tests
pytest tests/test_agents/ -v # Agent tests
pytest tests/test_knowledge/ -v   # Knowledge base tests

Tests cover:
- Each `@tool` function independently with assertions
- Agent creation with mocked Bedrock responses
- Synthetic dataset loading and metadata
- ChromaDB vector store operations
- Provision detection, scoring, and gap analysis
- Cross-document comparison and inconsistency detection
- Domain classification across 13 legal domains
- Citation extraction (cases, statutes, regulations)
- Report building, Excel export, and HTML export
- All outputs label synthetic content
# Start both API and frontend services
docker compose -f docker/docker-compose.yml up --build

# Deploy infrastructure
cd infra && cdk deploy --all
# Deploy agent configuration
python scripts/deploy_agentcore.py --env production

Contributions are welcome! Please see the issue templates for bug reports and feature requests.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT © 2026 A Taylor. See LICENSE for details.
