An AI-powered code review system that helps developers review and improve their code. It combines static analysis with dynamic prompt templates and few-shot learning to produce feedback and fix suggestions.
python3 main_clean.py example2.py --json-details
python3 main_clean.py example2.py
- Automated code quality analysis
- Security vulnerability detection
- Code smell identification
- Static analysis with multiple tools (pylint, bandit, flake8, etc.)
- AST-based context extraction
- Hybrid multi-stage deduplication
- Dynamic prompt templates with Jinja2 templating
- Few-shot learning with curated examples
- Chain-of-thought prompting for complex analysis
- Prompt versioning and A/B testing
- Performance tracking and automatic optimization
- Multi-turn conversations for clarification
- Token optimization (30-50% cost reduction)
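The template and few-shot items above can be illustrated with a small sketch. The project itself uses Jinja2; to stay dependency-free, this example swaps in the standard library's `string.Template`. The template text, the `render_prompt` helper, and the example pairs are all hypothetical.

```python
from string import Template

# Hypothetical template text; the project's real templates use Jinja2.
PROMPT_TEMPLATE = Template(
    "You are reviewing $language code.\n"
    "$examples\n"
    "Now review this code and list the issues:\n"
    "$code\n"
)


def render_prompt(language, code, examples):
    """Render a few-shot prompt from (snippet, finding) example pairs."""
    shots = "\n".join(
        f"Example {i}:\nCode: {snippet}\nFinding: {finding}"
        for i, (snippet, finding) in enumerate(examples, start=1)
    )
    return PROMPT_TEMPLATE.substitute(language=language, examples=shots, code=code)


prompt = render_prompt(
    "Python",
    "eval(user_input)",
    [("except: pass", "Bare except hides errors")],
)
print(prompt)
```

Keeping the curated examples as data rather than hard-coding them in the template makes it easier to version templates and swap example sets independently.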
- Multi-level caching: Memory → Redis → Disk
- Semantic similarity caching for LLM responses
- 5-10x faster re-analysis
- 30-50% reduction in API costs
- Cache warming for common patterns
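As a rough illustration of the multi-level lookup described above, the sketch below checks the fastest tier first and promotes hits from slower tiers. It is not the project's implementation: the class names are hypothetical, the Redis tier is left out so the example runs without a server, and keys are assumed to be SHA-256 hashes of the file content.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class MemoryTier:
    """Fastest tier: a plain in-process dictionary."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class DiskTier:
    """Slowest tier: one JSON file per key."""

    def __init__(self, directory):
        self._dir = Path(directory)
        self._dir.mkdir(parents=True, exist_ok=True)

    def get(self, key):
        path = self._dir / f"{key}.json"
        return json.loads(path.read_text()) if path.exists() else None

    def set(self, key, value):
        (self._dir / f"{key}.json").write_text(json.dumps(value))


class TieredCache:
    """Look up tiers fastest-first and copy hits into the faster tiers."""

    def __init__(self, tiers):
        self.tiers = tiers

    def get(self, key):
        for index, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                for faster in self.tiers[:index]:
                    faster.set(key, value)  # promote for the next lookup
                return value
        return None

    def set(self, key, value):
        for tier in self.tiers:
            tier.set(key, value)


def content_key(source):
    """Key results by file content so unchanged files hit the cache."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


cache_dir = tempfile.mkdtemp()
cache = TieredCache([MemoryTier(), DiskTier(cache_dir)])
key = content_key("print('hello')")
cache.set(key, {"issues": []})

# Simulate a process restart: empty memory tier, same disk directory.
fresh_memory = MemoryTier()
restarted = TieredCache([fresh_memory, DiskTier(cache_dir)])
result = restarted.get(key)
```

A Redis tier would slot in between the two as another object with the same `get`/`set` interface.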
- Full async/await with asyncio
- Streaming results (don't wait for all agents)
- 3-5x faster analysis
- Work stealing for load balancing
- Batch processing for multiple files
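The streaming behavior listed above (results arrive as each agent finishes) can be sketched with `asyncio.as_completed`. The agent names and delays here are placeholders, and `run_agent` sleeps instead of calling a model.

```python
import asyncio


async def run_agent(name, delay):
    """Stand-in for an analysis agent; sleeps instead of calling a model."""
    await asyncio.sleep(delay)
    return name, [f"{name} finding"]


async def stream_results(agents):
    """Yield each agent's result as soon as that agent finishes."""
    tasks = [asyncio.create_task(run_agent(name, delay)) for name, delay in agents]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def main():
    agents = [("quality", 0.15), ("security", 0.01), ("smells", 0.08)]
    order = []
    async for name, findings in stream_results(agents):
        print(name, findings)  # report immediately, do not wait for the rest
        order.append(name)
    return order


completion_order = asyncio.run(main())
```

Because all tasks start together, total wall time is roughly that of the slowest agent rather than the sum of all of them.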
- Git diff-based analysis (only changed lines)
- Issue lifecycle tracking across commits
- Near-instant feedback on changes
- Blame integration (who introduced the issue)
- 90%+ reduction in analysis time for small changes
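Diff-based analysis depends on knowing which lines changed. The sketch below, which assumes unified diff text such as the output of `git diff`, maps each file to the line numbers added in its new version. The `changed_lines` helper is hypothetical and not part of the project.

```python
import re

# Matches hunk headers such as "@@ -1,3 +1,4 @@"; counts default to 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text):
    """Map each file path to the new-file line numbers added by the diff."""
    changes = {}
    current_file = None
    old_left = new_left = 0  # lines still expected in the current hunk
    new_line = 0
    for line in diff_text.splitlines():
        if old_left == 0 and new_left == 0:
            # Outside a hunk: look for a file header or the next hunk header.
            if line.startswith("+++ "):
                path = line[4:]
                current_file = path[2:] if path.startswith("b/") else path
                changes.setdefault(current_file, [])
                continue
            match = HUNK_HEADER.match(line)
            if match and current_file is not None:
                old_left = int(match.group(2) or 1)
                new_left = int(match.group(4) or 1)
                new_line = int(match.group(3))
            continue
        if line.startswith("+"):
            changes[current_file].append(new_line)
            new_line += 1
            new_left -= 1
        elif line.startswith("-"):
            old_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present in both old and new versions.
            new_line += 1
            old_left -= 1
            new_left -= 1
    return changes


sample_diff = "\n".join([
    "diff --git a/app.py b/app.py",
    "--- a/app.py",
    "+++ b/app.py",
    "@@ -1,3 +1,4 @@",
    " import os",
    '-password = "hunter2"',
    "+import secrets",
    "+password = secrets.token_hex(16)",
    " print(password)",
])
print(changed_lines(sample_diff))  # {'app.py': [2, 3]}
```

Issues reported by the analyzers can then be filtered to those whose line number appears in the returned list.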
- Learns from your feedback - Adapts to team preferences
- Personalized priorities - Based on historical acceptance
- Context-aware - Considers file importance, complexity
- Automatic training - Improves over time
- 13 feature model - Smart recommendations
- CVE database integration - National Vulnerability Database
- OWASP Top 10 mapping - Industry standards
- CWE classification - Common Weakness Enumeration
- Dependency scanning - safety + pip-audit
- CVSS scoring - Severity assessment
- Exploit detection - Check if exploits exist
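For the CVSS scoring item, severity labels follow from the numeric base score. The sketch below applies the CVSS v3.x qualitative rating scale; the function name is illustrative, and how the project itself maps scores to severities is not shown in this document.

```python
def cvss_severity(score):
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS scores range from 0.0 to 10.0, got {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


for example_score in (9.8, 7.5, 5.3, 3.9):
    print(example_score, cvss_severity(example_score))
```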
- DSL for custom rules - Easy rule definition
- Rule marketplace - Share and discover rules
- 8 built-in templates - Ready-to-use rules
- Testing framework - Test before deployment
- Team-specific standards - Per-team rule sets
- CLI management - Full command-line interface
- Python - Full AST analysis support
- Java - Full AST analysis with javalang
- JavaScript - Basic support (AST coming soon)
- 40+ languages - Detected and analyzed
- Content-based detection - Works without extensions
- Robust fallback - Smart language identification
- REST API server
- CLI interface
- Multiple output formats (markdown, JSON)
- Configurable via YAML
python main_clean.py your_code.py
# Show full JSON details for each issue
python main_clean.py your_code.py --json-details
# With iterations (auto-fix mode)
python3 main.py example.py --max-iterations 5
# Force stop after 3 iterations
python3 main.py sample.py --max-iterations 3 --force-stop
# Initialize prompt templates (first time only)
python3 cli/prompt_manager.py init
# Run enhanced analysis
python3 main_with_prompt_engine.py example.py
# View prompt metrics
python3 cli/prompt_manager.py list
# Optimize prompts
python3 cli/prompt_manager.py optimize quality_v1 --apply
# Run with all advanced features
python3 main_advanced.py example.py
# Features enabled:
# ✅ Multi-level intelligent caching (5-10x faster)
# ✅ Async/parallel pipeline (3-5x faster)
# ✅ Incremental Git-based analysis
# ✅ Issue lifecycle tracking
# Test advanced features
python3 test_advanced_features.py
See PROMPT_ENGINEERING_GUIDE.md for complete documentation on:
- Creating custom templates
- A/B testing prompts
- Performance optimization
- Multi-turn conversations
- Best practices
# List all templates
python3 cli/prompt_manager.py list
# View metrics for a template
python3 cli/prompt_manager.py metrics quality_v1
# Get optimization suggestions
python3 cli/prompt_manager.py optimize quality_v1
# Create A/B test
python3 cli/prompt_manager.py ab-test "My Test" quality_v1 quality_v2 --split 0.5
# View A/B test results
python3 cli/prompt_manager.py ab-list
- PROMPT_ENGINEERING_GUIDE.md - Prompt engineering framework
- ADVANCED_FEATURES_GUIDE.md - Caching, async, incremental analysis
- SELECTIVE_FIX_GUIDE.md - Manual issue selection & fixing
- ML_AND_SECURITY_GUIDE.md - ML prioritization & security scanning
- CUSTOM_RULES_GUIDE.md - Custom rule engine & DSL
- JAVA_SUPPORT_GUIDE.md - Java language support (NEW!)
- JSON_OUTPUT_GUIDE.md - Full JSON output details (NEW!)
- PROMPT_FRAMEWORK_SUMMARY.md - Implementation summary
Edit config.yaml to customize:
- Analysis thresholds
- Agent timeouts
- Static analysis tools
- Issue weights
- Caching settings (memory, Redis, disk)
- Pipeline settings (concurrency, timeouts)
- Incremental analysis (Git integration)
- Prompt engineering parameters
- Full analysis: ~15-20 seconds
- Re-analysis: ~15-20 seconds (no caching)
- Multiple files: Sequential, ~15s per file
- First analysis: ~2-3 seconds (async pipeline)
- Re-analysis: ~0.2-0.5 seconds (cache hit)
- Incremental: ~0.1-0.3 seconds (changed code only)
- Multiple files: ~2-3 seconds total (batch processing)
- 5-10x faster with caching
- 3-5x faster with async pipeline
- 50-100x faster with incremental analysis
- 90% reduction in API costs
# Clone repository
git clone <repo-url>
cd ai_code_review_system
# Install dependencies
pip install -r requirements.txt
# Optional: Install Redis for distributed caching
pip install redis
# Start Redis: redis-server
# Initialize prompt templates
python3 cli/prompt_manager.py init
# Test installation
python3 test_advanced_features.py
The following advanced improvements are organized by priority and impact:
Current State: Static prompts in text files
- Enhance:
- Dynamic prompt templates with context injection
- Few-shot learning with curated examples
- Chain-of-thought prompting for complex issues
- Prompt versioning and A/B testing
- Token optimization (reduce costs by 30-50%)
- Add:
- Prompt registry with performance metrics
- Automatic prompt tuning based on feedback
- Multi-turn conversations for clarification
- Impact: Higher accuracy, lower costs, better explanations
Current State: Single model (Gemini 2.0 Flash)
- Implement:
- Model routing (GPT-4 for complex, Claude for security, Gemini for speed)
- Consensus voting for critical issues
- Cost-aware model selection
- Automatic fallback on model failures
- Model performance tracking per issue type
- Impact: Better accuracy, cost optimization, vendor independence
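A minimal sketch of routing with automatic fallback, as proposed above. Everything here is hypothetical: the route table, the model labels (plain strings, not real API identifiers), and the callables standing in for model clients.

```python
# Preferred model order per issue type, following the roadmap's suggestion.
ROUTES = {
    "security": ["claude", "gpt-4", "gemini"],
    "complex": ["gpt-4", "claude", "gemini"],
}
DEFAULT_ROUTE = ["gemini", "gpt-4"]


def review_with_fallback(issue_type, code, clients):
    """Try each model on the route in order; fall back when one fails."""
    errors = []
    for model in ROUTES.get(issue_type, DEFAULT_ROUTE):
        client = clients.get(model)
        if client is None:
            continue  # model not configured in this deployment
        try:
            return model, client(code)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def failing_client(code):
    raise TimeoutError("request timed out")


def working_client(code):
    return f"reviewed {len(code)} characters"


fake_clients = {"claude": failing_client, "gpt-4": working_client}
used_model, review = review_with_fallback("security", "eval(x)", fake_clients)
print(used_model, review)  # gpt-4 reviewed 7 characters
```

Cost-aware selection could be layered on by ordering each route by price and skipping models whose estimated cost exceeds a budget.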
Current State: Deduplication works, but no deeper clustering
- Add:
- Embedding-based issue clustering (sentence-transformers)
- Root cause identification across issues
- Issue impact propagation analysis
- Automated fix prioritization based on dependencies
- Cross-file issue correlation
- Impact: Smarter fix ordering, reduced duplicate work
Current State: Basic file-based cache
- Upgrade to:
- Redis/Memcached for distributed caching
- Multi-level cache (memory → Redis → disk)
- Semantic cache (similar code → similar results)
- Cache warming for common patterns
- TTL based on code volatility
- Cache:
- AST parsing results
- Static analysis per file hash
- LLM responses with semantic similarity
- Deduplication fingerprints
- Impact: 5-10x faster re-analysis, lower API costs
Current State: ThreadPoolExecutor for basic parallelism
- Enhance:
- Full async/await with asyncio
- Streaming analysis results (don't wait for all agents)
- Incremental file analysis (only changed files)
- Batch processing for multiple files
- Work stealing for load balancing
- Impact: 3-5x faster analysis, better resource utilization
Current State: Full re-analysis every time
- Add:
- Git diff-based analysis (only changed lines)
- Incremental AST updates
- Persistent issue tracking across commits
- Issue lifecycle management (new/fixed/regressed)
- Blame integration (who introduced the issue)
- Impact: Near-instant feedback on changes
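The new/fixed/regressed lifecycle above reduces to set operations once each issue has a stable fingerprint. This sketch assumes such fingerprints exist (for example from the deduplication stage). The function name and the definition of "regressed" (an issue that was fixed earlier and has reappeared) are assumptions.

```python
def classify_issues(previous, current, fixed_history):
    """Classify issue fingerprints between two commits.

    previous: fingerprints found at the previous commit
    current: fingerprints found at the current commit
    fixed_history: fingerprints that were fixed at some earlier point
    """
    appeared = current - previous
    return {
        "new": appeared - fixed_history,
        "regressed": appeared & fixed_history,
        "fixed": previous - current,
        "persisting": current & previous,
    }


lifecycle = classify_issues(
    previous={"a1", "b2", "c3"},
    current={"b2", "d4", "e5"},
    fixed_history={"e5"},
)
print(lifecycle)
```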
Current State: CLI and API only
- Build:
- LSP server for real-time analysis
- VS Code extension
- JetBrains plugin
- Inline suggestions and quick fixes
- Code actions (auto-fix on save)
- Impact: Shift-left quality, faster feedback loop
Current State: Manual execution
- Add:
- GitHub Actions workflow
- GitLab CI template
- Jenkins plugin
- Pull request comments with analysis
- Quality gates (block merge on critical issues)
- Trend analysis over commits
- Impact: Automated quality enforcement
Current State: Batch refactoring, no preview
- Build:
- Web UI for fix review (React/Vue)
- Side-by-side diff viewer
- Selective fix application
- Undo/redo support
- Fix explanation with examples
- Impact: Safer refactoring, better user trust
Current State: Rule-based priority scoring
- Train:
- ML model on historical fix acceptance
- Learn from user feedback (accepted/rejected fixes)
- Personalized priority based on team preferences
- Context-aware severity adjustment
- Features: Issue type, file history, developer experience, project domain
- Impact: Smarter recommendations, less noise
Current State: Bandit/Semgrep only
- Integrate:
- CVE databases (NVD, Snyk, GitHub Advisory)
- OWASP Top 10 mapping
- CWE classification
- Exploit availability checking
- Dependency vulnerability scanning (Safety, pip-audit)
- Impact: Comprehensive security coverage
Current State: Single-point-in-time score
- Track:
- Quality score trends over time
- Technical debt accumulation
- Issue velocity (new vs fixed)
- Hotspot identification (files with most issues)
- Team/developer quality metrics
- Visualize: Dashboards, reports, badges
- Impact: Data-driven quality improvement
Current State: Python-focused, basic JS/Java
- Add full support for:
- TypeScript, Rust, Go, Kotlin, Swift
- Language-specific best practices
- Framework-specific rules (Django, React, Spring)
- Cross-language analysis (polyglot projects)
- Impact: Broader applicability
Current State: Fixed rules from tools
- Build:
- DSL for custom rules
- Rule marketplace/sharing
- Team-specific coding standards
- Project-specific patterns
- Rule testing framework
- Impact: Tailored to team needs
Current State: No limits
- Add:
- Per-user/project rate limits
- Token budget tracking
- Cost alerts and caps
- Usage analytics
- Quota management
- Impact: Prevent runaway costs
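Token budget tracking with alerts and caps could look roughly like the sketch below. The class names, the 80% alert threshold, and the per-budget cap are illustrative assumptions, not existing project settings.

```python
class BudgetExceeded(Exception):
    """Raised when a request would push usage past the hard cap."""


class TokenBudget:
    def __init__(self, cap, alert_ratio=0.8):
        self.cap = cap
        self.alert_at = cap * alert_ratio
        self.used = 0

    def record(self, tokens):
        """Add usage; return True once the alert threshold is reached."""
        if self.used + tokens > self.cap:
            raise BudgetExceeded(
                f"{self.used + tokens} tokens would exceed the cap of {self.cap}"
            )
        self.used += tokens
        return self.used >= self.alert_at


budget = TokenBudget(cap=10_000)
first_alert = budget.record(6_000)    # 6,000 used, below the 8,000 alert line
second_alert = budget.record(2_500)   # 8,500 used, alert threshold reached
try:
    budget.record(2_000)              # would reach 10,500, over the cap
    blocked = False
except BudgetExceeded:
    blocked = True
```

Rejecting the request before the call is made, rather than after, is what keeps a runaway loop from spending past the cap.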
Current State: Single-user
- Build:
- User authentication (OAuth, SSO)
- Team/organization support
- Role-based access control
- Project isolation
- Audit logging
- Impact: Enterprise readiness
Current State: No audit logging
- Add:
- Full audit trail of analyses
- GDPR/SOC2 compliance features
- Data retention policies
- Export capabilities
- Anonymization options
- Impact: Enterprise/regulated industry adoption
- Add retry logic with exponential backoff (2 hours)
- Implement structured logging with correlation IDs (4 hours)
- Add basic unit tests for agents (1 day)
- Create Docker container for deployment (4 hours)
- Add GitHub Actions CI workflow (2 hours)
- Implement semantic caching for LLM responses (1 day)
- Add progress bars for long-running analyses (2 hours)
- Create API documentation with Swagger/OpenAPI (4 hours)
- Add health check endpoint (1 hour)
- Implement graceful shutdown (2 hours)
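The first quick win, retry logic with exponential backoff, fits in a few lines. This is a sketch under assumed defaults (four attempts, 0.5 s base delay, 8 s ceiling, up to 10% jitter). The `sleep` parameter is injectable so the example runs without actually waiting.

```python
import random
import time


def retry_with_backoff(func, max_attempts=4, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call func, doubling the wait after each failure, up to max_attempts."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter avoids synchronized retries across clients.
            sleep(delay + random.uniform(0, delay / 10))


call_count = {"n": 0}


def flaky_call():
    """Fails twice, then succeeds, to exercise the retry path."""
    call_count["n"] += 1
    if call_count["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []
result = retry_with_backoff(flaky_call, sleep=delays.append)
print(result, delays)
```

In production code the `except` clause would normally be narrowed to errors that are worth retrying, such as timeouts and rate-limit responses.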
Phase 1 (Foundation - 2-3 weeks):
- Testing infrastructure (#1)
- Error handling & circuit breakers (#3)
- Observability basics (#2 - logging + basic metrics)
Phase 2 (Intelligence - 3-4 weeks):
- Advanced prompt engineering (#4)
- Intelligent caching (#7)
- Semantic issue clustering (#6)
Phase 3 (Scale - 2-3 weeks):
- Async pipeline (#8)
- Incremental analysis (#9)
- Multi-model ensemble (#5)
Phase 4 (Integration - 4-6 weeks):
- IDE integration (#10)
- CI/CD integration (#11)
- Interactive UI (#12)
Phase 5 (Advanced - ongoing):
- Remaining items (13-20), based on user feedback and business priorities
- Quality: False positive rate, fix acceptance rate
- Performance: Analysis time, cache hit rate, API latency
- Cost: Token usage, API costs per analysis
- Adoption: Daily active users, analyses per day
- Impact: Issues fixed, quality score improvement
Summary: The system has a solid foundation with AST integration, hybrid deduplication, and a multi-agent architecture. The biggest gaps are testing, observability, and production hardening. Focus on Tier 1 first for reliability, then Tier 2 for intelligence improvements.