
pega2077/autogen_node


autogen_node

A Node.js/TypeScript implementation of microsoft/autogen, providing a framework for building multi-agent AI systems with conversational agents.

Overview

This project brings the multi-agent orchestration capabilities of Microsoft's AutoGen framework to the Node.js ecosystem. Its design follows the .NET code structure and class definitions, so developers who already use AutoGen in other languages will find a familiar API.

Features

  • Event-Driven Architecture (AutoGen v0.4): Asynchronous message passing and distributed agent systems
    • AgentRuntime: Core runtime for hosting and managing agents
    • Direct Messaging: Send messages between agents asynchronously
    • Publish/Subscribe: Topic-based broadcast messaging
    • Cancellation Tokens: Control and cancel async operations
    • State Management: Persist and restore runtime state
  • Base Agent Framework: Core interfaces and abstract classes for building custom agents
  • Multiple LLM Providers: Support for OpenAI, OpenRouter, Ollama, Anthropic, and Google Gemini
    • OpenAI: GPT-3.5, GPT-4, and other OpenAI models
    • Anthropic: Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku
    • Google Gemini: Gemini 1.5 Flash, Gemini 1.5 Pro, and Gemini Pro
    • OpenRouter: Access to 100+ models from multiple providers
    • Ollama: Run LLMs locally for privacy and offline use
  • AssistantAgent: LLM-powered conversational agent with provider flexibility
  • UserProxyAgent: Human-in-the-loop agent for interactive conversations
  • PlannerAgent: Planning agent that breaks down requirements into structured tasks
  • SupervisorAgent: Supervisor agent that verifies task completion and provides feedback
  • Group Chat: Multi-agent collaboration system for complex tasks
  • Function Calling: Register and execute custom functions with agents
  • Code Execution: Automatically execute code generated by agents (JavaScript, Python, Bash)
    • LocalCodeExecutor: Execute code on the local machine
    • DockerCodeExecutor: Execute code safely in isolated Docker containers
  • Memory System: Persistent memory for agents to maintain context across conversations (based on Microsoft AutoGen)
  • Type-Safe: Built with TypeScript for enhanced developer experience
  • Flexible Message System: Support for different message types and roles
  • Conversation Management: Built-in conversation history and state management
  • Advanced Conversation Patterns: Complete implementation of AutoGen patterns
    • Nested Chat: Hierarchical conversations with task delegation
    • Sequential Chat: Predefined workflow execution
    • Speaker Selection: Multiple strategies (Round-robin, Random, Manual, Constrained, Auto/LLM-based)
    • Swarm Mode: Dynamic multi-agent task distribution and collaboration
  • Tools & Extensions: Comprehensive toolset for agent capabilities
    • File System Tools: Safe file read/write and directory operations
    • Browser Tools: Web automation with Playwright (scraping, screenshots, interaction)
    • API Tools: REST and GraphQL API call wrappers
    • Image Generation: DALL-E integration for AI image generation
    • Database Tools: SQL/NoSQL database connection interfaces
    • Tool Caching: Result caching with multiple eviction strategies

Installation

npm install

Quick Start

Using OpenAI (Default)

import { AssistantAgent, UserProxyAgent, HumanInputMode } from './src/index';

// Create an AI assistant
const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'openai',  // optional, this is the default
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a helpful assistant.',
  model: 'gpt-3.5-turbo',
  temperature: 0
});

// Create a user proxy for human interaction
const userProxy = new UserProxyAgent({
  name: 'user',
  humanInputMode: HumanInputMode.ALWAYS
});

// Start a conversation
await userProxy.initiateChat(
  assistant,
  'Hello! Can you help me?',
  10 // max rounds
);

Using OpenRouter

const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'openrouter',
  apiKey: process.env.OPENROUTER_API_KEY!,
  model: 'anthropic/claude-2',
  temperature: 0.7
});

Using Anthropic Claude

const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'anthropic',
  apiKey: process.env.ANTHROPIC_API_KEY!,
  model: 'claude-3-5-sonnet-20241022',
  temperature: 0.7
});

Using Google Gemini

const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'gemini',
  apiKey: process.env.GEMINI_API_KEY!,
  model: 'gemini-1.5-flash',
  temperature: 0.7
});

Using Ollama (Local)

const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'ollama',
  model: 'llama2',
  temperature: 0.7
});

See LLM_PROVIDERS.md for detailed provider documentation.

Project Structure

autogen_node/
├── src/
│   ├── core/                 # Core interfaces and base classes
│   │   ├── IAgent.ts         # Agent interface definitions
│   │   ├── BaseAgent.ts      # Base agent implementation
│   │   ├── IFunctionCall.ts  # Function calling interfaces
│   │   ├── FunctionContract.ts # Function contract builder
│   │   ├── FunctionCallMiddleware.ts # Function execution middleware
│   │   └── ICodeExecutor.ts  # Code execution interface
│   ├── agents/               # Agent implementations
│   │   ├── AssistantAgent.ts # LLM-powered assistant with function calling
│   │   └── UserProxyAgent.ts # Human proxy with code execution
│   ├── executors/            # Code execution implementations
│   │   └── LocalCodeExecutor.ts # Local code executor
│   ├── providers/            # LLM provider implementations
│   │   ├── OpenAIProvider.ts
│   │   ├── OpenRouterProvider.ts
│   │   └── OllamaProvider.ts
│   ├── examples/             # Example applications
│   │   ├── basic-chat.ts
│   │   ├── function-calling-example.ts
│   │   └── code-execution-example.ts
│   └── index.ts              # Main export file
├── dist/                     # Compiled JavaScript output
├── package.json
├── tsconfig.json
└── README.md

Architecture

This implementation follows the AutoGen architecture with both traditional and event-driven patterns:

Event-Driven Architecture (AutoGen v0.4)

The new event-driven architecture enables scalable, distributed multi-agent systems:

  1. AgentRuntime: Core runtime for hosting and managing agents

    • sendMessage(): Direct asynchronous message passing
    • publishMessage(): Topic-based broadcast messaging
    • Agent registration and lifecycle management
    • State persistence and restoration
  2. AgentId & TopicId: Distributed agent addressing

    • Unique identification for agents across processes
    • Topic-based message routing
  3. CancellationToken: Async operation control

    • Cancel long-running operations
    • Cleanup on cancellation

See EVENT_DRIVEN.md for detailed documentation.
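To make the difference between direct messaging and publish/subscribe concrete, here is a minimal in-process sketch. It is not the library's SingleThreadedAgentRuntime: the `MiniRuntime` class, the `Handler` type and the string key formats ("type/key" for agents, "type/source" for topics) are hypothetical, chosen only to mirror the roles of AgentId and TopicId.

```typescript
// Hypothetical sketch, NOT the autogen_node API. Agents are keyed "type/key"
// and topics "type/source", loosely mirroring AgentId and TopicId.
type RuntimeMessage = { content: string };
type Handler = (msg: RuntimeMessage, sender: string | null) => Promise<RuntimeMessage | undefined>;

class MiniRuntime {
  private agents = new Map<string, Handler>();
  private subscriptions = new Map<string, string[]>();

  register(agentType: string, key: string, handler: Handler): void {
    this.agents.set(`${agentType}/${key}`, handler);
  }

  subscribe(topicType: string, source: string, agentKey: string): void {
    const topic = `${topicType}/${source}`;
    this.subscriptions.set(topic, [...(this.subscriptions.get(topic) ?? []), agentKey]);
  }

  // Direct messaging: exactly one recipient, and its reply goes back to the caller.
  async sendMessage(
    msg: RuntimeMessage,
    recipient: string,
    sender: string | null = null
  ): Promise<RuntimeMessage | undefined> {
    const handler = this.agents.get(recipient);
    if (!handler) throw new Error(`Unknown agent: ${recipient}`);
    return handler(msg, sender);
  }

  // Publish/subscribe: every subscriber receives the message; replies are ignored.
  async publishMessage(
    msg: RuntimeMessage,
    topicType: string,
    source: string,
    sender: string | null = null
  ): Promise<number> {
    const subscribers = this.subscriptions.get(`${topicType}/${source}`) ?? [];
    await Promise.all(subscribers.map((key) => this.sendMessage(msg, key, sender)));
    return subscribers.length;
  }
}
```

The design point the sketch illustrates: `sendMessage` behaves like a request/response call with a single addressee, while `publishMessage` fans out to whoever subscribed to the topic and returns no replies.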

Traditional Architecture

  1. IAgent Interface: Defines the contract for all agents

    • generateReply(): Generate responses to messages
    • getName(): Get the agent's name
  2. BaseAgent: Abstract base class providing:

    • Conversation history management
    • Message sending and receiving
    • Chat initiation logic
    • Termination detection
  3. Agent Implementations:

    • AssistantAgent: Uses LLM providers for intelligent responses with function calling support
    • UserProxyAgent: Facilitates human interaction with configurable input modes and code execution
  4. Function Calling: Enable agents to call custom functions

    • Define functions with FunctionContract
    • Automatic function execution via FunctionCallMiddleware
    • OpenAI-compatible function definitions
  5. Code Execution: Execute code generated by agents

    • LocalCodeExecutor for JavaScript, Python, and Bash
    • Automatic code extraction from markdown code blocks
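    • Execution in temporary working directories (the code still runs with the host user's permissions; DockerCodeExecutor provides container isolation)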
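Automatic code extraction means pulling fenced blocks and their language tags out of an agent's markdown reply before handing them to an executor. The sketch below shows one way to do that with a regular expression; `extractCodeBlocks` and `CodeBlock` are hypothetical names, and the library's own parser may handle edge cases (nested fences, tildes) differently.

```typescript
// Hypothetical sketch of fenced-code extraction; not the library's parser.
interface CodeBlock {
  language: string; // tag after the opening fence, lowercased ("" if absent)
  code: string;
}

// Three backticks, an optional language tag, the rest of the fence line,
// then a lazily matched body up to the next closing fence.
const FENCE = /`{3}([\w+-]*)[^\n]*\n([\s\S]*?)`{3}/g;

function extractCodeBlocks(markdown: string): CodeBlock[] {
  const re = new RegExp(FENCE.source, 'g'); // fresh regex: no shared lastIndex between calls
  const blocks: CodeBlock[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(markdown)) !== null) {
    // Strip trailing whitespace/newlines left before the closing fence.
    blocks.push({ language: match[1].toLowerCase(), code: match[2].replace(/\s+$/, '') });
  }
  return blocks;
}
```

An executor would then dispatch each block on its `language` (for example `javascript`, `python` or `bash`) and skip tags it does not support.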

Message System

Messages follow a structured format:

interface IMessage {
  content: string;
  role: 'user' | 'assistant' | 'system' | 'function' | 'tool';
  name?: string;
  functionCall?: {
    name: string;
    arguments: string;
  };
  toolCalls?: Array<{
    id: string;
    type: 'function';
    function: {
      name: string;
      arguments: string;
    };
  }>;
  toolCallId?: string;
}
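The `toolCalls` and `toolCallId` fields are what tie a function call to its result. As a sketch of how a middleware might execute the calls in an assistant message and answer each one with a `tool` message, here is a small dispatcher; `dispatchToolCalls`, `ToolFn` and the registry map are hypothetical helpers, not autogen_node exports, and the error-as-content behavior is an assumption.

```typescript
// Hypothetical dispatcher; the IMessage shape mirrors the interface above.
interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

interface IMessage {
  content: string;
  role: 'user' | 'assistant' | 'system' | 'function' | 'tool';
  name?: string;
  toolCalls?: ToolCall[];
  toolCallId?: string;
}

type ToolFn = (args: Record<string, unknown>) => Promise<string>;

async function dispatchToolCalls(message: IMessage, registry: Map<string, ToolFn>): Promise<IMessage[]> {
  const results: IMessage[] = [];
  for (const call of message.toolCalls ?? []) {
    const fn = registry.get(call.function.name);
    let content: string;
    try {
      if (!fn) throw new Error(`Unknown function: ${call.function.name}`);
      // Arguments arrive as a JSON string and must be parsed before the call.
      content = await fn(JSON.parse(call.function.arguments));
    } catch (err) {
      // Assumption: errors are reported back to the model instead of aborting the chat.
      content = `Error: ${(err as Error).message}`;
    }
    // Each result is a 'tool' message linked to its call by toolCallId.
    results.push({ role: 'tool', name: call.function.name, content, toolCallId: call.id });
  }
  return results;
}
```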

Configuration

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here  # Optional
GEMINI_API_KEY=your_gemini_api_key_here        # Optional
OPENROUTER_API_KEY=your_openrouter_api_key_here  # Optional
# OLLAMA_BASE_URL=http://localhost:11434/v1      # Optional

Scripts

# Build the project
npm run build

# Run the basic interactive example (OpenAI)
npm run example:basic

# Run the automated two-agent conversation (OpenAI)
npm run example:auto

# Run the group chat example (OpenAI)
npm run example:group

# Run Anthropic Claude example
npm run example:anthropic

# Run Google Gemini example
npm run example:gemini

# Run OpenRouter example
npm run example:openrouter

# Run Ollama example (local LLM)
npm run example:ollama

# Run Ollama file organizer example (automatic file renaming and tagging)
npm run example:ollama-organizer

# Run planner-supervisor example (task planning and verification with Ollama)
npm run example:planner-supervisor

# Run GitHub AI search example (web search and API tools with Ollama)
npm run example:github-ai-search

# Run function calling example
npm run example:functions

# Run code execution example
npm run example:code

# Run memory example
npm run example:memory

# Run nested chat example
npm run example:nested

# Run sequential chat example
npm run example:sequential

# Run speaker selection strategies example
npm run example:speaker

# Run swarm mode example
npm run example:swarm

# Run event-driven architecture example (AutoGen v0.4)
npm run example:events

# Run tools examples
npm run example:filesystem  # File system operations
npm run example:browser     # Web automation with Playwright
npm run example:api        # REST/GraphQL API calls
npm run example:docker     # Docker code execution
npm run example:image      # Image generation with DALL-E

# Run tests
npm test

# Run tests with coverage
npm run test:coverage

# Development mode with auto-reload
npm run dev

# Clean build artifacts
npm run clean

Tools & Extensions

autogen_node provides a comprehensive set of tools and extensions to enhance agent capabilities:

File System Tools

Safe file and directory operations with security restrictions:

import { FileSystemTool, AssistantAgent } from 'autogen_node';

const fsTool = new FileSystemTool({
  basePath: '/safe/directory',
  allowedExtensions: ['.txt', '.md', '.json']
});

// Create function contracts for agents
const functions = FileSystemTool.createFunctionContracts(fsTool);

const assistant = new AssistantAgent({
  name: 'file_assistant',
  functions,
  // ... other config
});

Available Operations:

  • read_file - Read file contents
  • write_file - Write content to a file
  • list_directory - List directory contents
  • create_directory - Create new directories
  • delete_file - Delete files
  • file_exists - Check if file/directory exists
  • rename_file - Rename or move files to new locations
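The `basePath` and `allowedExtensions` options imply two checks before any operation: the resolved path must stay inside the base directory, and its extension must be on the allowlist. A minimal sketch of those checks follows; `resolveSafePath` is a hypothetical helper, not FileSystemTool's actual code, and it does not resolve symlinks (a stricter version would also compare `fs.realpathSync` results).

```typescript
import * as path from 'node:path';

// Hypothetical helper illustrating basePath/allowedExtensions checks.
function resolveSafePath(basePath: string, requested: string, allowedExtensions: string[]): string {
  const root = path.resolve(basePath);
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  // A target outside root yields a relative path starting with "..",
  // or an absolute path when it lives on another drive (Windows).
  if (rel === '..' || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes base directory: ${requested}`);
  }
  const ext = path.extname(target).toLowerCase();
  // Paths without an extension (e.g. directories) are let through.
  if (ext !== '' && !allowedExtensions.includes(ext)) {
    throw new Error(`Extension not allowed: ${ext}`);
  }
  return target;
}
```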

Example Use Case: See ollama-file-organizer-example.ts for an intelligent file organization system that uses an LLM to analyze file content, suggest categories, and automatically move files into appropriate folders with descriptive names.

Browser Tools

Web automation using Playwright:

import { BrowserTool } from 'autogen_node';

const browser = new BrowserTool({ headless: true });

await browser.navigate('https://example.com');
const text = await browser.getText('h1');
await browser.screenshot({ path: 'screenshot.png' });

// Use with agents
const functions = BrowserTool.createFunctionContracts(browser);

Docker Code Executor

Safe code execution in isolated containers:

import { DockerCodeExecutor } from 'autogen_node';

const executor = new DockerCodeExecutor({ timeout: 30000 });

const result = await executor.executeCode(
  'console.log("Hello from Docker!");',
  'javascript'
);

API Tools

REST and GraphQL API wrapper:

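import { APITool } from 'autogen_node';

const apiTool = new APITool({
  baseURL: 'https://api.example.com'
});

// REST: GET https://api.example.com/users
const data = await apiTool.get('/users');

// GraphQL: `query` is a GraphQL document string (field names are placeholders)
const query = `
  query {
    users { id name }
  }
`;
const result = await apiTool.graphql(query);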
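Over HTTP, a GraphQL call is typically a POST whose JSON body carries `query` and `variables`, and servers often report failures in an `errors` array even with a 200 status. The sketch below separates request building from response unwrapping so both can be tested without a network. `buildGraphQLRequest`, `unwrapGraphQLPayload` and the `/graphql` endpoint in the comment are hypothetical; this is not APITool's implementation.

```typescript
// Hypothetical helpers illustrating a GraphQL-over-HTTP round trip.
type GraphQLRequest = { method: 'POST'; headers: Record<string, string>; body: string };

interface GraphQLResponse<T> {
  data?: T;
  errors?: Array<{ message: string }>;
}

function buildGraphQLRequest(query: string, variables: Record<string, unknown> = {}): GraphQLRequest {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Accept: 'application/json' },
    body: JSON.stringify({ query, variables }),
  };
}

function unwrapGraphQLPayload<T>(payload: GraphQLResponse<T>): T {
  // Surface server-side errors even when the HTTP status was 200.
  if (payload.errors && payload.errors.length > 0) {
    throw new Error(payload.errors.map((e) => e.message).join('; '));
  }
  if (payload.data === undefined) throw new Error('Response contained no data');
  return payload.data;
}

// Usage with the global fetch of Node.js 18+ (not executed here):
// const res = await fetch('https://api.example.com/graphql', buildGraphQLRequest(query));
// const users = unwrapGraphQLPayload<{ users: unknown[] }>(await res.json());
```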

Image Generation

AI image generation with DALL-E:

import { ImageGenerationTool } from 'autogen_node';

const imageTool = new ImageGenerationTool({
  openaiApiKey: process.env.OPENAI_API_KEY
});

const imageUrls = await imageTool.generateImage(
  'A serene landscape with mountains',
  { size: '1024x1024', quality: 'hd' }
);

Tool Caching

Cache expensive tool operations:

import { ToolCache, CacheStrategy } from 'autogen_node';

const cache = new ToolCache({
  maxSize: 100,
  defaultTTL: 5 * 60 * 1000,
  strategy: CacheStrategy.LRU
});

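// asyncFunction stands for any async function whose results you want cached,
// e.g. an expensive API or tool call
const cachedFn = cache.wrap('expensiveOp', asyncFunction);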
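To show what an LRU strategy with a TTL involves, here is a compact sketch built on `Map`, whose insertion order makes the first key the least recently used. `LruTtlCache` is a hypothetical class, not ToolCache's implementation; its `wrap` only mimics the shape of the call above, keys on `JSON.stringify(args)`, and cannot cache `undefined` results.

```typescript
// Hypothetical sketch: LRU eviction plus per-entry TTL. Not ToolCache itself.
class LruTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxSize: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxSize: number, ttlMs: number, now: () => number = Date.now) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Memoize an async function, keyed by its name and serialized arguments.
  wrap<A extends unknown[]>(name: string, fn: (...args: A) => Promise<V>): (...args: A) => Promise<V> {
    return async (...args: A) => {
      const key = `${name}:${JSON.stringify(args)}`;
      const hit = this.get(key);
      if (hit !== undefined) return hit;
      const value = await fn(...args);
      this.set(key, value);
      return value;
    };
  }
}
```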

For detailed documentation on all tools, see TOOLS.md.

Examples

Event-Driven Architecture (AutoGen v0.4)

import {
  AgentId,
  TopicId,
  SingleThreadedAgentRuntime,
  createSubscription,
} from './src/index';

// Create event-driven agent
class EventAgent {
  async handleMessage(message: any, sender: AgentId | null) {
    return {
      role: 'assistant',
      content: `Processed: ${message.content}`,
    };
  }
}

// Create runtime and register agents
const runtime = new SingleThreadedAgentRuntime();
const agent = new EventAgent();
const agentId = new AgentId('event_agent', 'agent1');

await runtime.registerAgentInstance(agent, agentId);

// Direct message passing
const response = await runtime.sendMessage(
  { content: 'Hello!' },
  agentId
);

// Topic-based pub/sub
const topic = new TopicId('notifications', 'system');
await runtime.addSubscription(
  createSubscription('sub1', topic, agentId)
);
await runtime.publishMessage(
  { content: 'Broadcast message' },
  topic
);

See EVENT_DRIVEN.md for complete documentation.

Basic Two-Agent Chat

import { AssistantAgent, UserProxyAgent, HumanInputMode } from './src/index';

const assistant = new AssistantAgent({
  name: 'assistant',
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a helpful math tutor.',
  model: 'gpt-3.5-turbo'
});

const user = new UserProxyAgent({
  name: 'user',
  humanInputMode: HumanInputMode.ALWAYS
});

await user.initiateChat(assistant, 'Help me solve 2x + 3 = 7', 10);

Automated Conversation (No Human Input)

const user = new UserProxyAgent({
  name: 'user',
  humanInputMode: HumanInputMode.NEVER
});

// Agent will auto-reply without human intervention
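With no human in the loop, something else has to end the conversation: typically a termination message or the round limit passed to `initiateChat`. The sketch below models that loop in isolation. The `TERMINATE` keyword follows a common AutoGen convention but is an assumption here, and `runAutoChat` and `Responder` are hypothetical names rather than the BaseAgent logic.

```typescript
// Hypothetical model of an automated two-agent loop with termination detection.
type ChatMessage = { role: 'user' | 'assistant'; content: string };
type Responder = (history: ChatMessage[]) => Promise<string>;

// Assumed convention: a reply ending in "TERMINATE" ends the chat.
const isTermination = (content: string): boolean => content.trim().endsWith('TERMINATE');

async function runAutoChat(
  assistant: Responder,
  autoReply: Responder,
  firstMessage: string,
  maxRounds: number
): Promise<ChatMessage[]> {
  const history: ChatMessage[] = [{ role: 'user', content: firstMessage }];
  for (let round = 0; round < maxRounds; round++) {
    const answer = await assistant(history);
    history.push({ role: 'assistant', content: answer });
    if (isTermination(answer)) break;
    // No human input: the proxy produces its own reply.
    history.push({ role: 'user', content: await autoReply(history) });
  }
  return history;
}
```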

Function Calling

import { AssistantAgent, UserProxyAgent, HumanInputMode, FunctionContract } from './src/index';

// Define a weather function
const getWeather = FunctionContract.fromFunction(
  'get_weather',
  'Get the current weather for a location',
  [
    {
      name: 'location',
      type: 'string',
      description: 'The city and state, e.g. San Francisco, CA',
      required: true
    }
  ],
  async (location: string) => {
    // Your weather API logic here
    return `The weather in ${location} is sunny, 72°F`;
  }
);

// Create assistant with functions
const assistant = new AssistantAgent({
  name: 'assistant',
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a helpful assistant with access to weather data.',
  model: 'gpt-3.5-turbo',
  functions: [getWeather]
});

// Create a user proxy that replies without human input
const userProxy = new UserProxyAgent({
  name: 'user',
  humanInputMode: HumanInputMode.NEVER
});

// The assistant will automatically call the function when needed
await userProxy.initiateChat(assistant, "What's the weather in San Francisco?", 3);
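FunctionContract is described as producing OpenAI-compatible function definitions. As a rough illustration of that conversion, the sketch maps a parameter list shaped like the one above onto the JSON Schema `tools` entry used by OpenAI's Chat Completions API. `ParamSpec` and `toOpenAITool` are hypothetical names; the library's own conversion may add or format fields differently.

```typescript
// Hypothetical conversion from a parameter list to an OpenAI-style tool definition.
interface ParamSpec {
  name: string;
  type: 'string' | 'number' | 'integer' | 'boolean' | 'object' | 'array';
  description: string;
  required?: boolean;
}

interface OpenAITool {
  type: 'function';
  function: {
    name: string;
    description: string;
    parameters: {
      type: 'object';
      properties: Record<string, { type: string; description: string }>;
      required: string[];
    };
  };
}

function toOpenAITool(name: string, description: string, params: ParamSpec[]): OpenAITool {
  const properties: OpenAITool['function']['parameters']['properties'] = {};
  for (const p of params) properties[p.name] = { type: p.type, description: p.description };
  return {
    type: 'function',
    function: {
      name,
      description,
      // JSON Schema object: each parameter becomes a property; required ones are listed.
      parameters: { type: 'object', properties, required: params.filter((p) => p.required).map((p) => p.name) },
    },
  };
}
```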

Code Execution

import { AssistantAgent, UserProxyAgent, LocalCodeExecutor, HumanInputMode } from './src/index';

// Create code executor
const codeExecutor = new LocalCodeExecutor();

// Create assistant that writes code
const assistant = new AssistantAgent({
  name: 'assistant',
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a coding assistant. Write code in markdown code blocks.',
  model: 'gpt-3.5-turbo'
});

// Create user proxy with code execution enabled
const userProxy = new UserProxyAgent({
  name: 'user_proxy',
  humanInputMode: HumanInputMode.NEVER,
  codeExecutor: codeExecutor,
  autoExecuteCode: true
});

// The agent will write code, and it will be automatically executed
await userProxy.initiateChat(
  assistant,
  'Write JavaScript code to calculate the sum of numbers from 1 to 100',
  3
);

await codeExecutor.cleanup();
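The pattern behind local execution in a temporary working directory can be sketched with Node's standard library: write the code to a fresh temp directory, run it with a timeout, capture output, and always clean up. `runJavaScriptLocally` is a hypothetical helper covering only JavaScript, not LocalCodeExecutor's implementation. Note that it isolates files, not privileges.

```typescript
import { spawnSync } from 'node:child_process';
import { mkdtempSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

interface ExecResult { exitCode: number; stdout: string; stderr: string }

// Hypothetical sketch: runs with the same permissions as the current process.
function runJavaScriptLocally(code: string, timeoutMs = 10_000): ExecResult {
  const workDir = mkdtempSync(join(tmpdir(), 'agent-code-'));
  try {
    const file = join(workDir, 'main.js');
    writeFileSync(file, code, 'utf8');
    // process.execPath is the running Node.js binary.
    const proc = spawnSync(process.execPath, [file], { cwd: workDir, encoding: 'utf8', timeout: timeoutMs });
    // status is null when the process was killed (e.g. by the timeout).
    return { exitCode: proc.status ?? -1, stdout: proc.stdout ?? '', stderr: proc.stderr ?? '' };
  } finally {
    rmSync(workDir, { recursive: true, force: true }); // always remove the temp directory
  }
}
```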

Group Chat with Multiple Agents

import { AssistantAgent, GroupChat, GroupChatManager } from './src/index';

// Create multiple specialized agents
const designer = new AssistantAgent({
  name: 'designer',
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a creative designer.',
  model: 'gpt-3.5-turbo'
});

const engineer = new AssistantAgent({
  name: 'engineer',
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a practical engineer.',
  model: 'gpt-3.5-turbo'
});

// Create group chat
const groupChat = new GroupChat({
  agents: [designer, engineer],
  maxRound: 10
});

// Create manager
const manager = new GroupChatManager({
  groupChat: groupChat
});

// Run the discussion
await manager.runChat('Design a new mobile app feature');

Advanced Conversation Patterns

autogen_node implements all major conversation patterns from Microsoft AutoGen:

Nested Chat

Delegate tasks to specialist agents:

import { AssistantAgent, supportsNestedChat } from './src/index';

const projectManager = new AssistantAgent({
  name: 'project_manager',
  systemMessage: 'You delegate tasks to specialists.',
  apiKey: process.env.OPENAI_API_KEY!
});

const specialist = new AssistantAgent({
  name: 'specialist',
  systemMessage: 'You are a code review specialist.',
  apiKey: process.env.OPENAI_API_KEY!
});

// Delegate task to specialist
const result = await projectManager.initiateNestedChat(
  'Review this code: ...',
  specialist,
  { maxRounds: 3, addToParentHistory: true }
);

Sequential Chat

Execute agents in predefined workflow order:

import { runSequentialChat, AssistantAgent } from './src/index';

// researcher, writer and editor are AssistantAgent instances configured as in the examples above

const result = await runSequentialChat({
  steps: [
    { agent: researcher, maxRounds: 1 },
    { agent: writer, maxRounds: 1 },
    { agent: editor, maxRounds: 1 }
  ],
  initialMessage: 'Write an article about AI'
});
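Conceptually, a sequential chat is a pipeline: each step receives the previous step's output as its input. The sketch models each agent step as an async function; `Step` and `runPipeline` are hypothetical and stand in for full agent conversations bounded by `maxRounds`.

```typescript
// Hypothetical model of a sequential workflow; each step stands in for an agent.
type Step = { name: string; run: (input: string) => Promise<string> };

interface PipelineResult {
  finalOutput: string;
  transcript: Array<{ step: string; output: string }>;
}

async function runPipeline(steps: Step[], initialMessage: string): Promise<PipelineResult> {
  const transcript: PipelineResult['transcript'] = [];
  let current = initialMessage;
  for (const step of steps) {
    // The previous step's output becomes the next step's input.
    current = await step.run(current);
    transcript.push({ step: step.name, output: current });
  }
  return { finalOutput: current, transcript };
}
```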

Speaker Selection Strategies

Control who speaks next in group chats:

import { AssistantAgent, GroupChat, RoundRobinSelector, RandomSelector, AutoSelector } from './src/index';

// agent1, agent2 and agent3 are AssistantAgent instances created as shown above

// Round-robin selection
const chat1 = new GroupChat({
  agents: [agent1, agent2, agent3],
  speakerSelector: new RoundRobinSelector()
});

// Random selection
const chat2 = new GroupChat({
  agents: [agent1, agent2, agent3],
  speakerSelector: new RandomSelector()
});

// LLM-based intelligent selection
const coordinator = new AssistantAgent({
  name: 'coordinator',
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You decide which agent should speak next.'
});
const chat3 = new GroupChat({
  agents: [agent1, agent2, agent3],
  speakerSelector: new AutoSelector({ selectorAgent: coordinator })
});
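The simpler strategies reduce to a function from the agent list and the last speaker to the next speaker. The sketch below implements round-robin and random selection against a hypothetical `SpeakerSelector` interface; the real RoundRobinSelector and RandomSelector signatures may differ, and avoiding consecutive repeats in the random variant is an assumption.

```typescript
// Hypothetical selector interface; not the autogen_node signatures.
interface SpeakerSelector {
  selectNext(agents: string[], lastSpeaker: string | null): string;
}

class RoundRobin implements SpeakerSelector {
  selectNext(agents: string[], lastSpeaker: string | null): string {
    if (agents.length === 0) throw new Error('No agents to select from');
    const i = lastSpeaker === null ? -1 : agents.indexOf(lastSpeaker);
    return agents[(i + 1) % agents.length]; // wraps back to the first agent
  }
}

class RandomPick implements SpeakerSelector {
  private readonly rng: () => number;
  constructor(rng: () => number = Math.random) {
    this.rng = rng; // injectable for deterministic tests
  }
  selectNext(agents: string[], lastSpeaker: string | null): string {
    if (agents.length === 0) throw new Error('No agents to select from');
    // Assumption: avoid picking the same agent twice in a row when possible.
    const candidates = agents.length > 1 ? agents.filter((a) => a !== lastSpeaker) : agents;
    return candidates[Math.floor(this.rng() * candidates.length)];
  }
}
```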

Swarm Mode

Distribute tasks among agents dynamically:

import { SwarmChat, RoundRobinSelector } from './src/index';

// researcher, writer, coder and reviewer are AssistantAgent instances created beforehand

const swarm = new SwarmChat({
  agents: [researcher, writer, coder, reviewer],
  maxRoundsPerTask: 3,
  taskAssignmentSelector: new RoundRobinSelector()
});

const result = await swarm.run([
  'Research TypeScript benefits',
  'Write a tutorial',
  'Create code examples',
  'Review documentation'
]);

console.log(`Completed: ${result.completedTasks.length}`);
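As a simplified model of task distribution, the sketch assigns tasks to agents round-robin and gives each task up to `maxRoundsPerTask` attempts, sorting results into completed and failed lists. `SwarmMember` and `runSwarm` are hypothetical; the sketch handles tasks one after another, whereas the library's swarm may assign work dynamically or run tasks concurrently.

```typescript
// Hypothetical model of swarm-style task distribution; agents are async workers.
type SwarmMember = {
  name: string;
  work: (task: string, round: number) => Promise<{ done: boolean; output: string }>;
};

interface SwarmResult {
  completedTasks: Array<{ task: string; agent: string; output: string }>;
  failedTasks: string[];
}

async function runSwarm(members: SwarmMember[], tasks: string[], maxRoundsPerTask: number): Promise<SwarmResult> {
  const result: SwarmResult = { completedTasks: [], failedTasks: [] };
  for (let i = 0; i < tasks.length; i++) {
    const task = tasks[i];
    const member = members[i % members.length]; // round-robin assignment
    let finished = false;
    for (let round = 1; round <= maxRoundsPerTask && !finished; round++) {
      const { done, output } = await member.work(task, round);
      if (done) {
        result.completedTasks.push({ task, agent: member.name, output });
        finished = true;
      }
    }
    if (!finished) result.failedTasks.push(task); // ran out of rounds
  }
  return result;
}
```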

See CONVERSATION_PATTERNS.md for detailed documentation.

Memory Usage

Memory allows agents to maintain context across conversations:

import { AssistantAgent, ListMemory, MemoryMimeType } from './src/index';

// Create memory instance
const userMemory = new ListMemory({ name: 'user_preferences' });

// Add memory content
await userMemory.add({
  content: 'User prefers formal language',
  mimeType: MemoryMimeType.TEXT,
  metadata: { timestamp: Date.now() }
});

await userMemory.add({
  content: 'User is interested in TypeScript and AI',
  mimeType: MemoryMimeType.TEXT
});

// Create agent with memory
const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY!,
  memory: [userMemory]
});

// Memory is automatically injected into context
const reply = await assistant.generateReply([
  { role: 'user', content: 'What should I learn next?' }
]);

For more details, see MEMORY.md.
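One plausible way "automatically injected into context" can work, loosely modeled on Python AutoGen's ListMemory, is to append stored entries to the system message before each model call. This is an assumption for illustration: `SimpleListMemory` and `buildContext` are hypothetical, and the exact wording of the injected block is invented.

```typescript
// Hypothetical sketch of memory injection; not the library's ListMemory.
interface MemoryEntry { content: string; metadata?: Record<string, unknown> }
interface ChatMsg { role: 'system' | 'user' | 'assistant'; content: string }

class SimpleListMemory {
  readonly name: string;
  private readonly entries: MemoryEntry[] = [];
  constructor(name: string) { this.name = name; }
  add(entry: MemoryEntry): void { this.entries.push(entry); }
  all(): readonly MemoryEntry[] { return this.entries; }
}

// Prepends a system message that lists every stored memory, in insertion order.
function buildContext(systemMessage: string, memories: SimpleListMemory[], messages: ChatMsg[]): ChatMsg[] {
  const lines: string[] = [];
  for (const memory of memories) {
    for (const entry of memory.all()) lines.push(`${lines.length + 1}. ${entry.content}`);
  }
  const content = lines.length > 0
    ? `${systemMessage}\n\nRelevant memory content (in chronological order):\n${lines.join('\n')}`
    : systemMessage;
  return [{ role: 'system', content }, ...messages];
}
```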

Comparison with .NET AutoGen

Feature .NET AutoGen autogen_node
Base Agent Framework
AssistantAgent
UserProxyAgent
ConversableAgent
RetrieveUserProxyAgent (RAG)
GPTAssistantAgent
MultimodalConversableAgent
TeachableAgent
CompressibleAgent
SocietyOfMindAgent
OpenAI Integration
Group Chat
Multiple LLM Providers ✅ (OpenAI, Anthropic, Gemini, OpenRouter, Ollama)
Function Calling
Code Execution ✅ (JavaScript, Python, Bash)
Memory System ✅ (Based on Python AutoGen)
Event-Driven Architecture (v0.4)
AgentRuntime ✅ (SingleThreadedAgentRuntime)
Async Message Passing
Publish/Subscribe

Advanced Agent Types

autogen_node now includes all major agent types from Microsoft AutoGen:

  • ConversableAgent: Flexible agent with optional LLM integration and configurable behaviors
  • RetrieveUserProxyAgent: RAG-enabled agent for document Q&A and knowledge base queries
  • GPTAssistantAgent: Integration with OpenAI's Assistant API for persistent conversations
  • MultimodalConversableAgent: Support for images, audio, and multimodal interactions
  • TeachableAgent: Learns user preferences and provides personalized responses
  • CompressibleAgent: Manages long conversations with automatic history compression
  • SocietyOfMindAgent: Complex reasoning using multiple specialized inner agents
  • PlannerAgent: Breaks down complex requirements into structured, executable task plans
  • SupervisorAgent: Verifies task completion and ensures requirements are met with feedback loops

For detailed documentation and examples, see:

Roadmap

  • Base agent framework
  • AssistantAgent with OpenAI
  • UserProxyAgent
  • Group chat capabilities
  • Multiple LLM provider support (OpenAI, OpenRouter, Ollama)
  • Function calling support
  • Code execution agent (JavaScript, Python, Bash)
  • Additional LLM provider integrations (Anthropic SDK, Google Gemini)
  • Memory system (ListMemory implementation)
  • Event-driven architecture (AutoGen v0.4)
    • AgentRuntime interface
    • SingleThreadedAgentRuntime implementation
    • AgentId and TopicId for addressing
    • CancellationToken for async control
    • Direct message passing (sendMessage)
    • Publish/Subscribe messaging (publishMessage)
    • State persistence and management
    • Distributed runtime (multi-process/multi-machine)
  • Advanced agent types
    • ConversableAgent (flexible conversable agent)
    • RetrieveUserProxyAgent (RAG support)
    • GPTAssistantAgent (OpenAI Assistant API)
    • MultimodalConversableAgent (image/audio support)
    • TeachableAgent (learning and personalization)
    • CompressibleAgent (conversation compression)
    • SocietyOfMindAgent (multi-agent reasoning)
    • PlannerAgent (task planning and decomposition)
    • SupervisorAgent (task verification and feedback)
  • Advanced conversation patterns
    • Nested Chat (task delegation)
    • Sequential Chat (workflow automation)
    • Speaker Selection Strategies (Round-robin, Random, Manual, Constrained, Auto)
    • Swarm Mode (dynamic multi-agent collaboration)
  • Tools & Extensions
    • File System Tools (read/write/directory operations)
    • Browser Tools (Playwright web automation)
    • Docker Code Executor (isolated code execution)
    • API Tools (REST/GraphQL wrappers)
    • Image Generation Tools (DALL-E integration)
    • Database Tools (SQL/NoSQL interfaces)
    • Tool Caching (result caching with eviction strategies)
  • MCP (Model Context Protocol) Server Support
  • Streaming responses
  • Performance optimizations
  • Additional memory backends (Vector, Database, File-based)

Contributing

Contributions are welcome! This project aims to maintain feature parity with the .NET version of AutoGen while adapting to Node.js/TypeScript best practices.

License

MIT

Acknowledgments

This project is inspired by and based on the architecture of microsoft/autogen. Special thanks to the AutoGen team for creating such a powerful framework.

Related Projects

About

Node.js version of microsoft/autogen
