autogen_node

A Node.js/TypeScript implementation of microsoft/autogen, providing a framework for building multi-agent AI systems with conversational agents.

Overview

This project brings the multi-agent orchestration capabilities of Microsoft's AutoGen framework to the Node.js ecosystem. Its design follows the code structure and class definitions of the .NET implementation, so the API should feel familiar to developers who already use AutoGen in other languages.

Features

Event-Driven Architecture (AutoGen v0.4): Asynchronous message passing and distributed agent systems
- AgentRuntime: Core runtime for hosting and managing agents
- Direct Messaging: Send messages between agents asynchronously
- Publish/Subscribe: Topic-based broadcast messaging
- Cancellation Tokens: Control and cancel async operations
- State Management: Persist and restore runtime state
Base Agent Framework: Core interfaces and abstract classes for building custom agents
Multiple LLM Providers: Support for OpenAI, OpenRouter, Ollama, Anthropic, and Google Gemini
- OpenAI: GPT-3.5, GPT-4, and other OpenAI models
- Anthropic: Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku
- Google Gemini: Gemini 1.5 Flash, Gemini 1.5 Pro, and Gemini Pro
- OpenRouter: Access to 100+ models from multiple providers
- Ollama: Run LLMs locally for privacy and offline use
AssistantAgent: LLM-powered conversational agent with provider flexibility
UserProxyAgent: Human-in-the-loop agent for interactive conversations
PlannerAgent: Planning agent that breaks down requirements into structured tasks
SupervisorAgent: Supervisor agent that verifies task completion and provides feedback
Group Chat: Multi-agent collaboration system for complex tasks
Function Calling: Register and execute custom functions with agents
Code Execution: Automatically execute code generated by agents (JavaScript, Python, Bash)
- LocalCodeExecutor: Execute code on the local machine
- DockerCodeExecutor: Execute code safely in isolated Docker containers
Memory System: Persistent memory for agents to maintain context across conversations (based on Microsoft AutoGen)
Type-Safe: Built with TypeScript for enhanced developer experience
Flexible Message System: Support for different message types and roles
Conversation Management: Built-in conversation history and state management
Advanced Conversation Patterns: Complete implementation of AutoGen patterns
- Nested Chat: Hierarchical conversations with task delegation
- Sequential Chat: Predefined workflow execution
- Speaker Selection: Multiple strategies (Round-robin, Random, Manual, Constrained, Auto/LLM-based)
- Swarm Mode: Dynamic multi-agent task distribution and collaboration
Tools & Extensions: Comprehensive toolset for agent capabilities
- File System Tools: Safe file read/write and directory operations
- Browser Tools: Web automation with Playwright (scraping, screenshots, interaction)
- API Tools: REST and GraphQL API call wrappers
- Image Generation: DALL-E integration for AI image generation
- Database Tools: SQL/NoSQL database connection interfaces
- Tool Caching: Result caching with multiple eviction strategies

Installation

npm install

Quick Start

Using OpenAI (Default)

import { AssistantAgent, UserProxyAgent, HumanInputMode } from './src/index';

// Create an AI assistant
const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'openai',  // optional, this is the default
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a helpful assistant.',
  model: 'gpt-3.5-turbo',
  temperature: 0
});

// Create a user proxy for human interaction
const userProxy = new UserProxyAgent({
  name: 'user',
  humanInputMode: HumanInputMode.ALWAYS
});

// Start a conversation
await userProxy.initiateChat(
  assistant,
  'Hello! Can you help me?',
  10 // max rounds
);

Using OpenRouter

const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'openrouter',
  apiKey: process.env.OPENROUTER_API_KEY!,
  model: 'anthropic/claude-2',
  temperature: 0.7
});

Using Anthropic Claude

const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'anthropic',
  apiKey: process.env.ANTHROPIC_API_KEY!,
  model: 'claude-3-5-sonnet-20241022',
  temperature: 0.7
});

Using Google Gemini

const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'gemini',
  apiKey: process.env.GEMINI_API_KEY!,
  model: 'gemini-1.5-flash',
  temperature: 0.7
});

Using Ollama (Local)

const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'ollama',
  model: 'llama2',
  temperature: 0.7
});

See LLM_PROVIDERS.md for detailed provider documentation.

Project Structure

autogen_node/
├── src/
│   ├── core/                 # Core interfaces and base classes
│   │   ├── IAgent.ts         # Agent interface definitions
│   │   ├── BaseAgent.ts      # Base agent implementation
│   │   ├── IFunctionCall.ts  # Function calling interfaces
│   │   ├── FunctionContract.ts # Function contract builder
│   │   ├── FunctionCallMiddleware.ts # Function execution middleware
│   │   └── ICodeExecutor.ts  # Code execution interface
│   ├── agents/               # Agent implementations
│   │   ├── AssistantAgent.ts # LLM-powered assistant with function calling
│   │   └── UserProxyAgent.ts # Human proxy with code execution
│   ├── executors/            # Code execution implementations
│   │   └── LocalCodeExecutor.ts # Local code executor
│   ├── providers/            # LLM provider implementations
│   │   ├── OpenAIProvider.ts
│   │   ├── OpenRouterProvider.ts
│   │   └── OllamaProvider.ts
│   ├── examples/             # Example applications
│   │   ├── basic-chat.ts
│   │   ├── function-calling-example.ts
│   │   └── code-execution-example.ts
│   └── index.ts              # Main export file
├── dist/                     # Compiled JavaScript output
├── package.json
├── tsconfig.json
└── README.md

Architecture

This implementation follows the AutoGen architecture with both traditional and event-driven patterns:

Event-Driven Architecture (AutoGen v0.4)

The new event-driven architecture enables scalable, distributed multi-agent systems:

AgentRuntime: Core runtime for hosting and managing agents
- sendMessage(): Direct asynchronous message passing
- publishMessage(): Topic-based broadcast messaging
- Agent registration and lifecycle management
- State persistence and restoration
AgentId & TopicId: Distributed agent addressing
- Unique identification for agents across processes
- Topic-based message routing
CancellationToken: Async operation control
- Cancel long-running operations
- Cleanup on cancellation
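
The cancellation behaviour listed above (cancel a long-running operation, then run cleanup) can be sketched as follows. This is only an illustration of the pattern: `SimpleCancellationToken`, `onCancel`, and `processItems` are hypothetical names, not the library's actual `CancellationToken` API, which is documented in EVENT_DRIVEN.md.

```typescript
// Hypothetical stand-in for a cancellation token; names are illustrative only.
type CancelCallback = () => void;

class SimpleCancellationToken {
  private cancelled = false;
  private readonly callbacks: CancelCallback[] = [];

  /** True once cancel() has been called. */
  get isCancelled(): boolean {
    return this.cancelled;
  }

  /** Register cleanup work; it runs immediately if the token is already cancelled. */
  onCancel(callback: CancelCallback): void {
    if (this.cancelled) {
      callback();
      return;
    }
    this.callbacks.push(callback);
  }

  /** Mark the token as cancelled and run each cleanup callback exactly once. */
  cancel(): void {
    if (this.cancelled) return;
    this.cancelled = true;
    for (const callback of this.callbacks) callback();
    this.callbacks.length = 0;
  }
}

// A long-running loop that checks the token between units of work.
function processItems(items: string[], token: SimpleCancellationToken): string[] {
  const done: string[] = [];
  for (const item of items) {
    if (token.isCancelled) break;
    done.push(item.toUpperCase());
    if (item === 'stop') token.cancel(); // simulate an external cancel request
  }
  return done;
}

const token = new SimpleCancellationToken();
const cleanupLog: string[] = [];
token.onCancel(() => { cleanupLog.push('released resources'); });

const processed = processItems(['a', 'stop', 'c'], token);
console.log(processed, cleanupLog);
```

The loop stops before the third item because the token is checked at the start of every iteration, and the cleanup callback runs once at the moment of cancellation.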

See EVENT_DRIVEN.md for detailed documentation.

Traditional Architecture

IAgent Interface: Defines the contract for all agents
- generateReply(): Generate responses to messages
- getName(): Get the agent's name
BaseAgent: Abstract base class providing:
- Conversation history management
- Message sending and receiving
- Chat initiation logic
- Termination detection
Agent Implementations:
- AssistantAgent: Uses LLM providers for intelligent responses with function calling support
- UserProxyAgent: Facilitates human interaction with configurable input modes and code execution
Function Calling: Enable agents to call custom functions
- Define functions with FunctionContract
- Automatic function execution via FunctionCallMiddleware
- OpenAI-compatible function definitions
Code Execution: Execute code generated by agents
- LocalCodeExecutor for JavaScript, Python, and Bash
- Automatic code extraction from markdown code blocks
- Safe execution in temporary directories
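
To make "automatic code extraction from markdown code blocks" concrete, here is a minimal sketch of how fenced blocks could be pulled out of an LLM reply. It is an assumption about the general technique, not the code used by `LocalCodeExecutor`; `extractCodeBlocks` and `CodeBlock` are hypothetical names, and blocks without a language tag are labelled `text` purely for illustration.

```typescript
interface CodeBlock {
  language: string;
  code: string;
}

// Three backticks, built at runtime so this example can itself sit inside a fenced block.
const FENCE = '`'.repeat(3);

// Hypothetical helper: collect every fenced block with its language tag.
function extractCodeBlocks(markdown: string): CodeBlock[] {
  const pattern = new RegExp(
    FENCE + '([A-Za-z0-9_+-]*)[ \\t]*\\r?\\n([\\s\\S]*?)' + FENCE,
    'g'
  );
  const blocks: CodeBlock[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    blocks.push({
      language: match[1] || 'text',
      code: (match[2] ?? '').replace(/\s+$/, ''), // drop the trailing newline
    });
  }
  return blocks;
}

const sample =
  'Here is the solution:\n' +
  FENCE + 'javascript\nconsole.log(1 + 1);\n' + FENCE +
  '\nAnd some notes:\n' +
  FENCE + '\nplain text\n' + FENCE;

const blocks = extractCodeBlocks(sample);
console.log(blocks);
```

An executor would then pick the blocks whose language it supports (JavaScript, Python, or Bash in this project) and ignore the rest.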

Message System

Messages follow a structured format:

interface IMessage {
  content: string;
  role: 'user' | 'assistant' | 'system' | 'function' | 'tool';
  name?: string;
  functionCall?: {
    name: string;
    arguments: string;
  };
  toolCalls?: Array<{
    id: string;
    type: 'function';
    function: {
      name: string;
      arguments: string;
    };
  }>;
  toolCallId?: string;
}
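
As an illustration of how these fields fit together, the sketch below builds a short conversation containing a tool call and its reply. The interface is copied from above so the snippet stands alone; `pendingToolCallIds` is a hypothetical helper, not part of the library, and the pairing of `toolCalls[].id` with `toolCallId` is assumed to follow the OpenAI-style convention that the field names suggest.

```typescript
// Interface copied from the section above so that this snippet is self-contained.
interface IMessage {
  content: string;
  role: 'user' | 'assistant' | 'system' | 'function' | 'tool';
  name?: string;
  functionCall?: { name: string; arguments: string };
  toolCalls?: Array<{
    id: string;
    type: 'function';
    function: { name: string; arguments: string };
  }>;
  toolCallId?: string;
}

// Hypothetical helper: ids of tool calls that have no matching 'tool' reply yet.
function pendingToolCallIds(messages: IMessage[]): string[] {
  const answered = new Set<string>();
  for (const message of messages) {
    if (message.role === 'tool' && message.toolCallId) {
      answered.add(message.toolCallId);
    }
  }
  const pending: string[] = [];
  for (const message of messages) {
    for (const call of message.toolCalls ?? []) {
      if (!answered.has(call.id)) pending.push(call.id);
    }
  }
  return pending;
}

const conversation: IMessage[] = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: "What's the weather in San Francisco?" },
  {
    role: 'assistant',
    content: '',
    toolCalls: [
      {
        id: 'call_1',
        type: 'function',
        // Arguments travel as a JSON string, matching the `arguments: string` field.
        function: {
          name: 'get_weather',
          arguments: JSON.stringify({ location: 'San Francisco, CA' }),
        },
      },
    ],
  },
];

const pendingBefore = pendingToolCallIds(conversation);
conversation.push({ role: 'tool', content: 'Sunny, 72°F', toolCallId: 'call_1' });
const pendingAfter = pendingToolCallIds(conversation);
console.log(pendingBefore, pendingAfter);
```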

Configuration

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here  # Optional
GEMINI_API_KEY=your_gemini_api_key_here        # Optional
OPENROUTER_API_KEY=your_openrouter_api_key_here  # Optional
# OLLAMA_BASE_URL=http://localhost:11434/v1      # Optional
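
One way to check these variables before constructing an agent is sketched below. Everything here is an assumption made for illustration: `resolveProviderSettings` is a hypothetical helper, the environment is passed in as a plain object (in a real script that would typically be `process.env` after the .env file has been loaded), and the Ollama fallback URL is simply the value from the commented-out line above.

```typescript
type Provider = 'openai' | 'anthropic' | 'gemini' | 'openrouter' | 'ollama';
type Env = Record<string, string | undefined>;

interface ProviderSettings {
  provider: Provider;
  apiKey?: string;
  baseURL?: string;
}

// Environment variable that holds the key for each hosted provider.
const KEY_VARIABLES: Record<Exclude<Provider, 'ollama'>, string> = {
  openai: 'OPENAI_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
  gemini: 'GEMINI_API_KEY',
  openrouter: 'OPENROUTER_API_KEY',
};

// Hypothetical helper: fail early with a readable error when a key is missing.
function resolveProviderSettings(provider: Provider, env: Env): ProviderSettings {
  if (provider === 'ollama') {
    // Ollama runs locally and needs no key; the default URL is an assumption.
    return { provider, baseURL: env['OLLAMA_BASE_URL'] ?? 'http://localhost:11434/v1' };
  }
  const variable = KEY_VARIABLES[provider];
  const apiKey = env[variable];
  if (!apiKey) {
    throw new Error(`Missing ${variable}; add it to .env before using the ${provider} provider.`);
  }
  return { provider, apiKey };
}

const openaiSettings = resolveProviderSettings('openai', { OPENAI_API_KEY: 'sk-test' });
const ollamaSettings = resolveProviderSettings('ollama', {});
console.log(openaiSettings, ollamaSettings);
```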

Scripts

# Build the project
npm run build

# Run the basic interactive example (OpenAI)
npm run example:basic

# Run the automated two-agent conversation (OpenAI)
npm run example:auto

# Run the group chat example (OpenAI)
npm run example:group

# Run Anthropic Claude example
npm run example:anthropic

# Run Google Gemini example
npm run example:gemini

# Run OpenRouter example
npm run example:openrouter

# Run Ollama example (local LLM)
npm run example:ollama

# Run Ollama file organizer example (automatic file renaming and tagging)
npm run example:ollama-organizer

# Run planner-supervisor example (task planning and verification with Ollama)
npm run example:planner-supervisor

# Run GitHub AI search example (web search and API tools with Ollama)
npm run example:github-ai-search

# Run function calling example
npm run example:functions

# Run code execution example
npm run example:code

# Run memory example
npm run example:memory

# Run nested chat example
npm run example:nested

# Run sequential chat example
npm run example:sequential

# Run speaker selection strategies example
npm run example:speaker

# Run swarm mode example
npm run example:swarm

# Run event-driven architecture example (AutoGen v0.4)
npm run example:events

# Run tools examples
npm run example:filesystem  # File system operations
npm run example:browser     # Web automation with Playwright
npm run example:api        # REST/GraphQL API calls
npm run example:docker     # Docker code execution
npm run example:image      # Image generation with DALL-E

# Run tests
npm test

# Run tests with coverage
npm run test:coverage

# Development mode with auto-reload
npm run dev

# Clean build artifacts
npm run clean

Tools & Extensions

autogen_node provides a comprehensive set of tools and extensions to enhance agent capabilities:

File System Tools

Safe file and directory operations with security restrictions:

import { FileSystemTool, AssistantAgent } from 'autogen_node';

const fsTool = new FileSystemTool({
  basePath: '/safe/directory',
  allowedExtensions: ['.txt', '.md', '.json']
});

// Create function contracts for agents
const functions = FileSystemTool.createFunctionContracts(fsTool);

const assistant = new AssistantAgent({
  name: 'file_assistant',
  functions,
  // ... other config
});

Available Operations:

read_file - Read file contents
write_file - Write content to a file
list_directory - List directory contents
create_directory - Create new directories
delete_file - Delete files
file_exists - Check if file/directory exists
rename_file - Rename or move files to new locations

Example Use Case: See ollama-file-organizer-example.ts for an intelligent file organization system that uses an LLM to analyze file content, suggest categories, and automatically move files into appropriate folders with descriptive names.

Browser Tools

Web automation using Playwright:

import { BrowserTool } from 'autogen_node';

const browser = new BrowserTool({ headless: true });

await browser.navigate('https://example.com');
const text = await browser.getText('h1');
await browser.screenshot({ path: 'screenshot.png' });

// Use with agents
const functions = BrowserTool.createFunctionContracts(browser);

Docker Code Executor

Safe code execution in isolated containers:

import { DockerCodeExecutor } from 'autogen_node';

const executor = new DockerCodeExecutor({ timeout: 30000 });

const result = await executor.executeCode(
  'console.log("Hello from Docker!");',
  'javascript'
);

API Tools

REST and GraphQL API wrapper:

import { APITool } from 'autogen_node';

const apiTool = new APITool({
  baseURL: 'https://api.example.com'
});

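const data = await apiTool.get('/users');

// Illustrative query; replace it with one that matches your API's schema
const query = '{ users { id name } }';
const result = await apiTool.graphql(query);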

Image Generation

AI image generation with DALL-E:

import { ImageGenerationTool } from 'autogen_node';

const imageTool = new ImageGenerationTool({
  openaiApiKey: process.env.OPENAI_API_KEY
});

const imageUrls = await imageTool.generateImage(
  'A serene landscape with mountains',
  { size: '1024x1024', quality: 'hd' }
);

Tool Caching

Cache expensive tool operations:

import { ToolCache, CacheStrategy } from 'autogen_node';

const cache = new ToolCache({
  maxSize: 100,
  defaultTTL: 5 * 60 * 1000,
  strategy: CacheStrategy.LRU
});

const cachedFn = cache.wrap('expensiveOp', asyncFunction);
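
To show what an LRU strategy with a TTL does behind a call like `wrap`, here is a minimal, self-contained sketch. It is not the library's `ToolCache`: `LruTtlCache` is a hypothetical class, it wraps synchronous functions for brevity (a function returning a promise would be cached the same way), it does not cache `undefined` results, and the injectable `now` clock exists only to make expiry easy to demonstrate.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

class LruTtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();

  constructor(
    private readonly maxSize: number,
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      // A Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  /** Wrap a function so repeated calls with the same arguments reuse the cached result. */
  wrap<A extends unknown[]>(name: string, fn: (...args: A) => V): (...args: A) => V {
    return (...args: A): V => {
      const key = `${name}:${JSON.stringify(args)}`;
      const hit = this.get(key);
      if (hit !== undefined) return hit;
      const value = fn(...args);
      this.set(key, value);
      return value;
    };
  }
}

// Usage with a fake clock so expiry can be shown without waiting.
let clock = 0;
const cache = new LruTtlCache<number>(2, 1000, () => clock);
let calls = 0;
const square = cache.wrap('square', (n: number) => {
  calls += 1;
  return n * n;
});

square(3);    // computed
square(3);    // served from the cache
clock = 1500; // move past the 1000 ms TTL
square(3);    // computed again
console.log(calls);
```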

For detailed documentation on all tools, see TOOLS.md.

Examples

Event-Driven Architecture (AutoGen v0.4)

import {
  AgentId,
  TopicId,
  SingleThreadedAgentRuntime,
  createSubscription,
} from './src/index';

// Create event-driven agent
class EventAgent {
  async handleMessage(message: any, sender: AgentId | null) {
    return {
      role: 'assistant',
      content: `Processed: ${message.content}`,
    };
  }
}

// Create runtime and register agents
const runtime = new SingleThreadedAgentRuntime();
const agent = new EventAgent();
const agentId = new AgentId('event_agent', 'agent1');

await runtime.registerAgentInstance(agent, agentId);

// Direct message passing
const response = await runtime.sendMessage(
  { content: 'Hello!' },
  agentId
);

// Topic-based pub/sub
const topic = new TopicId('notifications', 'system');
await runtime.addSubscription(
  createSubscription('sub1', topic, agentId)
);
await runtime.publishMessage(
  { content: 'Broadcast message' },
  topic
);

See EVENT_DRIVEN.md for complete documentation.

Basic Two-Agent Chat

import { AssistantAgent, UserProxyAgent, HumanInputMode } from './src/index';

const assistant = new AssistantAgent({
  name: 'assistant',
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a helpful math tutor.',
  model: 'gpt-3.5-turbo'
});

const user = new UserProxyAgent({
  name: 'user',
  humanInputMode: HumanInputMode.ALWAYS
});

await user.initiateChat(assistant, 'Help me solve 2x + 3 = 7', 10);

Automated Conversation (No Human Input)

const user = new UserProxyAgent({
  name: 'user',
  humanInputMode: HumanInputMode.NEVER
});

// Agent will auto-reply without human intervention

Function Calling

import { AssistantAgent, UserProxyAgent, FunctionContract, HumanInputMode } from './src/index';

// Define a weather function
const getWeather = FunctionContract.fromFunction(
  'get_weather',
  'Get the current weather for a location',
  [
    {
      name: 'location',
      type: 'string',
      description: 'The city and state, e.g. San Francisco, CA',
      required: true
    }
  ],
  async (location: string) => {
    // Your weather API logic here
    return `The weather in ${location} is sunny, 72°F`;
  }
);

// Create assistant with functions
const assistant = new AssistantAgent({
  name: 'assistant',
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a helpful assistant with access to weather data.',
  model: 'gpt-3.5-turbo',
  functions: [getWeather]
});

// Create a user proxy to start the conversation (no human input in this example)
const userProxy = new UserProxyAgent({
  name: 'user',
  humanInputMode: HumanInputMode.NEVER
});

// The assistant will automatically call the function when needed
await userProxy.initiateChat(assistant, "What's the weather in San Francisco?", 3);

Code Execution

import { AssistantAgent, UserProxyAgent, LocalCodeExecutor, HumanInputMode } from './src/index';

// Create code executor
const codeExecutor = new LocalCodeExecutor();

// Create assistant that writes code
const assistant = new AssistantAgent({
  name: 'assistant',
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a coding assistant. Write code in markdown code blocks.',
  model: 'gpt-3.5-turbo'
});

// Create user proxy with code execution enabled
const userProxy = new UserProxyAgent({
  name: 'user_proxy',
  humanInputMode: HumanInputMode.NEVER,
  codeExecutor: codeExecutor,
  autoExecuteCode: true
});

// The agent will write code, and it will be automatically executed
await userProxy.initiateChat(
  assistant,
  'Write JavaScript code to calculate the sum of numbers from 1 to 100',
  3
);

await codeExecutor.cleanup();

Group Chat with Multiple Agents

import { AssistantAgent, GroupChat, GroupChatManager } from './src/index';

// Create multiple specialized agents
const designer = new AssistantAgent({
  name: 'designer',
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a creative designer.',
  model: 'gpt-3.5-turbo'
});

const engineer = new AssistantAgent({
  name: 'engineer',
  apiKey: process.env.OPENAI_API_KEY!,
  systemMessage: 'You are a practical engineer.',
  model: 'gpt-3.5-turbo'
});

// Create group chat
const groupChat = new GroupChat({
  agents: [designer, engineer],
  maxRound: 10
});

// Create manager
const manager = new GroupChatManager({
  groupChat: groupChat
});

// Run the discussion
await manager.runChat('Design a new mobile app feature');

Advanced Conversation Patterns

autogen_node implements all major conversation patterns from Microsoft AutoGen:

Nested Chat

Delegate tasks to specialist agents:

import { AssistantAgent, supportsNestedChat } from './src/index';

const projectManager = new AssistantAgent({
  name: 'project_manager',
  systemMessage: 'You delegate tasks to specialists.',
  apiKey: process.env.OPENAI_API_KEY!
});

const specialist = new AssistantAgent({
  name: 'specialist',
  systemMessage: 'You are a code review specialist.',
  apiKey: process.env.OPENAI_API_KEY!
});

// Delegate task to specialist
const result = await projectManager.initiateNestedChat(
  'Review this code: ...',
  specialist,
  { maxRounds: 3, addToParentHistory: true }
);

Sequential Chat

Execute agents in predefined workflow order:

import { runSequentialChat, AssistantAgent } from './src/index';

// researcher, writer, and editor are AssistantAgent instances, created as in the examples above
const result = await runSequentialChat({
  steps: [
    { agent: researcher, maxRounds: 1 },
    { agent: writer, maxRounds: 1 },
    { agent: editor, maxRounds: 1 }
  ],
  initialMessage: 'Write an article about AI'
});

Speaker Selection Strategies

Control who speaks next in group chats:

import { AssistantAgent, GroupChat, RoundRobinSelector, RandomSelector, AutoSelector } from './src/index';

// agent1, agent2, and agent3 are agents created as in the examples above

// Round-robin selection
const chat1 = new GroupChat({
  agents: [agent1, agent2, agent3],
  speakerSelector: new RoundRobinSelector()
});

// Random selection
const chat2 = new GroupChat({
  agents: [agent1, agent2, agent3],
  speakerSelector: new RandomSelector()
});

// LLM-based intelligent selection
const coordinator = new AssistantAgent({ ... });
const chat3 = new GroupChat({
  agents: [agent1, agent2, agent3],
  speakerSelector: new AutoSelector({ selectorAgent: coordinator })
});
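
For intuition, round-robin selection amounts to "pick the agent after the last speaker, wrapping around at the end". The sketch below shows that logic in isolation; `SimpleRoundRobinSelector` and `NamedAgent` are hypothetical and are not the library's `RoundRobinSelector`, whose interface may differ.

```typescript
interface NamedAgent {
  name: string;
}

// Hypothetical selector, shown only to illustrate the strategy.
class SimpleRoundRobinSelector<T extends NamedAgent> {
  selectNext(agents: T[], lastSpeaker?: T): T {
    if (agents.length === 0) {
      throw new Error('At least one agent is required.');
    }
    // An unknown or missing last speaker yields -1, so the first agent starts.
    const lastIndex = lastSpeaker
      ? agents.findIndex((agent) => agent.name === lastSpeaker.name)
      : -1;
    const next = agents[(lastIndex + 1) % agents.length];
    if (next === undefined) {
      throw new Error('Selection index out of range.');
    }
    return next;
  }
}

const agents: NamedAgent[] = [
  { name: 'designer' },
  { name: 'engineer' },
  { name: 'reviewer' },
];
const selector = new SimpleRoundRobinSelector<NamedAgent>();

const speakingOrder: string[] = [];
let speaker: NamedAgent | undefined;
for (let round = 0; round < 4; round++) {
  speaker = selector.selectNext(agents, speaker);
  speakingOrder.push(speaker.name);
}
console.log(speakingOrder.join(' -> '));
// prints "designer -> engineer -> reviewer -> designer"
```

A random strategy would replace the index arithmetic with a random pick, while an LLM-based strategy would ask a coordinator agent to choose a name from the list.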

Swarm Mode

Distribute tasks among agents dynamically:

import { SwarmChat, RoundRobinSelector } from './src/index';

// researcher, writer, coder, and reviewer are agents created as in the examples above
const swarm = new SwarmChat({
  agents: [researcher, writer, coder, reviewer],
  maxRoundsPerTask: 3,
  taskAssignmentSelector: new RoundRobinSelector()
});

const result = await swarm.run([
  'Research TypeScript benefits',
  'Write a tutorial',
  'Create code examples',
  'Review documentation'
]);

console.log(`Completed: ${result.completedTasks.length}`);

See CONVERSATION_PATTERNS.md for detailed documentation.

Memory Usage

Memory allows agents to maintain context across conversations:

import { AssistantAgent, ListMemory, MemoryMimeType } from './src/index';

// Create memory instance
const userMemory = new ListMemory({ name: 'user_preferences' });

// Add memory content
await userMemory.add({
  content: 'User prefers formal language',
  mimeType: MemoryMimeType.TEXT,
  metadata: { timestamp: Date.now() }
});

await userMemory.add({
  content: 'User is interested in TypeScript and AI',
  mimeType: MemoryMimeType.TEXT
});

// Create agent with memory
const assistant = new AssistantAgent({
  name: 'assistant',
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY!,
  memory: [userMemory]
});

// Memory is automatically injected into context
const reply = await assistant.generateReply([
  { role: 'user', content: 'What should I learn next?' }
]);

For more details, see MEMORY.md.

Comparison with .NET AutoGen

Feature	.NET AutoGen	autogen_node
Base Agent Framework	✅	✅
AssistantAgent	✅	✅
UserProxyAgent	✅	✅
ConversableAgent	✅	✅
RetrieveUserProxyAgent (RAG)	✅	✅
GPTAssistantAgent	✅	✅
MultimodalConversableAgent	✅	✅
TeachableAgent	✅	✅
CompressibleAgent	✅	✅
SocietyOfMindAgent	✅	✅
OpenAI Integration	✅	✅
Group Chat	✅	✅
Multiple LLM Providers	✅	✅ (OpenAI, Anthropic, Gemini, OpenRouter, Ollama)
Function Calling	✅	✅
Code Execution	✅	✅ (JavaScript, Python, Bash)
Memory System	✅	✅ (Based on Python AutoGen)
Event-Driven Architecture (v0.4)	✅	✅
AgentRuntime	✅	✅ (SingleThreadedAgentRuntime)
Async Message Passing	✅	✅
Publish/Subscribe	✅	✅

Advanced Agent Types

autogen_node now includes all major agent types from Microsoft AutoGen:

ConversableAgent: Flexible agent with optional LLM integration and configurable behaviors
RetrieveUserProxyAgent: RAG-enabled agent for document Q&A and knowledge base queries
GPTAssistantAgent: Integration with OpenAI's Assistant API for persistent conversations
MultimodalConversableAgent: Support for images, audio, and multimodal interactions
TeachableAgent: Learns user preferences and provides personalized responses
CompressibleAgent: Manages long conversations with automatic history compression
SocietyOfMindAgent: Complex reasoning using multiple specialized inner agents
PlannerAgent: Breaks down complex requirements into structured, executable task plans
SupervisorAgent: Verifies task completion and ensures requirements are met with feedback loops

For detailed documentation and examples, see:

ADVANCED_AGENTS.md - ConversableAgent, RAG, GPT Assistant, Multimodal, etc.
PLANNER_SUPERVISOR.md - Planning and Supervision workflow (English)
PLANNER_SUPERVISOR_CN.md - Planning and Supervision workflow (中文)

Roadmap

Contributing

Contributions are welcome! This project aims to maintain feature parity with the .NET version of AutoGen while adapting to Node.js/TypeScript best practices.

License

MIT

Acknowledgments

This project is inspired by and based on the architecture of microsoft/autogen. Special thanks to the AutoGen team for creating such a powerful framework.

Related Projects

microsoft/autogen - Original Python implementation
microsoft/autogen (dotnet) - .NET implementation
