Merged
40 changes: 36 additions & 4 deletions docs/docs/AIAssistant.md
@@ -93,10 +93,9 @@ Gemini is supported out of the box.
Name: Gemini
URL: https://generativelanguage.googleapis.com
API Key secret: (AI Studio API key)
Models (add one or more):
- gemini-1.5-pro (Max Tokens: 1000000)
- gemini-1.5-flash (Max Tokens: 1000000)
- gemini-1.5-flash-8b (Max Tokens: 1000000)
Models (add one or more — use Browse models for the exact current IDs):
- gemini-3-pro (Max Tokens: 1000000)
- gemini-3-flash (Max Tokens: 1000000)
```

Notes:
@@ -120,6 +119,7 @@ In the **AI Assistant Settings** modal (opened via the **Configure AI Assistant*
- **Prompt Template Folder Path**: Path to your folder with prompt templates.
- **Show Assistant**: Show status messages from the AI Assistant.
- **Default System Prompt**: The default system prompt for the AI Assistant. Sets the behavior of the model.
- **Confirm AI tool calls**: Whether QuickAdd asks for confirmation before an AI agent runs a tool (see *Tool / function calling* below). *Destructive tools only* (default) confirms any tool not marked read-only; *Always* confirms every tool; *Never* defers to each tool's own setting. A tool that requires approval is always confirmed, whatever this setting is.

For each individual AI Assistant command in your macros, you can set these options:

@@ -135,6 +135,38 @@ You can also tweak model parameters in advanced settings:
- **frequency_penalty:** A parameter ranging between -2.0 and 2.0. Positive values penalize new tokens based on their frequency in the existing text, reducing the model's tendency to repeat the same lines. (Not applicable to Gemini.)
- **presence_penalty:** Also ranging between -2.0 and 2.0, positive values penalize new tokens based on their presence in the existing text, encouraging the model to introduce new topics. (Not applicable to Gemini.)

## Tool / function calling (scripts)

Beyond one-shot prompts, the AI Assistant can act as a small **agent**: you give the model a
prompt plus a set of *tools* (JavaScript functions), and it decides which to call, in a bounded
multi-step loop, until it has an answer. This is available from the [script API](./QuickAddAPI.md)
only — tools are JS functions, so they live in a User Script (a macro), not in a stored choice.

```js
module.exports = async ({ quickAddApi, app }) => {
  const agent = quickAddApi.ai.agent({
    model: "gpt-5",
    system: "You are a vault librarian. Ground every claim in the user's notes.",
    tools: { ...quickAddApi.ai.tools.vault({ only: ["read_note", "search_notes"] }) },
  });
  const { text } = await agent.generate({ prompt: "What do my notes say about gardening?" });
  return text;
};
```

QuickAdd ships **built-in tools** you can opt into (`quickAddApi.ai.tools.vault/workspace/system`),
and you can declare your own with `quickAddApi.ai.tool({ description, inputSchema, execute })`. See the
[API reference](./QuickAddAPI.md) for the full surface (agents, tools, structured output via a `schema`).

:::warning Tool calls run your code with model-chosen arguments
Tool handlers run with full vault and network access. The **model** decides which tool to call and
with what arguments — possibly influenced by note content it reads. Treat tool results and note
content as untrusted data, validate the arguments your handlers receive, never pass them to
`format()`/`eval`/a shell, and never put secrets in a tool's description or arguments. Destructive
tools ask for confirmation by default (the **Confirm AI tool calls** setting); read-only tools run
automatically.
:::
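
The confirmation rules in this section can be read as a small decision function. The sketch below is only an illustration of those rules, not QuickAdd's implementation: `shouldConfirm` and the mode strings `"destructive"`, `"always"`, and `"never"` are hypothetical stand-ins for the three setting values.

```js
// Hypothetical sketch of the confirmation rules. The function name and the
// mode strings are illustrative assumptions, not QuickAdd internals.
function shouldConfirm(mode, tool, args) {
  // A tool's own needsApproval (boolean or predicate) always forces a prompt.
  const needsApproval =
    typeof tool.needsApproval === "function"
      ? Boolean(tool.needsApproval(args))
      : tool.needsApproval === true;
  if (needsApproval) return true;

  if (mode === "always") return true; // confirm every tool
  if (mode === "never") return false; // defer to the tool's own setting
  return tool.readOnly !== true; // "destructive": confirm anything not read-only
}
```

Under these assumptions a read-only tool runs without a prompt in the default mode, while a tool with `needsApproval: true` is confirmed in every mode.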

## AI-Powered Workflows

You can create powerful workflows utilizing the AI Assistant. Some examples are:
131 changes: 131 additions & 0 deletions docs/docs/QuickAddAPI.md
@@ -610,6 +610,137 @@ const result = await quickAddApi.ai.chunkedPrompt(
);
```

### Tool / function calling — `ai.agent(config)`

Build an **agent**: give the model a prompt and a set of *tools* (JS functions), and it
will call them in a bounded multi-step loop until it has an answer. Works across your
configured OpenAI-compatible, Anthropic, and Gemini providers.

```js
const agent = quickAddApi.ai.agent({
  model: "gpt-5",
  system: "You manage an Obsidian vault. Use the tools to ground your answers.",
  tools: {
    // built-in tools, opt-in (see ai.tools.* below)
    ...quickAddApi.ai.tools.vault({ only: ["read_note", "search_notes"] }),
    // your own tool
    save_link: quickAddApi.ai.tool({
      description: "Append a URL to the reading-list note.",
      inputSchema: {
        type: "object",
        properties: { url: { type: "string" } },
        required: ["url"],
      },
      needsApproval: true, // ask before running (the model chose the args)
      execute: async ({ url }) => {
        const file = app.vault.getAbstractFileByPath("Reading list.md");
        // getAbstractFileByPath returns null when the note does not exist
        if (!file) return { saved: false, error: "Reading list.md not found" };
        await app.vault.append(file, `\n- ${url}`);
        return { saved: true };
      },
    }),
  },
  maxSteps: 12, // optional; default 20, hard-capped at 100
});

const { text, steps, toolCalls } = await agent.generate({
  prompt: "Summarise my notes about {{VALUE:topic}} and save any links you find.",
  assignToVariable: "summary", // optional: writes {{VALUE:summary}} for a later step
});
```

**`ai.agent(config)`** returns an Agent. Config:
- `model` — a configured model name (string) or `{ name }`.
- `system` — system prompt (defaults to your AI Assistant default system prompt).
- `tools` — an object map of tool name → tool (from `ai.tool()` and/or `ai.tools.*`).
- `toolChoice` — `"auto"` (default) | `"none"` | `"required"` | `{ type: "tool", toolName }`.
- `stopWhen` — one or more stop conditions from `ai.stepCountIs(n)` / `ai.hasToolCall(name)`.
- `maxSteps` — step budget (default 20, hard cap 100). Sugar for `stopWhen: ai.stepCountIs(n)`.
- `maxOutputTokens`, `modelOptions` — passed to the provider.

**`agent.generate(options)`** runs the loop and resolves to a result:
- `text` — the final assistant text.
- `object` — present **only** when you pass a `schema` (structured output, below).
- `steps` — the full transcript: `{ text, toolCalls, toolResults, finishReason }[]`.
- `toolCalls` / `toolResults` — from the last step (`input` / `output` fields, AI-SDK style).
- `usage` — `{ inputTokens, outputTokens, totalTokens }`.
- `finishReason` — `"stop" | "max-steps" | "length" | "aborted" | "context-overflow"`.

Options: `prompt` (formatted, like `ai.prompt`), `schema`, `system`/`toolChoice`/`maxOutputTokens`
(per-call overrides), and `assignToVariable` (write `text` into `{{VALUE:name}}`).

The agent is a **stateless config holder**: each `generate()` call is independent and retains no
conversation history. Reusing an agent reuses only its config; run one `generate()` at a time per agent.
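
To make the bounded multi-step loop and the `"max-steps"` finish reason concrete, here is a minimal sketch with a stubbed model. `runAgentLoop`, the `callModel` stub, and the shape of its replies are hypothetical; QuickAdd's real loop talks to a provider and may differ in every detail.

```js
// Hypothetical sketch of a bounded tool-calling loop. `callModel` stands in
// for a provider request and resolves to either { text } or { toolCall }.
async function runAgentLoop({ callModel, tools, prompt, maxSteps = 20 }) {
  const budget = Math.min(maxSteps, 100); // mirror the documented hard cap
  const steps = [];
  const messages = [{ role: "user", content: prompt }];

  for (let i = 0; i < budget; i++) {
    const reply = await callModel(messages);

    // No tool requested: the model has produced its final answer.
    if (!reply.toolCall) {
      steps.push({ text: reply.text, toolCalls: [], toolResults: [] });
      return { text: reply.text, steps, finishReason: "stop" };
    }

    // Run the requested tool and feed its result back to the model.
    const { name, input } = reply.toolCall;
    const known = Object.prototype.hasOwnProperty.call(tools, name);
    const output = known
      ? await tools[name].execute(input)
      : { error: `unknown tool: ${name}` };
    steps.push({ text: "", toolCalls: [{ name, input }], toolResults: [{ name, output }] });
    messages.push({ role: "tool", name, content: JSON.stringify(output) });
  }

  // Step budget exhausted before the model produced a final answer.
  return { text: "", steps, finishReason: "max-steps" };
}
```

Because all state lives in local variables, two calls to a function like this share nothing, which matches the stateless behaviour described above.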

### `ai.tool(def)`

Declares a tool. `def`: `{ description, inputSchema (JSON Schema), execute, needsApproval?, readOnly?, strict? }`.

- `inputSchema` is a **JSON-Schema subset** (`type`/`properties`/`required`/`enum`/`items`).
Unsupported keywords (`pattern`, `additionalProperties`, `$ref`, `format`, …) are rejected
at registration so a provider can't silently drop a constraint.
- `execute(input, ctx)` runs your code with the model-chosen `input` (validated against the
schema first). Return a string (used verbatim) or any JSON-serialisable value.
- `needsApproval` (boolean or `(args) => boolean`) asks before running. `readOnly: true` marks a
tool that only reads, so it auto-runs under the default confirmation setting.
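
A registration-time check on `inputSchema` could look roughly like the sketch below. It assumes the supported set is exactly the five keywords named in the first bullet (QuickAdd may well accept more, such as `description`), and `assertSupportedSchema` is a hypothetical name rather than part of the API.

```js
// Assumption: only these five keywords are supported.
const SUPPORTED_KEYWORDS = new Set(["type", "properties", "required", "enum", "items"]);

// Hypothetical helper: throws on the first unsupported keyword it finds.
function assertSupportedSchema(schema, path = "inputSchema") {
  if (schema === null || typeof schema !== "object" || Array.isArray(schema)) {
    throw new Error(`${path} must be a schema object`);
  }
  for (const key of Object.keys(schema)) {
    if (!SUPPORTED_KEYWORDS.has(key)) {
      throw new Error(`${path}: unsupported keyword "${key}"`);
    }
  }
  // Property names are data, not keywords, so only their schemas are checked.
  for (const [name, child] of Object.entries(schema.properties ?? {})) {
    assertSupportedSchema(child, `${path}.properties.${name}`);
  }
  if (schema.items !== undefined) {
    assertSupportedSchema(schema.items, `${path}.items`);
  }
}
```

Failing loudly here is the point of the design: a keyword such as `pattern` that one provider ignores would otherwise look enforced while doing nothing.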

:::note Confirmation needs an interactive Obsidian session
A tool that asks for approval opens a modal and waits for it. For unattended automation (e.g.
driving QuickAdd from the CLI), give the agent only `readOnly` tools, or set **Confirm AI tool
calls** to *Never* and gate each tool with its own `needsApproval` — otherwise the run blocks on a
dialog no one can answer.
:::

> ⚠️ **Security.** Tool handlers run with the same full privilege as your script (Node `require`,
> `app`, the vault). The **model decides which tool to call and with what arguments**, possibly
> influenced by note content it reads (indirect prompt injection). QuickAdd never runs model-chosen
> arguments through the formatter — and **neither should you**: never pass a tool's `input` to
> `quickAddApi.format()`, `eval`, a shell, or `fetch` without validating it. Never put secrets in a
> tool's description or arguments (they are sent to the provider). Confirmation is governed by each
> tool's `needsApproval` plus the global **Confirm AI tool calls** setting (default *destructive only*).
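
As one way to follow this advice in a handler like `save_link`, the sketch below validates a model-chosen URL before anything is done with it. `validateLinkInput` is a hypothetical helper, and restricting links to `http`/`https` is an assumption about what a reading list should accept.

```js
// Hypothetical helper: returns a normalised http(s) URL or throws.
function validateLinkInput(input) {
  if (input === null || typeof input !== "object" || typeof input.url !== "string") {
    throw new Error("url must be a string");
  }
  let parsed;
  try {
    parsed = new URL(input.url); // throws on anything that is not an absolute URL
  } catch {
    throw new Error("url is not a valid absolute URL");
  }
  // Assumption: only web links belong in the note (no javascript:, file:, ...).
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.href;
}
```

Calling this at the top of `execute` and using only its return value keeps the raw `input` away from the note and from any later `fetch`.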

### Built-in tools — `ai.tools.{vault, workspace, system}(options)`

Opt-in groups of ready-made tools. Each returns a tool map you spread into an agent's `tools`.
Options: `{ only, exclude, prefix, allowedRoots }` (`allowedRoots` confines vault paths to folders).

| Group | Read-only (auto-run) | Write (asks for approval) |
|---|---|---|
| `vault` | `read_note`, `list_notes`, `search_notes`, `get_property_values` | `create_note`, `append_to_note`, `insert_under_heading` |
| `workspace` | `get_active_note`, `get_selection` | — |
| `system` | `get_date` | — |

Write tools sanitise every model-chosen path (rejecting traversal and config dirs like `.obsidian`/
`.git`, and symlinks that escape the vault), fail rather than overwrite an existing note, and are
frontmatter-aware. There are **no ambient tools** — nothing runs unless you spread it into `tools`.
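
The path checks described above can be approximated with a pure string function. This is a sketch under stated assumptions, not the shipped code: `sanitizeVaultPath` is a hypothetical name, the blocked folders are limited to the two named above, and symlink resolution is left out because it needs file-system access.

```js
// Assumption: only the two config folders named in the docs are blocked.
const BLOCKED_SEGMENTS = new Set([".obsidian", ".git"]);

// Hypothetical helper: returns a clean vault-relative path or throws.
function sanitizeVaultPath(rawPath, allowedRoots = []) {
  if (typeof rawPath !== "string" || rawPath.trim() === "") {
    throw new Error("path must be a non-empty string");
  }
  const normalized = rawPath.replace(/\\/g, "/");
  if (normalized.startsWith("/") || /^[a-zA-Z]:/.test(normalized)) {
    throw new Error("absolute paths are not allowed");
  }
  const segments = normalized.split("/").filter((s) => s !== "" && s !== ".");
  if (segments.length === 0) throw new Error("path is empty after normalisation");
  for (const segment of segments) {
    if (segment === "..") throw new Error("path traversal is not allowed");
    if (BLOCKED_SEGMENTS.has(segment)) throw new Error(`"${segment}" is off limits`);
  }
  const clean = segments.join("/");
  if (allowedRoots.length > 0) {
    // Compare whole segments so "Notes" does not admit "Notebook/...".
    const inside = allowedRoots.some((root) => {
      const trimmed = root.replace(/\/+$/, "");
      return clean === trimmed || clean.startsWith(trimmed + "/");
    });
    if (!inside) throw new Error("path is outside the allowed roots");
  }
  return clean;
}
```

Rejecting `..` outright, rather than resolving it, is a deliberately strict choice for arguments that a model picked.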

### Structured output — `agent.generate({ prompt, schema })`

Pass a JSON schema to get a validated object back:

```js
const { object } = await quickAddApi.ai.agent({ model: "gpt-5" }).generate({
  prompt: "Extract the title and tags from the selection.",
  schema: {
    type: "object",
    properties: { title: { type: "string" }, tags: { type: "array", items: { type: "string" } } },
    required: ["title", "tags"],
  },
});
// object => { title: "...", tags: ["...", "..."] }
```

`object` is the parsed, schema-validated result (or `undefined` if the model could not produce a
match after one repair attempt). Structured output works on current models — OpenAI GPT-5.x (and
GPT-4o-class), Anthropic Claude 4.x, and Gemini 3.x; it can be combined with tools. Older models
that do not support schema-constrained output (e.g. legacy OpenAI chat models) reject the request
outright with a provider error — use a current model rather than expecting a best-effort fallback.

:::note OpenAI reasoning models (GPT-5.x, o-series)
These accept only the default `temperature` (omit it from `modelOptions`), and QuickAdd
automatically sends `maxOutputTokens` as `max_completion_tokens` for them. The agent's default
path sets neither, so `quickAddApi.ai.agent({ model: "gpt-5" })` works as-is.
:::
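
The parameter handling in this note can be pictured as follows. It is a sketch only: `buildOpenAIParams` is a hypothetical function, detecting reasoning models by name prefix is an assumption, and silently dropping `temperature` is a choice made here for brevity rather than documented QuickAdd behaviour.

```js
// Hypothetical sketch of per-model parameter mapping for OpenAI requests.
function buildOpenAIParams(model, { maxOutputTokens, temperature } = {}) {
  // Assumption: reasoning models are recognised by their name prefix.
  const isReasoning = /^(gpt-5|o\d)/.test(model);
  const params = { model };

  if (maxOutputTokens !== undefined) {
    // Reasoning models take the output budget under a different key.
    params[isReasoning ? "max_completion_tokens" : "max_tokens"] = maxOutputTokens;
  }
  if (temperature !== undefined && !isReasoning) {
    params.temperature = temperature; // reasoning models accept only the default
  }
  return params;
}
```

With neither option supplied the result is just `{ model }`, which is why the default agent path needs no special handling.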

### `getModels(): string[]`
Returns available AI models.

72 changes: 70 additions & 2 deletions src/ai/OpenAIRequest.test.ts
@@ -367,7 +367,7 @@ describe("OpenAIRequest", () => {
getModelProviderMock.mockReturnValue(anthropicProvider);
});

it("posts to /v1/messages with anthropic headers and a user-only message", async () => {
it("posts to /v1/messages with a top-level system prompt, model-aware max_tokens, and no stale beta header", async () => {
requestUrlMock.mockResolvedValue({
json: {
id: "msg-1",
@@ -391,21 +391,89 @@ describe("OpenAIRequest", () => {

const arg = requestUrlMock.mock.calls[0][0];
expect(arg.url).toBe("https://api.anthropic.com/v1/messages");
// Stale anthropic-beta header dropped.
expect(arg.headers).toEqual({
"Content-Type": "application/json",
"x-api-key": "anthropic-key",
"anthropic-version": "2023-06-01",
"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15",
});

const body = JSON.parse(arg.body);
expect(body).toEqual({
model: "claude-3-5-sonnet",
// Conservative 4096 default (at/below every current Claude model's output cap).
max_tokens: 4096,
messages: [{ role: "user", content: "hello claude" }],
system: "system prompt",
});
});

it("omits the system key when no system prompt is given", async () => {
requestUrlMock.mockResolvedValue({
json: {
id: "msg-1b",
model: "claude-3-5-sonnet",
role: "assistant",
stop_reason: "end_turn",
stop_sequence: null,
type: "message",
content: [{ text: "ok", type: "text" }],
usage: { input_tokens: 1, output_tokens: 1 },
},
});

const makeRequest = OpenAIRequest(makeApp(), "anthropic-key", anthropicModel, "");
await makeRequest("hi");

const body = JSON.parse(requestUrlMock.mock.calls[0][0].body);
expect("system" in body).toBe(false);
expect(body.max_tokens).toBe(4096);
});

it("omits the system key for a whitespace-only system prompt", async () => {
requestUrlMock.mockResolvedValue({
json: {
id: "msg-1c",
model: "claude-3-5-sonnet",
role: "assistant",
stop_reason: "end_turn",
stop_sequence: null,
type: "message",
content: [{ text: "ok", type: "text" }],
usage: { input_tokens: 1, output_tokens: 1 },
},
});

const makeRequest = OpenAIRequest(makeApp(), "anthropic-key", anthropicModel, " ");
await makeRequest("hi");

const body = JSON.parse(requestUrlMock.mock.calls[0][0].body);
expect("system" in body).toBe(false);
});

it("extracts text by scanning all content blocks (a leading tool_use block does not break it)", async () => {
requestUrlMock.mockResolvedValue({
json: {
id: "msg-3",
model: "claude-3-5-sonnet",
role: "assistant",
stop_reason: "tool_use",
stop_sequence: null,
type: "message",
content: [
{ type: "tool_use", id: "toolu_1", name: "x", input: {} },
{ type: "text", text: "after the tool block" },
],
usage: { input_tokens: 3, output_tokens: 2 },
},
});

const makeRequest = OpenAIRequest(makeApp(), "anthropic-key", anthropicModel, "sys");
const result = await makeRequest("q");

expect(result.content).toBe("after the tool block");
});

it("maps the Anthropic response, summing tokens and preserving stop_sequence", async () => {
requestUrlMock.mockResolvedValue({
json: {