feat: add talos docs #2516
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,115 @@ | ||
| # Documentation Instructions | ||
|
|
||
| ## JSON Processing | ||
|
|
||
| Use `jq` instead of `python3` for all JSON operations in code examples: | ||
|
|
||
| - **Pretty-print:** `| jq .` not `| python3 -m json.tool` | ||
| - **Extract required fields:** `| jq -er '.field'` (the `-e` flag exits non-zero on `null` so `set -e` aborts the snippet instead | ||
| of silently exporting an empty value). | ||
| - **Extract optional fields:** `| jq -r '.field'` is fine when the field may legitimately be missing. | ||
|
|
||
| **Never write curl output to temporary files.** Capture responses in shell variables instead. File-based operations fail when | ||
| `/tmp` doesn't exist or isn't writable. | ||
|
|
||
| ## Passing state between doctest blocks | ||
|
|
||
| Doctest runs each code block in a fresh `bash -eu -o pipefail` subprocess and auto-captures the exported environment after each | ||
| successful block. To make a value available to the next block, just `export` it — no manual write to `$DOCTEST_ENV_FILE` is | ||
| needed. | ||
|
|
||
| ```bash | ||
| # Good: variable-based, exported for the next block, asserts the field is present | ||
| RESPONSE=$(curl -s -X POST "$URL/v2alpha1/admin/issuedApiKeys" \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{"name": "my-key"}') | ||
| echo "$RESPONSE" | jq . | ||
| export KEY_ID=$(echo "$RESPONSE" | jq -er '.key_id') | ||
|
|
||
| # Bad: file-based | ||
| curl -s ... -o /tmp/response.json | ||
| jq . /tmp/response.json | ||
| KEY_ID=$(jq -r '.key_id' /tmp/response.json) | ||
| rm -f /tmp/response.json | ||
|
|
||
| # Bad: redirecting to $DOCTEST_ENV_FILE (legacy; auto-capture handles this now) | ||
| KEY_ID=$(echo "$RESPONSE" | jq -r '.key_id') | ||
| echo "export KEY_ID=$KEY_ID" >> "$DOCTEST_ENV_FILE" | ||
| ``` | ||
|
|
||
| ## API Field Documentation | ||
|
|
||
| Integration guides under `integrate/` must NOT duplicate API field tables, error code tables, or enum tables. These are maintained | ||
| in the canonical references: | ||
|
|
||
| - **Field tables** -> auto-generated API reference at `reference/api/*.api.mdx` | ||
| - **Error codes** -> `reference/error-codes.md` | ||
|
|
||
| ### What belongs in integration guides | ||
|
|
||
| - **Workflow and examples**: curl commands, step-by-step instructions, the "how" and "why" | ||
| - **Brief inline mentions**: 1-3 sentences highlighting the most important fields (e.g., "The response includes a `secret` field | ||
| -- store it securely") | ||
| - **Conceptual comparisons**: tables comparing patterns, trade-offs, or usage scenarios (e.g., JWT vs macaroon) | ||
| - **Operational constraints**: limits, cache control headers, retry strategies | ||
| - **Links to reference**: always link to the canonical source for complete field/error details | ||
|
|
||
| ### What does NOT belong in integration guides | ||
|
|
||
| - Full request/response field tables (use API reference link instead) | ||
| - Error code enum tables (use error codes reference link instead) | ||
| - Query parameter tables (use API reference link instead) | ||
| - Revocation reason enum tables (use API reference link instead) | ||
|
|
||
| ### Link format | ||
|
|
||
| **All links MUST be relative links to markdown/mdx files with the file extension.** Never use absolute links (starting with `/`) | ||
| or links without a file extension. Fragment anchors (`#section`) are allowed after the file extension. | ||
|
|
||
| - Links to `.md` files: `[text](../reference/error-codes.md#section)` | ||
| - Links to `.api.mdx` files: `[text](../reference/api/admin-issue-api-key.api.mdx)` | ||
| - Links to directory index pages: `[text](../operate/cache/index.md)` (never `../operate/cache/`) | ||
| - Links within the same directory: `[text](./sibling-page.md)` | ||
|
|
||
| ```text | ||
| # Good: relative links with file extensions | ||
| For the complete field reference, see the [IssueAPIKey API reference](../reference/api/admin-issue-api-key.api.mdx). | ||
| For the full list of error codes, see the [error codes reference](../reference/error-codes.md#verification-error-codes). | ||
|
|
||
| # Bad: absolute links without file extensions | ||
| For the complete field reference, see the [IssueAPIKey API reference](/reference/api/admin-issue-api-key). | ||
| For the full list of error codes, see the [error codes reference](/reference/error-codes#verification-error-codes). | ||
| ``` | ||
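The link rules above can be checked mechanically. The following is a minimal sketch, not part of the docs tooling: `is_valid_doc_link` is a hypothetical helper name, and it only covers internal documentation links.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of any existing tooling): returns 0 when a
# link target is relative and its path ends in .md or .mdx.
is_valid_doc_link() {
  local target="$1"
  local path="${target%%#*}"   # strip an optional #anchor before checking the extension

  case "$target" in
    /*) return 1 ;;            # absolute links are not allowed
  esac

  case "$path" in
    *.md|*.mdx) return 0 ;;    # file extension present
    *) return 1 ;;             # directory link or missing extension
  esac
}

# Usage: report each candidate link target.
for link in "../reference/error-codes.md#section" "/reference/error-codes" "../operate/cache/"; do
  if is_valid_doc_link "$link"; then
    echo "ok:  $link"
  else
    echo "bad: $link"
  fi
done
```

A check like this could run over link targets extracted from the guides, but how targets are extracted is left open here.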
|
|
||
| ### API reference URL pattern | ||
|
|
||
| API reference pages are `.api.mdx` files at `reference/api/{plane}-{method}.api.mdx` where: | ||
|
|
||
| - `{plane}` is `admin` or `data` | ||
| - `{method}` is the kebab-case method name (e.g., `issue-api-key`, `verify-api-key`) | ||
|
|
||
| The API overview page is `reference/api/ory-talos-api.info.mdx`. | ||
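As an illustration of the pattern, the sketch below builds a reference path from a plane and a kebab-case method name. The function name `api_ref_path` is hypothetical and only mirrors the rule stated above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints reference/api/{plane}-{method}.api.mdx after
# validating both parts against the pattern described above.
api_ref_path() {
  local plane="$1" method="$2"
  local kebab_re='^[a-z0-9]+(-[a-z0-9]+)*$'

  case "$plane" in
    admin|data) ;;
    *) echo "unknown plane: $plane" >&2; return 1 ;;
  esac

  if [[ ! "$method" =~ $kebab_re ]]; then
    echo "method must be kebab-case: $method" >&2
    return 1
  fi

  printf 'reference/api/%s-%s.api.mdx\n' "$plane" "$method"
}

api_ref_path admin issue-api-key   # prints reference/api/admin-issue-api-key.api.mdx
```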
|
|
||
| ### Notes and callouts | ||
|
|
||
| Ensure that notes / callouts have a blank line after the opening `:::note` and before the closing `:::`, or they will be formatted incorrectly. | ||
|
|
||
| **Incorrect:** | ||
|
|
||
| ```md | ||
| :::note Internal package The Go client is in an `internal/` package and cannot be imported by external Go modules. ::: | ||
| ``` | ||
|
|
||
| ```md | ||
| :::note Internal package The Go client is in an `internal/` package and cannot be imported by external Go modules. ::: | ||
| ``` | ||
|
|
||
| **Correct:** | ||
|
|
||
| ```md | ||
| :::note | ||
|
|
||
| Internal package The Go client is in an `internal/` package and cannot be imported by external Go modules. | ||
|
|
||
| ::: | ||
| ``` | ||
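A collapsed callout like the incorrect example can be detected with a small check. This is a sketch under the assumption that "collapsed" means the opening `:::type` and the closing `:::` sit on the same line; `has_inline_callout` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical lint: exits 0 when stdin contains a callout whose opening and
# closing ::: markers are on the same line, 1 otherwise.
has_inline_callout() {
  local line
  local inline_re='^:::[a-z]+ .*:::[[:space:]]*$'
  # The extra test keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ "$line" =~ $inline_re ]]; then
      return 0
    fi
  done
  return 1
}

# Usage: flag a collapsed callout read from stdin.
if printf '%s\n' ':::note Internal package The Go client is internal. :::' | has_inline_callout; then
  echo "collapsed callout found"
fi
```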
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,215 @@ | ||||||
| --- | ||||||
| id: talos-architecture | ||||||
| title: Ory Talos architecture | ||||||
| sidebar_label: Ory Talso architecture | ||||||
|
Fix typo in sidebar label. This shows up in navigation and looks unpolished.
Suggested change:
-sidebar_label: Ory Talso architecture
+sidebar_label: Ory Talos architecture
||||||
| --- | ||||||
|
|
||||||
| # Architecture | ||||||
|
|
||||||
| Talos separates API key management into two planes. | ||||||
|
|
||||||
| ## Admin plane | ||||||
|
|
||||||
| The admin plane handles all key management and verification operations: key issuance, rotation, revocation, token derivation, | ||||||
| JWKS, and verification (single and batch). It is exposed only to internal services and clients with admin credentials. | ||||||
|
|
||||||
| Endpoints: `/v2alpha1/admin/`, including `/v2alpha1/admin/apiKeys:verify` and `/v2alpha1/admin/apiKeys:batchVerify`. | ||||||
|
|
||||||
| For low-latency verification close to clients, deploy the commercial [edge proxy](../operate/deploy/edge-proxy.md) as a sidecar. | ||||||
| The proxy caches admin verify responses locally, so applications get sub-millisecond cache hits without exposing the admin plane | ||||||
| publicly. | ||||||
|
|
||||||
| ## Data plane | ||||||
|
|
||||||
| The data plane handles self-service operations that credential holders perform with proof of possession of the credential itself; | ||||||
| no admin authentication is required. | ||||||
|
|
||||||
| Endpoints: `POST /v2alpha1/apiKeys:selfRevoke` | ||||||
|
|
||||||
| ## Verification flow | ||||||
|
|
||||||
| ``` | ||||||
| Client --> Verifier --> Cache (hit?) --> Database --> Response | ||||||
| | ^ | ||||||
| +-- cache hit ---------------+ | ||||||
| ``` | ||||||
|
|
||||||
| 1. Client sends credential to `POST /v2alpha1/admin/apiKeys:verify` | ||||||
| 2. Talos identifies the credential type (generated, imported, JWT, macaroon) | ||||||
| 3. For generated keys, the UUID is extracted from the token identifier | ||||||
| 4. For imported keys, a tenant-scoped SHA-512/256 hash is computed | ||||||
| 5. Database lookup (or cache hit) returns key metadata | ||||||
| 6. Response includes key status, owner, scopes, and metadata | ||||||
|
|
||||||
| ## Deployment topologies | ||||||
|
|
||||||
| | Topology | Edition | Description | | ||||||
| | ------------ | ---------- | -------------------------------------------------------------------- | | ||||||
| | Single-node | OSS | One process serves both planes | | ||||||
| | Split planes | Commercial | Admin and data planes as separate deployments | | ||||||
| | Edge proxy | Commercial | Sidecar proxy at the edge that caches admin verify responses locally | | ||||||
|
|
||||||
| Both planes share the same database. Verification uses caching (memory or Redis) to minimize database load. | ||||||
|
|
||||||
| ## Ports | ||||||
|
|
||||||
| | Port | Purpose | | ||||||
| | ---- | ------------------ | | ||||||
| | 4420 | HTTP API (default) | | ||||||
| | 4422 | Prometheus metrics | | ||||||
|
|
||||||
| ## Design philosophy | ||||||
|
|
||||||
| ### Separation of concerns | ||||||
|
|
||||||
| The system is divided into distinct layers: | ||||||
|
|
||||||
| - **Admin plane**: Management operations (CRUD for keys, rotation, import, token derivation) | ||||||
| - **Data plane**: High-throughput verification operations | ||||||
| - **Persistence layer**: Database abstraction with pluggable drivers | ||||||
| - **Cache layer**: Performance optimization with multiple backends | ||||||
|
|
||||||
| This separation allows independent scaling of components, different SLOs for different operations (admin targets \<100ms p99, data | ||||||
| plane targets \<3ms p99), and clear boundaries between responsibilities. | ||||||
|
|
||||||
| ### Production-first design | ||||||
|
|
||||||
| - Hard isolation between admin and data operations | ||||||
| - Metrics, traces, and structured logs are emitted by default | ||||||
| - Graceful degradation when the database or cache backend is unavailable | ||||||
| - Zero-downtime deployments via rolling updates and stateless verification | ||||||
|
|
||||||
| ### Performance characteristics | ||||||
|
|
||||||
| - Self-contained tokens (JWT/macaroon) enable stateless verification | ||||||
| - HMAC-SHA256 keeps the revocation check on the order of microseconds; bcrypt would cap a single core at roughly 10 verifications | ||||||
| per second | ||||||
| - LRU caching for hot paths | ||||||
| - Minimal allocations in the verification path | ||||||
|
|
||||||
| ## System architecture | ||||||
|
|
||||||
| ``` | ||||||
| Clients (CLI, SDK, HTTP) | ||||||
| | | ||||||
| v | ||||||
| +----------------------------------+ | ||||||
| | HTTP Server (grpc-gateway) | | ||||||
| | Port: 4420 | | ||||||
| +----------------------------------+ | ||||||
| | | ||||||
| v | ||||||
| +----------------------------------+ | ||||||
| | Middleware | | ||||||
| | Logging, Metrics, Tracing | | ||||||
| +----------------------------------+ | ||||||
| | | ||||||
| +-----+----------+ | ||||||
| | | | ||||||
| v v | ||||||
| +-----------+ +-----------+ | ||||||
| | Admin | | Data | | ||||||
| | Plane | | Plane | | ||||||
| | <100ms | | <3ms p99 | | ||||||
| +-----------+ +-----------+ | ||||||
| | | | ||||||
| v v | ||||||
| +----------------------------------+ | ||||||
| | Service Layer | | ||||||
| | Business logic, Validation | | ||||||
| +----------------------------------+ | ||||||
| | | ||||||
| +-----+----------+ | ||||||
| | | | ||||||
| v v | ||||||
| +-----------+ +-----------+ | ||||||
| | Persist. | | Cache | | ||||||
| | SQLite | | Memory | | ||||||
| | PG/MySQL | | LRU | | ||||||
| | CRDB | | Redis | | ||||||
| +-----------+ +-----------+ | ||||||
| ``` | ||||||
|
|
||||||
| All requests enter through a single HTTP server built on grpc-gateway (port 4420) and pass through middleware for logging, | ||||||
| metrics, and tracing before being routed to the appropriate plane. | ||||||
|
|
||||||
| ## Component overview | ||||||
|
|
||||||
| ### HTTP server | ||||||
|
|
||||||
| The API layer uses grpc-gateway for HTTP/JSON routing with protobuf-based schemas. It serves both planes through a single port, | ||||||
| handles CORS and compression, and exposes OpenAPI documentation. | ||||||
|
|
||||||
| ### Service layer | ||||||
|
|
||||||
| Business logic is split between the admin plane service (key lifecycle, import, token derivation, input validation) and the data | ||||||
| plane verifier (token parsing, signature verification, revocation checking, cache management). The verifier is optimized for the | ||||||
| hot path with minimal allocations. | ||||||
|
|
||||||
| ### Persistence | ||||||
|
|
||||||
| Database access uses sqlc-generated type-safe queries with pluggable drivers: | ||||||
|
|
||||||
| - **SQLite** -- OSS edition, zero-config, suitable for millions of keys | ||||||
| - **PostgreSQL** -- production workloads | ||||||
| - **MySQL** -- production workloads | ||||||
| - **CockroachDB** -- distributed deployments | ||||||
|
|
||||||
| Schema changes are managed through versioned migrations using golang-migrate. | ||||||
|
|
||||||
| ### Cache | ||||||
|
|
||||||
| The cache layer reduces database load on the verification path: | ||||||
|
|
||||||
| - **Memory LRU** (OSS) -- local to each instance, configurable size limits | ||||||
| - **Redis** (Commercial) -- distributed, supports cluster and sentinel modes | ||||||
| - **Hierarchical L1+L2** (Commercial) -- memory for speed, Redis for shared state | ||||||
|
|
||||||
| ### Crypto | ||||||
|
|
||||||
| Talos supports multiple JWT signing algorithms and a separate API key hashing mechanism: | ||||||
|
|
||||||
| - **JWT signing algorithms** | ||||||
| - `Ed25519 (EdDSA)` -- default, fastest signing and smallest keys | ||||||
| - `RSA-2048/4096 (RS256)` -- legacy compatibility | ||||||
| - **API key hashing** | ||||||
| - `HMAC-SHA256` -- used for API key revocation checks (\<1ms with constant-time comparison) | ||||||
|
|
||||||
| The JWT signing algorithm is determined per JWK by its `alg` field, so one JWKS can contain keys for multiple signing algorithms | ||||||
| at the same time. | ||||||
|
|
||||||
| ### Observability | ||||||
|
|
||||||
| Built-in instrumentation across three pillars: | ||||||
|
|
||||||
| - **Metrics** -- Prometheus exposition on port 4422 with request latency histograms and error rate counters | ||||||
| - **Tracing** -- OpenTelemetry with W3C Trace Context propagation, configurable sampling, OTLP and Jaeger exporters | ||||||
| - **Logging** -- structured JSON logging via slog with correlation IDs and contextual fields | ||||||
|
|
||||||
| ## Scalability | ||||||
|
|
||||||
| ### Small (\<1k RPS) | ||||||
|
|
||||||
| A single Talos instance handles both planes with SQLite and an in-memory LRU cache. No external dependencies required. | ||||||
|
|
||||||
| - OSS edition sufficient | ||||||
| - 1 CPU, 512MB RAM | ||||||
| - Cost: $5-10/month | ||||||
|
|
||||||
| ### Medium (10-50k RPS) | ||||||
|
|
||||||
| Separate admin and data plane deployments behind a load balancer. PostgreSQL replaces SQLite for durability. Redis provides shared | ||||||
| caching across data plane instances. | ||||||
|
|
||||||
| - Commercial edition | ||||||
| - Auto-scaling for data plane | ||||||
| - Cost: $100-500/month | ||||||
|
|
||||||
| ### Large (200k+ RPS) | ||||||
|
|
||||||
| A cluster of 10-50+ stateless data plane instances with auto-scaling, backed by a distributed Redis cache and PostgreSQL with read | ||||||
| replicas and connection pooling. Supports multi-region deployment. | ||||||
|
|
||||||
| - Commercial edition | ||||||
| - Regional data plane deployment | ||||||
| - Cost: $1-5k/month | ||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| --- | ||
| id: caching-consistency | ||
| title: Ory Talos caching and consistency | ||
| sidebar_label: Caching and consistency | ||
| --- | ||
|
|
||
| # Caching and consistency | ||
|
|
||
| Talos caches verification results to reduce database load and improve latency. The OSS edition ships a no-op cache; in-memory and | ||
| Redis backends are commercial-only — see [Caching](../operate/cache/index.md) for backend selection. | ||
|
|
||
| ## How it works | ||
|
|
||
| When caching is enabled, the first verification request for a key hits the database. Subsequent requests within the cache TTL are | ||
| served from cache without a database lookup. | ||
|
|
||
| ## Cache types | ||
|
|
||
| | Type | Scope | Use case | | ||
| | ------ | ----------- | ----------------------------------- | | ||
| | Memory | Per-process | Single node or per-instance caching | | ||
| | Redis | Shared | Multi-instance deployments | | ||
|
|
||
| ## Eventual consistency | ||
|
|
||
| Caching introduces eventual consistency for revocation: | ||
|
|
||
| 1. Admin revokes a key via `POST /v2alpha1/admin/apiKeys/{key_id}:revoke` | ||
| 2. The revocation takes effect in the database immediately | ||
| 3. Cached verification results for that key remain valid until the cache entry expires | ||
| 4. After TTL expiry, the next verification hits the database and returns `is_active: false` | ||
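The steps above imply that a revoked key can keep verifying from cache for at most one TTL. The arithmetic can be sketched as follows; `stale_window_seconds` is a hypothetical helper, and it assumes a cache entry is never invalidated before its TTL expires.

```bash
#!/usr/bin/env bash
# Sketch: seconds during which a revoked key can still be served from cache.
# cached_at and revoked_at are Unix timestamps; ttl is in seconds.
# Assumption: the entry is only dropped when its TTL expires.
stale_window_seconds() {
  local cached_at="$1" revoked_at="$2" ttl="$3"
  local expires_at=$(( cached_at + ttl ))
  local remaining=$(( expires_at - revoked_at ))
  if (( remaining < 0 )); then
    remaining=0   # the entry had already expired when the key was revoked
  fi
  echo "$remaining"
}

stale_window_seconds 1000 1060 300   # prints 240
stale_window_seconds 1000 1400 300   # prints 0
```

In the first call the entry was cached 60 seconds before the revocation, so 240 of the 300 seconds remain. In the second call the entry had already expired.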
|
|
||
| ## Cache bypass | ||
|
|
||
| To force a database lookup (bypassing cache), include the `Cache-Control: no-cache` header: | ||
|
|
||
| ```bash | ||
| curl -X POST http://localhost:4420/v2alpha1/admin/apiKeys:verify \ | ||
| -H "Content-Type: application/json" \ | ||
| -H "Cache-Control: no-cache" \ | ||
| -d '{"credential": "..."}' | ||
| ``` | ||
|
|
||
| See the [quickstart revocation check](../quickstart/index.mdx) and the [curl SDK reference](../integrate/sdk/curl.md) for tested | ||
| examples using cache bypass. | ||
|
|
||
| ## TTL guidelines | ||
|
|
||
| | TTL | Trade-off | | ||
| | ----- | ------------------------------------------------- | | ||
| | `1m` | Fast revocation propagation, higher database load | | ||
| | `5m` | Balanced (recommended default) | | ||
| | `30m` | Low database load, slower revocation propagation | | ||
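To make the trade-off in the table concrete, the sketch below estimates database lookups per hour under a simple model. The model is an assumption, not a measured result: every key in continuous use misses the cache once per TTL, and all instances share one cache. `db_lookups_per_hour` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Rough model (an assumption, not a benchmark): each continuously used key
# causes one database lookup per TTL window.
db_lookups_per_hour() {
  local active_keys="$1" ttl_seconds="$2"
  echo $(( active_keys * 3600 / ttl_seconds ))
}

# Compare the three TTLs from the table for 10,000 active keys.
for ttl in 60 300 1800; do
  echo "ttl=${ttl}s lookups/hour=$(db_lookups_per_hour 10000 "$ttl")"
done
```

Under this model, 10,000 active keys produce 600000 lookups per hour at `1m`, 120000 at `5m`, and 20000 at `30m`. With per-process memory caches the figure would scale with the number of instances.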
|
|
||
| See [Cache operations guide](../operate/cache/index.md) for configuration details. |
Duplicate “Incorrect” callout example creates ambiguity.
There are two identical “Incorrect” examples in a row, which makes the before/after contrast harder to follow. Keep a single incorrect block, then the corrected block.
Suggested edit: delete the second ":::note Internal package The Go client is in an `internal/` package and cannot be imported by external Go modules. :::" block, the one directly before "Correct:".
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
In @docs/talos/CLAUDE.md around lines 97 - 105, remove the duplicated "Incorrect" callout block so there's only one instance of the markdown snippet ":::note Internal package The Go client is in an `internal/` package and cannot be imported by external Go modules. :::" followed immediately by the corrected "Correct:" block. Specifically, delete the second identical md block and ensure the document flows: single Incorrect example, then the Correct example.