feat: plugin architecture for fctl (Ledger v3 POC)#156
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (6)
📒 Files selected for processing (24)
WalkthroughThis PR introduces a comprehensive plugin system for fctl that enables dynamic loading and execution of CLI plugins. It defines a gRPC-based plugin contract, implements registry-based plugin discovery with semver compatibility resolution, provides manifest caching and lazy process loading, integrates plugins into Cobra CLI commands with auth context bridging, and adds install/list/remove management commands alongside automatic interactive discovery. ChangesPlugin System Implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes The change introduces a multi-layer plugin subsystem with diverse components (gRPC, process management, configuration, CLI integration, rendering) spanning protobuf, Go SDK, registry logic, caching, and Cobra wiring. While individual layers are moderately complex, the interconnected nature and number of moving parts (contract→config→registry→loading→management→resolution→execution→rendering) requires careful review of both isolated logic and integration points. Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 16
🧹 Nitpick comments (3)
cmd/root.go (2)
204-207: ⚡ Quick winAvoid shadowing the imported
runtime/debugpackage.The local variable
debugon line 204 shadows thedebugpackage imported on line 10 and used asdebug.PrintStack()in the panic-recovery deferred function on line 192. Today this works only because the deferred function is lexically declared before the variable comes into scope, but it is a foot-gun: anyone adding adebug.PrintStack()(or similar) anywhere after line 204 will silently get a type error from aboolvariable. Rename for clarity.♻️ Proposed rename
- debug := fctl.GetBool(cmd, fctl.DebugFlag) - pm := pluginpkg.NewPluginManager(configDir, debug) + debugEnabled := fctl.GetBool(cmd, fctl.DebugFlag) + pm := pluginpkg.NewPluginManager(configDir, debugEnabled) pm.DiscoverAndLoad(ctx) defer pm.Shutdown() // Handle --plugin-binary for ephemeral one-shot plugin loading if pluginBinary := fctl.GetString(cmd, "plugin-binary"); pluginBinary != "" { if name, binaryPath, ok := strings.Cut(pluginBinary, "="); ok && name != "" && binaryPath != "" { var opts []pluginpkg.LoadPluginOption - if debug { + if debugEnabled { opts = append(opts, pluginpkg.WithDebug()) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/root.go` around lines 204 - 207, The local variable named debug returned from fctl.GetBool(cmd, fctl.DebugFlag) shadows the imported runtime/debug package used by the deferred panic-recovery (debug.PrintStack()); rename the local boolean to something unambiguous (e.g., debugFlag or isDebug) and update its use when constructing the plugin manager (pluginpkg.NewPluginManager(configDir, <newName>)) so pm.DiscoverAndLoad(ctx) and defer pm.Shutdown() keep the same behavior but no longer shadow the imported debug package.
113-128: 💤 Low value
serviceCommandscovers onlyledger; consider documenting the rollout plan or a TODO.The map currently has a single entry. Given the PR's phased rollout (Phase 9 is Ledger v3-only), this is intentional, but a comment explaining the policy ("add an entry here as each v3 service ships a plugin") would help future contributors avoid wondering whether other services are missing by accident.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/root.go` around lines 113 - 128, serviceCommands currently only contains the "ledger" entry which is intentional for the phased rollout; add a concise comment/TODO above the serviceCommands map explaining the policy (e.g., "Only v3 services will be added here as they ship plugins — add service entries when shipping v3 plugin") so future contributors understand to add new entries (reference: the serviceCommands map and its "ledger" builtInCovers function). Keep the note brief and include the rollout phase guidance and when to add new map entries.cmd/plugin/install.go (1)
87-94: 💤 Low valueStream the binary instead of loading it entirely into memory.
os.ReadFilematerializes the entire plugin binary in memory beforeos.WriteFilerewrites it. For a CLI plugin this is usually fine, butio.Copybetween two*os.Files is simpler, allocation-free, and removes any practical size ceiling.♻️ Proposed change
- data, err := os.ReadFile(absPath) - if err != nil { - return fmt.Errorf("failed to read binary: %w", err) - } - - if err := os.WriteFile(destPath, data, 0o755); err != nil { - return fmt.Errorf("failed to write plugin binary: %w", err) - } + src, err := os.Open(absPath) + if err != nil { + return fmt.Errorf("failed to read binary: %w", err) + } + defer src.Close() + + dst, err := os.OpenFile(destPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o755) + if err != nil { + return fmt.Errorf("failed to write plugin binary: %w", err) + } + if _, err := io.Copy(dst, src); err != nil { + dst.Close() + return fmt.Errorf("failed to write plugin binary: %w", err) + } + if err := dst.Close(); err != nil { + return fmt.Errorf("failed to write plugin binary: %w", err) + }Add
"io"to the imports.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/plugin/install.go` around lines 87 - 94, The code currently uses os.ReadFile and os.WriteFile to copy the plugin binary via memory; change this to stream the copy with io.Copy to avoid allocating the whole file. Open the source with os.Open(absPath), open/create the dest with os.OpenFile(destPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o755), defer Close() both, then call io.Copy(destFile, srcFile) and return any error from Open or Copy; remove the ReadFile/WriteFile usage and add "io" to imports. Ensure you propagate errors and preserve the 0755 mode when creating the destination file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/root.go`:
- Around line 166-169: The code currently swallows errors from pluginpkg.Resolve
(called with serviceName, serviceVersion, pm, registry,
builtInCovers(serviceVersion)) by returning nil, hiding real
configuration/resolve bugs; change this to surface or propagate the error
instead of returning nil — either return the error up the call chain (propagate)
or at minimum log it (respecting --debug) via the existing logger and then
return the error so callers can handle it; locate the call to pluginpkg.Resolve
and replace the silent return with error propagation or a debug/error log plus
returning the error.
In `@pkg/plugin/autodiscovery.go`:
- Around line 40-43: The interactive confirmation currently ignores the error
from pterm.DefaultInteractiveConfirm.WithDefaultValue(true).Show causing
TTY/input failures to be treated as successful; change the call to capture the
(result, err) return values, check err explicitly (e.g. return or propagate a
descriptive error when err != nil) and only proceed based on the boolean result
when err is nil—update the call site in autodiscovery.go where result is
assigned to use the new error handling.
In `@pkg/plugin/cache_test.go`:
- Around line 77-86: The stale-cache test in pkg/plugin/cache_test.go can fail
due to coarse filesystem timestamp granularity; before/after rewriting the
binary at binaryPath ensure the file mtime actually changes by either (a)
calling os.Chtimes(binaryPath, time.Now().Add(1*time.Second),
time.Now().Add(1*time.Second)) after writing or (b) inserting a short sleep
(e.g., time.Sleep(1*time.Second)) before rewriting, then rewrite the file and
assert LoadCachedManifest(tmpDir, name, version) returns an error as
before—update the test around the binaryPath write and the LoadCachedManifest
call to apply one of these approaches so the modified binary has a newer
modtime.
In `@pkg/plugin/cache.go`:
- Around line 44-46: The current freshness check using info.ModTime().UnixNano()
== cached.BinaryModTime is fragile; update the cache validation to persist and
compare a stronger signal (e.g., file size and a content hash like SHA256) when
saving and loading manifests: on save compute and store BinarySize and
BinaryHash alongside BinaryModTime, and on load recompute the file size and
SHA256 of the binary and reject the cache if either size or hash differs (keep
modtime as an additional hint but not the sole check); update all places that
compare cached.BinaryModTime (including the other check referenced around lines
70–73) to use the new size+hash comparison (use the existing cached struct
fields or add BinarySize/BinaryHash fields).
In `@pkg/plugin/cobra.go`:
- Around line 20-21: The manifest RPCs currently call loaded.Client.GetManifest
with context.Background(), which can hang indefinitely; update both call sites
that invoke loaded.Client.GetManifest to create a context with a timeout (use
context.WithTimeout and a sensible duration), pass that context into
GetManifest, and defer the cancel() to ensure the RPC is bounded; make the same
change for any other plugin RPCs in the file that use context.Background() (the
second occurrence referenced) so all manifest fetches use the cancellable
timeout context instead of context.Background().
- Around line 73-75: The current code uses commandPath := spec.Use which only
captures the local token and can collide for nested commands; change it to
capture the full, unambiguous command path from the constructed cobra Command
(e.g., use command.CommandPath() or command.CommandPath() equivalent) before
passing into makePluginRunE so the plugin exec receives the complete path; apply
the same change at the other occurrence (the block around lines 107-108) where
spec.Use is used.
- Around line 218-223: When in plain mode (display == nil and outputFormat !=
"json") and success.RenderedText is empty but success.JsonData is present, print
the JsonData as a fallback instead of returning nil silently; modify the block
around success.RenderedText in the function that writes to cmd.OutOrStdout() so
that if success.RenderedText == "" && success.JsonData != "" (and display == nil
&& outputFormat != "json") you call fmt.Fprint(cmd.OutOrStdout(),
success.JsonData) and return any error from that write (use the same
cmd.OutOrStdout() path and error handling as the existing fmt.Fprint branch for
success.RenderedText).
- Around line 276-293: The code currently coerces invalid numeric constraints
(from parts[1], parts[2]) into zero by ignoring strconv.Atoi errors; instead,
check the error returned by strconv.Atoi when parsing for ExactArgs,
MinimumNArgs, MaximumNArgs and the two parses for RangeArgs and if parsing fails
do not construct the cobra validator (i.e. avoid calling
fctl.WithArgs(cobra.ExactArgs/MinimumNArgs/MaximumNArgs/RangeArgs) with a zero),
either return no args constraint or propagate an explicit error; update the
branches that reference parts, fctl.WithArgs and
cobra.ExactArgs/MinimumNArgs/MaximumNArgs/RangeArgs to validate atoi errors
before using parsed values.
In `@pkg/plugin/config.go`:
- Around line 66-68: PluginBinaryPath currently joins unvalidated name and
version into a filesystem path and can be escaped with crafted segments; update
PluginBinaryPath to validate/sanitize both name and version before joining:
ensure filepath.Base(name) == name and filepath.Base(version) == version, and
that neither is empty nor equals "." or ".." and does not contain path
separators (os.PathSeparator or '/'); if validation fails return a safe fallback
(e.g. empty string) or propagate an error (adjust signature), then call
PluginsDir(configDir) and filepath.Join using the validated values to build the
final path.
In `@pkg/plugin/debug.go`:
- Around line 41-44: The Write method in the prefixed stderr forwarder
(pkg/plugin/debug.go — function Write using w.out and w.prefix) currently
ignores fmt.Fprintf errors by always returning (len(p), nil); change it to
capture the error from fmt.Fprintf (or any write to w.out), and return (0, err)
or (n, err) according to the writer contract when a write fails so callers see
I/O errors; ensure successful writes still return the number of bytes written
and nil error.
In `@pkg/plugin/loader.go`:
- Around line 155-162: The ensureLoaded path currently discards loaded.stderr
after calling LoadPlugin (called in the block that sets l.inner and l.client),
so preserve the plugin process stderr by storing loaded.stderr onto the loader
struct (e.g., set l.stderr or similar) instead of dropping it, and add a getter
method (e.g., Stderr() or LastStderr()) on the loader to expose that captured
stderr for higher-level crash formatting; update any usage sites that attribute
crashes to read the new accessor rather than assuming stderr was available.
In `@pkg/plugin/manager.go`:
- Around line 64-85: The code currently spawns a plugin via LoadPlugin to fetch
the manifest and then keeps the spawned process in pm.loaded (the variable
loaded), causing unnecessary long-lived processes; change the cache-miss path so
after calling loaded.Client.GetManifest(ctx) and
SaveCachedManifest(pm.configDir, p.Name, version, manifest) you call
loaded.Kill() to terminate the spawned process and instead append a
LazyPluginClient (or equivalent lazy wrapper) to pm.loaded that holds the plugin
metadata (Version, CompatibleWith, manifest location) and will spawn the real
plugin on demand; update places that expect concrete plugin instances to accept
the lazy wrapper (refer to LoadPlugin, loaded.Client.GetManifest, loaded.Kill,
SaveCachedManifest, pm.loaded, and LazyPluginClient) so manifest warm-up only
caches data and does not retain processes.
- Around line 182-184: The code currently ignores the result of os.RemoveAll
when deleting the plugin directory (pluginDir :=
filepath.Join(PluginsDir(pm.configDir), name); _ = os.RemoveAll(pluginDir));
change this to capture and handle the error from os.RemoveAll(pluginDir) and
abort or return the error before you mutate the plugin config/state.
Specifically, in the function that calls os.RemoveAll (referencing PluginsDir,
pm.configDir and name), replace the ignored assignment with proper error
checking: if os.RemoveAll returns an error, log/return that error (or wrap it)
so the config update that follows does not proceed when disk deletion failed.
In `@pkg/plugin/registry.go`:
- Around line 92-95: The registry client currently performs HTTP GETs (via
r.HTTPClient.Get in RegistryClient and callers using pkg/http.go:GetHttpClient)
without a timeout; update NewRegistryClient to set a sensible
http.Client.Timeout when httpClient.Timeout == 0 (preserving
httpClient.Transport/round-tripper) so registry fetches and binary downloads
cannot hang, and optionally update registry call sites to use context-aware
requests (e.g., use http.NewRequestWithContext) for better cancellation control.
In `@pkg/plugin/versions.go`:
- Around line 17-19: DetectServiceVersions currently dereferences stackClient by
calling stackClient.GetVersions which will panic if stackClient is nil; add a
defensive nil check at the start of DetectServiceVersions to return a clear
error when stackClient == nil (before calling GetVersions). Update the function
(DetectServiceVersions) to validate stackClient, return an appropriate error
value (and zero ServiceVersions) if nil, and only call stackClient.GetVersions
when the client is non-nil.
In `@pkg/pluginsdk/sdk.go`:
- Around line 93-101: Serve currently accepts a nil FctlPlugin and will later
dereference it via GRPCPlugin{Impl: impl}; add a guard at the start of Serve
that checks if impl == nil and fail fast (panic or log.Fatalf) with a clear
message like "Serve called with nil FctlPlugin implementation" so we never
construct GRPCPlugin with a nil Impl or proceed to goplugin.Serve; reference the
Serve function, the FctlPlugin parameter, and the GRPCPlugin{Impl: impl} usage
when making the change.
---
Nitpick comments:
In `@cmd/plugin/install.go`:
- Around line 87-94: The code currently uses os.ReadFile and os.WriteFile to
copy the plugin binary via memory; change this to stream the copy with io.Copy
to avoid allocating the whole file. Open the source with os.Open(absPath),
open/create the dest with os.OpenFile(destPath,
os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o755), defer Close() both, then call
io.Copy(destFile, srcFile) and return any error from Open or Copy; remove the
ReadFile/WriteFile usage and add "io" to imports. Ensure you propagate errors
and preserve the 0755 mode when creating the destination file.
In `@cmd/root.go`:
- Around line 204-207: The local variable named debug returned from
fctl.GetBool(cmd, fctl.DebugFlag) shadows the imported runtime/debug package
used by the deferred panic-recovery (debug.PrintStack()); rename the local
boolean to something unambiguous (e.g., debugFlag or isDebug) and update its use
when constructing the plugin manager (pluginpkg.NewPluginManager(configDir,
<newName>)) so pm.DiscoverAndLoad(ctx) and defer pm.Shutdown() keep the same
behavior but no longer shadow the imported debug package.
- Around line 113-128: serviceCommands currently only contains the "ledger"
entry which is intentional for the phased rollout; add a concise comment/TODO
above the serviceCommands map explaining the policy (e.g., "Only v3 services
will be added here as they ship plugins — add service entries when shipping v3
plugin") so future contributors understand to add new entries (reference: the
serviceCommands map and its "ledger" builtInCovers function). Keep the note
brief and include the rollout phase guidance and when to add new map entries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e748d5e7-19c7-43a3-9449-a05b085bcec4
⛔ Files ignored due to path filters (6)
go.modis excluded by!**/*.modgo.sumis excluded by!**/*.sum,!**/*.sumpkg/pluginsdk/go.modis excluded by!**/*.modpkg/pluginsdk/go.sumis excluded by!**/*.sum,!**/*.sumpkg/pluginsdk/pluginpb/plugin.pb.gois excluded by!**/*.pb.go,!**/*.pb.gopkg/pluginsdk/pluginpb/plugin_grpc.pb.gois excluded by!**/*.pb.go,!**/*.pb.go
📒 Files selected for processing (24)
.gitignorecmd/plugin/install.gocmd/plugin/list.gocmd/plugin/remove.gocmd/plugin/root.gocmd/root.gopkg/plugin/autodiscovery.gopkg/plugin/cache.gopkg/plugin/cache_test.gopkg/plugin/cobra.gopkg/plugin/config.gopkg/plugin/config_test.gopkg/plugin/debug.gopkg/plugin/loader.gopkg/plugin/manager.gopkg/plugin/registry.gopkg/plugin/registry_test.gopkg/plugin/render.gopkg/plugin/render_test.gopkg/plugin/resolver.gopkg/plugin/resolver_test.gopkg/plugin/versions.gopkg/pluginsdk/plugin.protopkg/pluginsdk/sdk.go
| resolution, err := pluginpkg.Resolve(serviceName, serviceVersion, pm, registry, builtInCovers(serviceVersion)) | ||
| if err != nil { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Don't silently swallow Resolve errors.
Auth/stack-client/version-detection failures are reasonably swallowed (the comment explains the intent: defer to the built-in when we can't talk to the stack). But pluginpkg.Resolve is a pure local decision once serviceVersion is known — an error here typically indicates a real bug (e.g., malformed installed plugin config) and should surface, at least when --debug is enabled. Returning nil makes such bugs invisible.
🔍 Suggested handling
resolution, err := pluginpkg.Resolve(serviceName, serviceVersion, pm, registry, builtInCovers(serviceVersion))
if err != nil {
- return nil
+ if fctl.GetBool(cmd, fctl.DebugFlag) {
+ logging.FromContext(cmd.Context()).Errorf("plugin resolution failed for %s@%s: %v", serviceName, serviceVersion, err)
+ }
+ return nil
}Or, if Resolve is expected to be infallible for valid inputs, propagate the error upward instead.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resolution, err := pluginpkg.Resolve(serviceName, serviceVersion, pm, registry, builtInCovers(serviceVersion)) | |
| if err != nil { | |
| return nil | |
| } | |
| resolution, err := pluginpkg.Resolve(serviceName, serviceVersion, pm, registry, builtInCovers(serviceVersion)) | |
| if err != nil { | |
| if fctl.GetBool(cmd, fctl.DebugFlag) { | |
| logging.FromContext(cmd.Context()).Errorf("plugin resolution failed for %s@%s: %v", serviceName, serviceVersion, err) | |
| } | |
| return nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/root.go` around lines 166 - 169, The code currently swallows errors from
pluginpkg.Resolve (called with serviceName, serviceVersion, pm, registry,
builtInCovers(serviceVersion)) by returning nil, hiding real
configuration/resolve bugs; change this to surface or propagate the error
instead of returning nil — either return the error up the call chain (propagate)
or at minimum log it (respecting --debug) via the existing logger and then
return the error so callers can handle it; locate the call to pluginpkg.Resolve
and replace the silent return with error propagation or a debug/error log plus
returning the error.
| result, _ := pterm.DefaultInteractiveConfirm. | ||
| WithDefaultValue(true). | ||
| Show("Install it now?") | ||
|
|
There was a problem hiding this comment.
Handle interactive confirmation errors explicitly.
The prompt error is ignored; input/TTY failures should return a clear error instead of proceeding with ambiguous state.
Proposed fix
-result, _ := pterm.DefaultInteractiveConfirm.
+result, err := pterm.DefaultInteractiveConfirm.
WithDefaultValue(true).
Show("Install it now?")
+if err != nil {
+ return nil, fmt.Errorf("failed to read install confirmation: %w", err)
+}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/plugin/autodiscovery.go` around lines 40 - 43, The interactive
confirmation currently ignores the error from
pterm.DefaultInteractiveConfirm.WithDefaultValue(true).Show causing TTY/input
failures to be treated as successful; change the call to capture the (result,
err) return values, check err explicitly (e.g. return or propagate a descriptive
error when err != nil) and only proceed based on the boolean result when err is
nil—update the call site in autodiscovery.go where result is assigned to use the
new error handling.
| // Modify the binary | ||
| if err := os.WriteFile(binaryPath, []byte("v2-modified"), 0o755); err != nil { | ||
| t.Fatal(err) | ||
| } | ||
|
|
||
| // Cache should be stale | ||
| _, err := LoadCachedManifest(tmpDir, name, version) | ||
| if err == nil { | ||
| t.Fatal("expected stale cache error after binary modification") | ||
| } |
There was a problem hiding this comment.
Make stale-cache test resilient to filesystem timestamp granularity.
This can intermittently fail on coarse-grained filesystems because binary rewrites may keep the same modtime. Force a timestamp change (e.g., os.Chtimes) or add a short delay before rewriting.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/plugin/cache_test.go` around lines 77 - 86, The stale-cache test in
pkg/plugin/cache_test.go can fail due to coarse filesystem timestamp
granularity; before/after rewriting the binary at binaryPath ensure the file
mtime actually changes by either (a) calling os.Chtimes(binaryPath,
time.Now().Add(1*time.Second), time.Now().Add(1*time.Second)) after writing or
(b) inserting a short sleep (e.g., time.Sleep(1*time.Second)) before rewriting,
then rewrite the file and assert LoadCachedManifest(tmpDir, name, version)
returns an error as before—update the test around the binaryPath write and the
LoadCachedManifest call to apply one of these approaches so the modified binary
has a newer modtime.
| if info.ModTime().UnixNano() != cached.BinaryModTime { | ||
| return nil, fmt.Errorf("manifest cache is stale") | ||
| } |
There was a problem hiding this comment.
Use a stronger cache freshness signal than modtime equality.
Exact UnixNano equality can miss binary changes on filesystems with coarse timestamp resolution, causing stale manifests to be treated as fresh. Consider persisting a content hash (or at least size+modtime) and validating that instead.
Suggested direction
type CachedManifest struct {
BinaryModTime int64 `json:"binaryModTime"`
+ BinarySize int64 `json:"binarySize"`
+ BinarySHA256 string `json:"binarySha256"`
ManifestJSON []byte `json:"manifest"`
}Then compute and compare size/hash on save/load.
Also applies to: 70-73
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/plugin/cache.go` around lines 44 - 46, The current freshness check using
info.ModTime().UnixNano() == cached.BinaryModTime is fragile; update the cache
validation to persist and compare a stronger signal (e.g., file size and a
content hash like SHA256) when saving and loading manifests: on save compute and
store BinarySize and BinaryHash alongside BinaryModTime, and on load recompute
the file size and SHA256 of the binary and reject the cache if either size or
hash differs (keep modtime as an additional hint but not the sole check); update
all places that compare cached.BinaryModTime (including the other check
referenced around lines 70–73) to use the new size+hash comparison (use the
existing cached struct fields or add BinarySize/BinaryHash fields).
| manifest, err := loaded.Client.GetManifest(context.Background()) | ||
| if err != nil { |
There was a problem hiding this comment.
Add a timeout to manifest RPC calls.
Both manifest fetches use context.Background() with no deadline, so a hung plugin process can stall CLI command setup indefinitely.
Suggested fix
import (
"context"
"encoding/json"
"fmt"
"strconv"
"strings"
+ "time"
@@
- manifest, err := loaded.Client.GetManifest(context.Background())
+ ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ manifest, err := loaded.Client.GetManifest(ctx)
@@
- manifest, err := loaded.Client.GetManifest(context.Background())
+ ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ manifest, err := loaded.Client.GetManifest(ctx)Also applies to: 307-309
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/plugin/cobra.go` around lines 20 - 21, The manifest RPCs currently call
loaded.Client.GetManifest with context.Background(), which can hang
indefinitely; update both call sites that invoke loaded.Client.GetManifest to
create a context with a timeout (use context.WithTimeout and a sensible
duration), pass that context into GetManifest, and defer the cancel() to ensure
the RPC is bounded; make the same change for any other plugin RPCs in the file
that use context.Background() (the second occurrence referenced) so all manifest
fetches use the cancellable timeout context instead of context.Background().
| // Cache miss: spawn plugin, get manifest, cache it | ||
| loaded, err := LoadPlugin(p.Name, binaryPath) | ||
| if err != nil { | ||
| logger.Debugf("Failed to load plugin %s@%s: %v", p.Name, version, err) | ||
| continue | ||
| } | ||
| loaded.Version = version | ||
| loaded.CompatibleWith = entry.CompatibleWith | ||
|
|
||
| manifest, err := loaded.Client.GetManifest(ctx) | ||
| if err != nil { | ||
| logger.Debugf("Failed to get manifest for %s@%s: %v", p.Name, version, err) | ||
| loaded.Kill() | ||
| continue | ||
| } | ||
|
|
||
| if err := SaveCachedManifest(pm.configDir, p.Name, version, manifest); err != nil { | ||
| logger.Debugf("Failed to cache manifest for %s@%s: %v", p.Name, version, err) | ||
| } | ||
|
|
||
| pm.loaded = append(pm.loaded, loaded) | ||
| logger.Debugf("Plugin %s@%s loaded and cached", p.Name, version) |
There was a problem hiding this comment.
Avoid retaining spawned plugin processes after manifest warm-up.
In the cache-miss path, the plugin is spawned to fetch manifest and then kept in pm.loaded, leaving unnecessary long-lived processes after startup. Prefer caching manifest, killing the process, and registering a LazyPluginClient.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/plugin/manager.go` around lines 64 - 85, The code currently spawns a
plugin via LoadPlugin to fetch the manifest and then keeps the spawned process
in pm.loaded (the variable loaded), causing unnecessary long-lived processes;
change the cache-miss path so after calling loaded.Client.GetManifest(ctx) and
SaveCachedManifest(pm.configDir, p.Name, version, manifest) you call
loaded.Kill() to terminate the spawned process and instead append a
LazyPluginClient (or equivalent lazy wrapper) to pm.loaded that holds the plugin
metadata (Version, CompatibleWith, manifest location) and will spawn the real
plugin on demand; update places that expect concrete plugin instances to accept
the lazy wrapper (refer to LoadPlugin, loaded.Client.GetManifest, loaded.Kill,
SaveCachedManifest, pm.loaded, and LazyPluginClient) so manifest warm-up only
caches data and does not retain processes.
| pluginDir := filepath.Join(PluginsDir(pm.configDir), name) | ||
| _ = os.RemoveAll(pluginDir) | ||
|
|
There was a problem hiding this comment.
Handle RemoveAll errors before mutating config.
Ignoring deletion errors can remove config while files remain on disk, leaving inconsistent state.
Proposed fix
- _ = os.RemoveAll(pluginDir)
+ if err := os.RemoveAll(pluginDir); err != nil {
+ return fmt.Errorf("failed to remove plugin files for %q: %w", name, err)
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pluginDir := filepath.Join(PluginsDir(pm.configDir), name) | |
| _ = os.RemoveAll(pluginDir) | |
| pluginDir := filepath.Join(PluginsDir(pm.configDir), name) | |
| if err := os.RemoveAll(pluginDir); err != nil { | |
| return fmt.Errorf("failed to remove plugin files for %q: %w", name, err) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/plugin/manager.go` around lines 182 - 184, The code currently ignores the
result of os.RemoveAll when deleting the plugin directory (pluginDir :=
filepath.Join(PluginsDir(pm.configDir), name); _ = os.RemoveAll(pluginDir));
change this to capture and handle the error from os.RemoveAll(pluginDir) and
abort or return the error before you mutate the plugin config/state.
Specifically, in the function that calls os.RemoveAll (referencing PluginsDir,
pm.configDir and name), replace the ignored assignment with proper error
checking: if os.RemoveAll returns an error, log/return that error (or wrap it)
so the config update that follows does not proceed when disk deletion failed.
| func NewRegistryClient(httpClient *http.Client) *RegistryClient { | ||
| if httpClient == nil { | ||
| httpClient = http.DefaultClient | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify unbounded HTTP client usage in plugin registry code.
rg -n --type=go 'http\.DefaultClient|\.Get\(' pkg/plugin/registry.goRepository: formancehq/fctl
Length of output: 184
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '70,140p' pkg/plugin/registry.go | cat -nRepository: formancehq/fctl
Length of output: 2818
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Call sites of NewRegistryClient =="
rg -n --type=go 'NewRegistryClient\s*\('
echo
echo "== http.Client{Timeout: ...} occurrences =="
rg -n --type=go 'http\.Client\s*{\s*[^}]*Timeout\s*:'
echo
echo "== Look for any context-aware requests around registry client usage =="
rg -n --type=go 'FetchRegistry\(|DownloadBinary\('Repository: formancehq/fctl
Length of output: 560
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Find definition of GetHttpClient =="
rg -n --type=go 'func\s+GetHttpClient\s*\('
rg -n --type=go 'GetHttpClient\('
echo
echo "== Search for Timeout: on http.Client initializations =="
rg -n --type=go 'http\.Client\s*{'
rg -n --type=go 'Timeout\s*:'Repository: formancehq/fctl
Length of output: 1030
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== pkg/http.go (key parts) =="
sed -n '1,220p' pkg/http.go | cat -n
echo
echo "== Any Timeout: usage in Go files =="
rg -n --type=go 'Timeout\s*:'
echo
echo "== Any http.Client{ usage =="
rg -n --type=go 'http\.Client\s*{'Repository: formancehq/fctl
Length of output: 4890
Add bounded timeouts for registry HTTP operations (current client has none).
pkg/plugin/registry.go uses r.HTTPClient.Get(...) for registry fetch + binary download, but callers pass fctl.GetHttpClient(cmd), which constructs &http.Client{Transport: ...} without a Timeout. A stalled network can therefore hang plugin install/discovery (also affects lines 103-105 and 122-124).
Set an explicit deadline via http.Client.Timeout (ideally set once in pkg/http.go:GetHttpClient or in NewRegistryClient when httpClient.Timeout == 0, preserving the existing Transport/round-trippers) and/or switch registry calls to context-aware requests.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/plugin/registry.go` around lines 92 - 95, The registry client currently
performs HTTP GETs (via r.HTTPClient.Get in RegistryClient and callers using
pkg/http.go:GetHttpClient) without a timeout; update NewRegistryClient to set a
sensible http.Client.Timeout when httpClient.Timeout == 0 (preserving
httpClient.Transport/round-tripper) so registry fetches and binary downloads
cannot hang, and optionally update registry call sites to use context-aware
requests (e.g., use http.NewRequestWithContext) for better cancellation control.
| func DetectServiceVersions(ctx context.Context, stackClient *formance.Formance) (ServiceVersions, error) { | ||
| resp, err := stackClient.GetVersions(ctx) | ||
| if err != nil { |
There was a problem hiding this comment.
Handle nil stackClient defensively.
Line 18 assumes stackClient is non-nil. A nil input will panic before returning an error.
Suggested patch
func DetectServiceVersions(ctx context.Context, stackClient *formance.Formance) (ServiceVersions, error) {
+ if stackClient == nil {
+ return nil, fmt.Errorf("stack client is nil")
+ }
resp, err := stackClient.GetVersions(ctx)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/plugin/versions.go` around lines 17 - 19, DetectServiceVersions currently
dereferences stackClient by calling stackClient.GetVersions which will panic if
stackClient is nil; add a defensive nil check at the start of
DetectServiceVersions to return a clear error when stackClient == nil (before
calling GetVersions). Update the function (DetectServiceVersions) to validate
stackClient, return an appropriate error value (and zero ServiceVersions) if
nil, and only call stackClient.GetVersions when the client is non-nil.
| func Serve(impl FctlPlugin) { | ||
| goplugin.Serve(&goplugin.ServeConfig{ | ||
| HandshakeConfig: HandshakeConfig, | ||
| Plugins: map[string]goplugin.Plugin{ | ||
| "fctl-plugin": &GRPCPlugin{Impl: impl}, | ||
| }, | ||
| GRPCServer: goplugin.DefaultGRPCServer, | ||
| }) | ||
| } |
There was a problem hiding this comment.
Guard against nil plugin implementation in Serve.
If impl is nil, calls on Line 60/Line 68 dereference it at runtime. Add a fail-fast check in Serve for clearer failure.
Suggested patch
func Serve(impl FctlPlugin) {
+ if impl == nil {
+ panic("pluginsdk.Serve: impl must not be nil")
+ }
goplugin.Serve(&goplugin.ServeConfig{
HandshakeConfig: HandshakeConfig,
Plugins: map[string]goplugin.Plugin{
"fctl-plugin": &GRPCPlugin{Impl: impl},
},
GRPCServer: goplugin.DefaultGRPCServer,
})
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func Serve(impl FctlPlugin) { | |
| goplugin.Serve(&goplugin.ServeConfig{ | |
| HandshakeConfig: HandshakeConfig, | |
| Plugins: map[string]goplugin.Plugin{ | |
| "fctl-plugin": &GRPCPlugin{Impl: impl}, | |
| }, | |
| GRPCServer: goplugin.DefaultGRPCServer, | |
| }) | |
| } | |
| func Serve(impl FctlPlugin) { | |
| if impl == nil { | |
| panic("pluginsdk.Serve: impl must not be nil") | |
| } | |
| goplugin.Serve(&goplugin.ServeConfig{ | |
| HandshakeConfig: HandshakeConfig, | |
| Plugins: map[string]goplugin.Plugin{ | |
| "fctl-plugin": &GRPCPlugin{Impl: impl}, | |
| }, | |
| GRPCServer: goplugin.DefaultGRPCServer, | |
| }) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/pluginsdk/sdk.go` around lines 93 - 101, Serve currently accepts a nil
FctlPlugin and will later dereference it via GRPCPlugin{Impl: impl}; add a guard
at the start of Serve that checks if impl == nil and fail fast (panic or
log.Fatalf) with a clear message like "Serve called with nil FctlPlugin
implementation" so we never construct GRPCPlugin with a nil Impl or proceed to
goplugin.Serve; reference the Serve function, the FctlPlugin parameter, and the
GRPCPlugin{Impl: impl} usage when making the change.
Implements the plugin infrastructure for fctl as described in RFC 0002.
Ledger v2 built-in commands are untouched. The plugin system activates
only when a stack runs a service version not covered by built-in commands.
Plugin SDK (pkg/pluginsdk/):
- gRPC protocol with GetManifest + Execute RPCs
- FctlPlugin interface and Serve() entry point for plugin binaries
- DisplaySchema for centralized rendering
Plugin infrastructure (pkg/plugin/):
- Multi-version config with semver compatibleWith ranges
- YAML registry with derived binary URLs from convention
- Manifest caching to avoid spawning plugins at startup
- Lazy plugin loading (process spawned on first Execute, not GetManifest)
- Service version detection via stack /versions endpoint
- Resolution: plugin first, built-in fallback, auto-discovery
- Auto-discovery with interactive install prompt
- Centralized rendering from JSON + DisplaySchema
- Debug output with [plugin:{name}] prefixing on stderr
- Plugin crash attribution with captured stderr
CLI (cmd/plugin/, cmd/root.go):
- fctl plugin install (from registry or --path for local binary/Go module)
- fctl plugin list (multi-version display)
- fctl plugin remove
- --plugin-binary flag for ephemeral one-shot plugin loading
- PersistentPreRunE interceptor on service commands (ledger for POC)
Phase 9 (ledger plugin in product repo + registry) is separate.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7b86693 to
563fa4c
Compare
Summary
What's in this PR (Phases 1-8)
Plugin SDK (
pkg/pluginsdk/)GetManifest+ExecuteRPCsFctlPlugininterface andServe()for plugin binariesDisplaySchemaproto for centralized renderingPlugin infrastructure (
pkg/plugin/)compatibleWithranges{repo}/releases/download/v{version}/fctl-plugin-{name}-{os}-{arch})Execute, not onGetManifest/versionsendpoint to detect service versionsDisplaySchema(table/sections).rendered_textfallback for backward compat[plugin:{name}]prefixed stderr in debug modeCLI
fctl plugin install— from registry or--path(local binary / Go module directory)fctl plugin list— multi-version displayfctl plugin remove--plugin-binary name=/path— ephemeral one-shot plugin loading (dev workflow)Resolution flow
What's NOT in this PR (Phase 9)
formancehq/ledgeratcmd/fctl-plugin/)formancehq/poc-fctl-plugin-registry)Related
Test plan
go build ./...passesgo test ./pkg/plugin/...passes (17 tests: config, registry, cache, resolver, render)fctl plugin --helpshows install/list/removefctl plugin install --helpshows--pathand--versionflagsfctl ledger transactions listagainst Ledger v3 stack (Phase 9)🤖 Generated with Claude Code