SREP-347 & SREP-4896: factorizing code, fixing some potential race conditions in StreamLogs & making sure 'osdctl rhobs metrics' can work at a given time or with a time range#904
Conversation
|
@Nikokolas3270: This pull request references SREP-4896 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughSplits metrics into instant vs range paths with new query/print APIs, converts logs to callback-based queries, rewrites websocket streaming state, updates MCP tool calls and CLI flag/help text, and refreshes documentation. ChangesRHOBS CLI Metrics and Logs Refactoring
🎯 4 (Complex) | ⏱️ ~60 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (13 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Nikokolas3270 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/rhobs/requests.go (1)
1037-1098:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
startTimeis still shared unsafely across goroutines.
createWebsocketreads the capturedstartTime, butcloseWebsocketmutates it and can run from the signal goroutine throughcloseWebSocketInContext. That leaves a real data race on the reconnect cursor and can duplicate or skip logs after reconnect. Moving the cursor intostreamLogsContextand guarding it with the same mutex would close that gap.Also applies to: 1112-1144, 1156-1167
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/rhobs/requests.go` around lines 1037 - 1098, The shared variable startTime is captured by createWebsocket and mutated by closeWebsocket (invoked via closeWebSocketInContext), causing a data race; move startTime out of the outer scope into the streamLogsContext struct (or the same object that already holds the reconnect state and mutex), update createWebsocket and closeWebsocket to read/write that field instead of the captured variable, and protect all accesses with the existing mutex used by streamLogsContext so reads/writes are serialized; also update the other occurrences mentioned (blocks around lines 1112-1144 and 1156-1167) to use the guarded field rather than the outer startTime.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/rhobs/mcp_tools.go`:
- Around line 163-181: The MCP output schema still advertises a singular
"result" and marks "results"/"count" as instant-only, but the code paths in
QueryRangeMetrics (fetcher.QueryRangeMetrics) and QueryInstantMetrics
(fetcher.QueryInstantMetrics) populate cellInfo with "results" and "count" for
range queries too; update the MCP tool schema to replace the deprecated "result"
object with "results" (array/time-series) and mark "results" and "count" as
valid for both range and instant payloads so the advertised contract matches the
actual output emitted by the code (ensure any schema/README/struct that
describes the MCP response mirrors the keys set on cellInfo).
In `@cmd/rhobs/metricsCmd.go`:
- Around line 24-26: The help text claims the PromQL expression is optional when
--web, but the command still enforces a required positional arg; update the code
in the metrics command (metricsCmd / NewMetricsCmd) to either (A) relax arg
validation: in the command's Args or RunE check, read the "web" flag
(cmd.Flags().GetBool("web")) and allow zero positional args when web==true,
otherwise require one, or (B) adjust the help string lines that mention
"optional if the --web option is set" to remove that claim so documentation
matches the current required-positional-arg behavior; pick one approach and make
the corresponding change.
- Around line 39-55: After computing startTime/endTime (using nowTime, duration
and cmd.Flags().Changed), validate the time range and fail fast: ensure that
when --since is used the computed startTime (nowTime.Add(-duration)) is not in
the future (startTime <= nowTime) and when --end-time is set ensure startTime <=
endTime; if either check fails return a clear error (e.g. "--since results in
start time in the future" or "--start-time must be <= --end-time"). Add these
checks immediately after the existing startTime/endTime logic (the block using
cmd.Flags().Changed, startTime, endTime, duration, nowTime) so invalid ranges
are rejected before calling RHOBS.
In `@cmd/rhobs/requests.go`:
- Around line 1046-1047: After calling createLogsPrinter(...) and immediately
invoking logsPrinter.PrintHeader(), add a deferred call to
logsPrinter.PrintTrailer() (defer logsPrinter.PrintTrailer()) so the streaming
printer is always finalized even if websocket setup fails; then remove any
explicit/manual calls to logsPrinter.PrintTrailer() in the read-error /
error-return paths (to avoid double-closing). Apply this change for each place
where logsPrinter.PrintHeader() is used (the blocks that create logsPrinter,
call PrintHeader, and currently call PrintTrailer on error).
- Around line 595-602: The print helper functions (e.g., PrintInstantMetrics)
create context.TODO() and thus drop caller cancellation; update these APIs to
accept a context.Context parameter and thread it into calls like
QueryInstantMetrics and the printer constructors (e.g.,
createInstantMetricsPrinter) so that remote requests use the caller-provided
ctx; change signatures of the other similar wrappers (the ones at the other
occurrences) to accept ctx and pass it through to their underlying RHOBS queries
and printers to allow Cobra Ctrl+C/timeouts to propagate.
- Around line 567-568: The code currently sets evalTimeStr from
evalTime.UnixNano() and assigns it to evalTimePStr, which sends nanoseconds to
the Prometheus instant-query `time` parameter; change this to use seconds or
RFC3339 instead. In the function handling the instant query (look for
evalTimeStr / evalTimePStr and where evalTime is computed), replace
evalTime.UnixNano() with either evalTime.Unix() (or a fractional seconds string
like fmt.Sprintf("%d.%09d", evalTime.Unix(), evalTime.Nanosecond()) if fractions
are needed) or use evalTime.Format(time.RFC3339Nano) to produce an RFC3339
timestamp, then set evalTimeStr to that value and keep assigning it to
evalTimePStr.
In `@docs/osdctl_rhobs_logs.md`:
- Line 7: Synopsis text incorrectly references the --contain flag for regexp
filtering; update the documentation sentence in docs/osdctl_rhobs_logs.md so it
names the actual flag --contain-regex (i.e., change "regexp (--contain option)"
to "regexp (--contain-regex option)") so the synopsis matches the defined CLI
options.
In `@docs/osdctl_rhobs_metrics.md`:
- Line 7: The synopsis text in docs/osdctl_rhobs_metrics.md refers to a --web
option but that flag is not defined in the command options; update the docs so
they match: either remove the reference to "--web" from the synopsis sentence
about the expression being optional, or add a documented "--web" flag entry
(with its short/long form, description and behavior) to the options list for the
osdctl rhobs metrics command so the synopsis and options are consistent; ensure
the edited text around the sentence "The prometheus expression ... it is
optional if the --web option is set" is updated accordingly.
In `@docs/README.md`:
- Line 4450: The synopsis mentions a --web option ("the prometheus expression
... is optional if the --web option is set") but that flag is not present in the
command's options; either add documentation for the --web flag to the command's
options/inherited options or remove/replace the --web mention in the synopsis so
they match; update the sentence beginning "By default, the command will try to
evaluate..." and the command's options list in the same README section so the
synopsis and exposed flags are consistent.
- Line 4342: Update the README synopsis to correctly document the regex filter:
replace or supplement the mention of --contain with the correct flag
--contain-regex (or explicitly state that --contain is literal substring match
while --contain-regex accepts regular expressions) so the synopsis matches the
detailed flag descriptions; ensure both flags (--contain and --contain-regex)
are referenced unambiguously in the synopsis text to avoid user confusion.
---
Outside diff comments:
In `@cmd/rhobs/requests.go`:
- Around line 1037-1098: The shared variable startTime is captured by
createWebsocket and mutated by closeWebsocket (invoked via
closeWebSocketInContext), causing a data race; move startTime out of the outer
scope into the streamLogsContext struct (or the same object that already holds
the reconnect state and mutex), update createWebsocket and closeWebsocket to
read/write that field instead of the captured variable, and protect all accesses
with the existing mutex used by streamLogsContext so reads/writes are
serialized; also update the other occurrences mentioned (blocks around lines
1112-1144 and 1156-1167) to use the guarded field rather than the outer
startTime.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 24a4b96a-11b9-4ad9-8bec-8bc7adfd4dc5
📒 Files selected for processing (9)
cmd/rhobs/cellCmd.gocmd/rhobs/logsCmd.gocmd/rhobs/mcp_query.gocmd/rhobs/mcp_tools.gocmd/rhobs/metricsCmd.gocmd/rhobs/requests.godocs/README.mddocs/osdctl_rhobs_logs.mddocs/osdctl_rhobs_metrics.md
💤 Files with no reviewable changes (1)
- cmd/rhobs/mcp_query.go
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
cmd/rhobs/requests.go (1)
1101-1104: 💤 Low valueVariable
contextshadows thecontextpackage import.Naming this variable
contextshadows the standard library package. While the package isn't directly used withinStreamLogs, this can cause confusion and issues if future changes need to usecontext.Contexttypes. Consider renaming tostreamCtxorloopCtx.Suggested rename
var mutex sync.Mutex - context := streamLogsContext{ + streamCtx := streamLogsContext{ isLooping: true, }Then update all references:
getContext()body,closeWebSocketInContext()body, andstoreNewWebsocketInContext()body to usestreamCtx.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/rhobs/requests.go` around lines 1101 - 1104, The local variable named "context" in StreamLogs shadows the standard library "context" package; rename that variable to something like "streamCtx" (or "loopCtx") wherever it's declared (the streamLogsContext instance) and update all usages and helper functions that access it—specifically update getContext(), closeWebSocketInContext(), and storeNewWebsocketInContext() to accept/use "streamCtx" instead of "context" so no package name is shadowed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/osdctl_rhobs_metrics.md`:
- Line 7: The synopsis text for the "osdctl rhobs metrics" command has a missing
space ("--since options.Results") in the generated help string; update the
source that constructs the command help/synopsis (the code that builds the rhobs
metrics command help, e.g., the New/Build function for the rhobs metrics CLI
command such as NewRHOBSMetricsCommand or rhobsMetricsCmd) to insert a space
between "--since options." and "Results" in the synopsis string, then regenerate
the docs under docs/ so the markdown artifact reflects the corrected sentence.
In `@docs/README.md`:
- Line 4450: The metrics synopsis sentence in the generated README ("Fetch
metrics from RHOBS... --start-time, --end-time and --since options.Results can
be filtered...") is missing a space after a period; fix it in the doc
generator/source (not the generated docs) by updating the source/template that
emits the string "Fetch metrics from RHOBS for a given cluster..." (search for
that exact sentence or the metrics command help text in the docs generator or
openshift/osdctl source) so it emits "options. Results" (or otherwise ensures
sentences are joined with a trailing space), and add a simple unit test or
template trim rule to prevent future concatenation issues.
---
Nitpick comments:
In `@cmd/rhobs/requests.go`:
- Around line 1101-1104: The local variable named "context" in StreamLogs
shadows the standard library "context" package; rename that variable to
something like "streamCtx" (or "loopCtx") wherever it's declared (the
streamLogsContext instance) and update all usages and helper functions that
access it—specifically update getContext(), closeWebSocketInContext(), and
storeNewWebsocketInContext() to accept/use "streamCtx" instead of "context" so
no package name is shadowed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0c0fc742-dbf1-4247-b6a9-fc2fc4aff4bc
📒 Files selected for processing (9)
cmd/rhobs/cellCmd.gocmd/rhobs/logsCmd.gocmd/rhobs/mcp_query.gocmd/rhobs/mcp_tools.gocmd/rhobs/metricsCmd.gocmd/rhobs/requests.godocs/README.mddocs/osdctl_rhobs_logs.mddocs/osdctl_rhobs_metrics.md
💤 Files with no reviewable changes (1)
- cmd/rhobs/mcp_query.go
✅ Files skipped from review due to trivial changes (1)
- docs/osdctl_rhobs_logs.md
🚧 Files skipped from review as they are similar to previous changes (4)
- cmd/rhobs/cellCmd.go
- cmd/rhobs/logsCmd.go
- cmd/rhobs/mcp_tools.go
- cmd/rhobs/metricsCmd.go
| } | ||
|
|
||
| func (f *RhobsFetcher) StreamLogs(lokiExpr string, format LogsFormat, isPrintingTimeValue bool, fieldNames []string) error { | ||
| startTime := time.Now().Add(-5 * time.Minute) |
There was a problem hiding this comment.
startTime is still shared across goroutines without mutex protection. closeWebsocket writes it (startTime = time.Now()) from the signal goroutine, while createWebsocket reads it from the main loop. Moving it into streamLogsContext and guarding it with the same mutex would close the race.
There was a problem hiding this comment.
Thanks for pointing that out.
startTime is now read and updated by the same goroutine, all the time.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/README.md`:
- Line 4453: Update the command help string in cmd/rhobs/metricsCmd.go so it
matches the enforced argument contract: change the Use value from "metrics
[PromQL-expression]" to "metrics <PromQL-expression>" (or otherwise remove the
optional brackets) to reflect that cobra.ExactArgs(1) requires a single required
argument; ensure the updated Use text is what generates docs like docs/README.md
and docs/osdctl_rhobs_metrics.md so help and behavior are consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: cee9deba-fcd9-4320-9f3d-062f4044825f
📒 Files selected for processing (9)
cmd/rhobs/cellCmd.gocmd/rhobs/logsCmd.gocmd/rhobs/mcp_query.gocmd/rhobs/mcp_tools.gocmd/rhobs/metricsCmd.gocmd/rhobs/requests.godocs/README.mddocs/osdctl_rhobs_logs.mddocs/osdctl_rhobs_metrics.md
💤 Files with no reviewable changes (1)
- cmd/rhobs/mcp_query.go
✅ Files skipped from review due to trivial changes (1)
- docs/osdctl_rhobs_logs.md
🚧 Files skipped from review as they are similar to previous changes (5)
- cmd/rhobs/cellCmd.go
- cmd/rhobs/mcp_tools.go
- cmd/rhobs/metricsCmd.go
- cmd/rhobs/logsCmd.go
- cmd/rhobs/requests.go
…nditions in StreamLogs & making sure 'osdctl rhobs metrics' can work at a given time or with a time range
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
cmd/rhobs/metricsCmd.go (1)
26-28:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRestore the missing space in the long help text.
Line 26 concatenates directly into Line 27, so
--helpand the generated docs renderoptions.Resultsas one merged sentence.✏️ Proposed fix
Long: "Fetch metrics from RHOBS for a given cluster. " + "The cluster can be a hosted cluster (HCP), a management cluster (MC) or whatever cluster sending metrics to RHOBS. " + "The prometheus expression provided as an argument can be either an instant query or a range query. " + "By default, the command will try to evaluate the expression as an instant query at the current time, " + - "but it is possible to specify a different evaluation time using the --time option or a time range using the --start-time, --end-time and --since options." + + "but it is possible to specify a different evaluation time using the --time option or a time range using the --start-time, --end-time and --since options. " + "Results can be filtered to only keep the ones matching the given cluster (--cluster-id option) with the --filter option " + "even if it is more efficient to do that filtering at the prometheus expression level.",Based on learnings:
docs/**/*.mdare generated artifacts, so this should be fixed in the command help source rather than the generated markdown.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/rhobs/metricsCmd.go` around lines 26 - 28, The long help text string for the metrics command concatenation merges sentences (producing "options.Results"); update the string concatenation in the metrics command's Long/help value (e.g., the metrics command variable or its Long field in metricsCmd) to include a space between the two quoted parts—either add a trailing space at the end of the first string or a leading space at the start of the second string so the rendered help shows "...options. Results..." correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/rhobs/metricsCmd.go`:
- Around line 57-61: When validating flags in metricsCmd.go, reject non-positive
--step values before calling RHOBS: when cmd.Flags().Changed("step") is true,
retrieve the step value using cmd.Flags().GetDuration("step") (or the existing
step variable) and check if it's <= 0; if so return an error like "--step must
be a positive duration" (mirror the existing --since validation style). Keep
this check alongside the existing requirement that --start-time or --since must
be set so negative/zero durations are caught locally rather than deferred to the
backend.
---
Duplicate comments:
In `@cmd/rhobs/metricsCmd.go`:
- Around line 26-28: The long help text string for the metrics command
concatenation merges sentences (producing "options.Results"); update the string
concatenation in the metrics command's Long/help value (e.g., the metrics
command variable or its Long field in metricsCmd) to include a space between the
two quoted parts—either add a trailing space at the end of the first string or a
leading space at the start of the second string so the rendered help shows
"...options. Results..." correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7857e4a2-ea2a-4f8a-b3dc-3c921bde0ca7
📒 Files selected for processing (9)
cmd/rhobs/cellCmd.gocmd/rhobs/logsCmd.gocmd/rhobs/mcp_query.gocmd/rhobs/mcp_tools.gocmd/rhobs/metricsCmd.gocmd/rhobs/requests.godocs/README.mddocs/osdctl_rhobs_logs.mddocs/osdctl_rhobs_metrics.md
💤 Files with no reviewable changes (1)
- cmd/rhobs/mcp_query.go
✅ Files skipped from review due to trivial changes (1)
- docs/osdctl_rhobs_logs.md
🚧 Files skipped from review as they are similar to previous changes (4)
- cmd/rhobs/cellCmd.go
- cmd/rhobs/logsCmd.go
- cmd/rhobs/mcp_tools.go
- cmd/rhobs/requests.go
| if cmd.Flags().Changed("step") { | ||
| if !cmd.Flags().Changed("start-time") && !cmd.Flags().Changed("since") { | ||
| return fmt.Errorf("--step can only be set if --start-time or --since is set") | ||
| } | ||
| } |
There was a problem hiding this comment.
Reject non-positive --step values before calling RHOBS.
--step=0s and negative durations currently pass local validation and defer the failure to the backend. This flag defines the sampling interval, so it should be validated the same way --since is.
🛡️ Proposed fix
if cmd.Flags().Changed("step") {
if !cmd.Flags().Changed("start-time") && !cmd.Flags().Changed("since") {
return fmt.Errorf("--step can only be set if --start-time or --since is set")
}
+ if stepDuration <= 0 {
+ return fmt.Errorf("--step must be greater than 0")
+ }
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/rhobs/metricsCmd.go` around lines 57 - 61, When validating flags in
metricsCmd.go, reject non-positive --step values before calling RHOBS: when
cmd.Flags().Changed("step") is true, retrieve the step value using
cmd.Flags().GetDuration("step") (or the existing step variable) and check if
it's <= 0; if so return an error like "--step must be a positive duration"
(mirror the existing --since validation style). Keep this check alongside the
existing requirement that --start-time or --since must be set so negative/zero
durations are caught locally rather than deferred to the backend.
|
@Nikokolas3270: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Follow up of this PR:
#876
osdctl rhobs logs --follow--timeoption) or with a time range (new--start-timeand--sinceoptions)Summary by CodeRabbit
New Features
Bug Fixes
Documentation