Simplify google_cloud_ops_agent_engine CLI and systemd services to be OTel-only #2314
Merged
jinghan-ma (Contributor) reviewed on May 28, 2026 and left a comment:
We should also get rid of things like services used in run_windows.go:191, since there's just going to be one service. But we can do that in a separate PR as well.
jinghan-ma
approved these changes
May 28, 2026
… OTel-only

This finishes simplification step #4. With Fluent Bit completely removed, the config engine CLI only needs to generate OTel configurations. We have consolidated the config generation and systemd services into a single, unified startup orchestration on Linux:

- Removed the -service flag from cmd/google_cloud_ops_agent_engine/main.go.
- Simplified confgenerator.GenerateFilesFromConfig to always generate otel.yaml.
- Deleted the obsolete google-cloud-ops-agent-opentelemetry-collector.service unit.
- Configured google-cloud-ops-agent.service (Type=simple) to validate the configuration, run health checks, and launch the OTel collector directly via ExecStart.
- Updated internal/healthchecks/ports_check.go to check if google-cloud-ops-agent is active.
- Aligned expected agent services, diagnostics paths, and systemctl calls in integration_test/agents/agents.go.
- Updated TestPortsAndAPIHealthChecks to write systemd overrides for google-cloud-ops-agent.service.d.

TAG=agy
BUG=b/517494318
CONV=a3aefa50-102a-4eb8-ac21-894088d8c5df
Use a dedicated http.Client with a 5-second timeout for the API and GCE metadata checks instead of Go's default http.Get client, which has no timeout. When egress traffic is denied by a firewall (as in TestNetworkHealthCheck), the old client would hang indefinitely waiting for TCP handshakes, causing systemd's startup limit (90s) to kill the main ExecStartPre engine process and fail integration tests across all distros.

TAG=agy
BUG=b/517494318
…bility

Add t.Skip to TestPortsAndAPIHealthChecks and TestParsingFailureCheck pending the future OTel self-log collection implementation (b/517541093).

Hardcode the standard directory paths (/run/google-cloud-ops-agent) in the ExecStartPre and ExecStart fields of the consolidated systemd service unit, restoring backward compatibility with older systemd versions such as the one on SLES 12 (v228), which do not dynamically inject environment variables (b/517494318).

TAG=agy
Description
With Fluent Bit completely removed from Ops Agent 3.0, the config engine CLI only needs to generate OTel configurations. We have consolidated the config generation and systemd services into a single, unified startup orchestration:
- Removed the -service flag from cmd/google_cloud_ops_agent_engine/main.go.
- Simplified confgenerator.GenerateFilesFromConfig in confgenerator/files.go to always generate otel.yaml.
- … RuntimeDirectory, StateDirectory, and LogsDirectory in google-cloud-ops-agent.service (main unit).
- … google-cloud-ops-agent-opentelemetry-collector.service to directly load the generated /run/google-cloud-ops-agent/otel.yaml configuration without running ExecStartPre.
Related issue
b/517494318
How has this been tested?
Checklist: