Skip to content

Metrics: use broker to collect metrics and expose failed status#161

Open
elisaado wants to merge 13 commits into
nlewo:mainfrom
elisaado:main
Open

Metrics: use broker to collect metrics and expose failed status#161
elisaado wants to merge 13 commits into
nlewo:mainfrom
elisaado:main

Conversation

@elisaado
Copy link
Copy Markdown

This PR removes metrics from the manager and creates a broker subscriber that listens to incoming messages on the channel and exposes metrics from them. Furthermore, it exposes 4 new metrics: last_fetch_failed, last_eval_failed, last_build_failed, and last_deployment_failed which allows for alerting when any of those stages fail.

This was coded late at night so it may contain some dumb mistakes

Design decisions:

  • I opted for failed states instead of successful states since upon starting Comin, for most people there will be no successful eval/build/deployment until there has been an explicit fetch and push, this would trigger alerts if alerting is set to alert on e.g. "no successful builds for 10 minutes". The failed states allow for alerts like "failed build for the last 10 minutes"
  • I reused the internal/prometheus/prometheus.go for the broker subscribe since it already colocates all metrics things
  • Subscribe uses a go (func())() which I find kinda ugly but is used across the codebase (e.g. in server/server.go)

Copy link
Copy Markdown
Owner

@nlewo nlewo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like deploymentInfo and buildInfo metrics are no longer updated. Could you ensure there is no breaking changes regarding existing metrics?

Comment thread internal/prometheus/prometheus.go Outdated
Comment thread internal/prometheus/prometheus.go Outdated
Comment thread internal/prometheus/prometheus.go Outdated
@nlewo
Copy link
Copy Markdown
Owner

nlewo commented May 25, 2026

Subscribe uses a go (func())() which I find kinda ugly but is used across the codebase (e.g. in server/server.go)

I'm wondering why do you find this ugly?

@elisaado
Copy link
Copy Markdown
Author

(rebased main)

@elisaado
Copy link
Copy Markdown
Author

It looks like deploymentInfo and buildInfo metrics are no longer updated. Could you ensure there is no breaking changes regarding existing metrics?

Oops. Will fix.

I'm wondering why do you find this ugly?

I like defining separate functions for the goroutine and then go that_function() where needed (e.g. in main). This shows whoever is reading main that a function is non blocking, without having to go to that function definition.

@elisaado
Copy link
Copy Markdown
Author

Regarding the buildInfo metric: this was only ever updated on startup in cmd.go (and still is) :)

When comin is updated & restarts it will get set from cmd.go again. We could centralize everything related to metrics in the prometheus broker event handler, but then an event for "comin started" would need to be added (or does it already exist?)

@elisaado elisaado requested a review from nlewo May 29, 2026 18:04
@nlewo
Copy link
Copy Markdown
Owner

nlewo commented May 31, 2026

I like defining separate functions for the goroutine and then go that_function() where needed (e.g. in main). This shows whoever is reading main that a function is non blocking, without having to go to that function definition.

That's a fair point and I would have preferred to do it like this ;) I will keep this in mind for the future.

Regarding the buildInfo metric: this was only ever updated on startup in cmd.go (and still is) :)

Thank you for the remark. Could be addressed in a followup PR (now tracked by #165).

When comin is updated & restarts it will get set from cmd.go again. We could centralize everything related to metrics in the prometheus broker event handler, but then an event for "comin started" would need to be added (or does it already exist?)

In the context of the current PR, this is good enough.

But I agree it would be nice to publish on event once comin is initialized. Currently, for the watch command, i send a ManagerState event when comin starts (i don't like the name of this event). Maybe we could add an new event Initialized that publish the same payload. This event would be published once comin is ready. This could also be useful for external tooling as a readiness probe.

@nlewo
Copy link
Copy Markdown
Owner

nlewo commented May 31, 2026

Actually, the buildInfo metric doesn't need to be updated since it describes the version of comin ;)

@elisaado
Copy link
Copy Markdown
Author

re: the failing pipeline:

       > internal/prometheus/prometheus.go:165:6: func boolToString is unused (unused)
       > func boolToString(b bool) string {
       >      ^
       > 1 issues:
       > * unused: 1

the function boolToString was already there, it's now unused but may be nice to have in the future, should I remove it for now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants