
Add PostgreSQL observability telemetry exposure via ServiceMonitors#1808

Draft
DmytroPI-dev wants to merge 4 commits into feature/database-controllers from postgres-operator-monitoring

Conversation

@DmytroPI-dev DmytroPI-dev commented Apr 1, 2026

Description

Adds PostgreSQL observability telemetry exposure for PostgresCluster with operator-managed metrics Services and Prometheus ServiceMonitors for PostgreSQL and PgBouncer.

Key Changes

api/v4/postgresclusterclass_types.go
Added class-level observability configuration for PostgreSQL and PgBouncer metrics.

api/v4/postgrescluster_types.go
Added cluster-level disable-only observability overrides.

pkg/postgresql/cluster/core/cluster.go
Wired PostgreSQL and PgBouncer metrics Service and ServiceMonitor reconciliation into the PostgresCluster flow.
Made ServiceMonitor presence required by failing reconciliation when the CRD is unavailable.

pkg/postgresql/cluster/core/monitoring.go
Added feature resolution helpers.
Added builders and reconcilers for PostgreSQL/PgBouncer metrics Services.
Added builders and reconcilers for PostgreSQL/PgBouncer ServiceMonitors.

internal/controller/postgrescluster_controller.go
Added RBAC for monitoring.coreos.com/servicemonitors.

cmd/main.go
Registered Prometheus Operator monitoring/v1 types in the manager scheme.

internal/controller/suite_test.go
Registered Prometheus Operator monitoring/v1 types in the test scheme.

pkg/postgresql/cluster/core/monitoring_unit_test.go
Added unit tests for observability flag resolution and monitoring resource builders.
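
The class-enables / cluster-may-only-disable semantics described above can be sketched as follows. This is a minimal illustration in plain Go; the `resolve` helper and its signature are assumptions for this sketch, not the PR's actual API.

```go
package main

import "fmt"

// resolve implements a "class enables, cluster may only disable" rule:
// the class-level flag decides whether a feature is available at all,
// and the cluster-level override (a *bool, nil meaning "inherit") can
// only turn a class-enabled feature off, never on.
func resolve(classEnabled bool, clusterOverride *bool) bool {
	if !classEnabled {
		// The cluster cannot enable what the class never enabled.
		return false
	}
	if clusterOverride != nil && !*clusterOverride {
		// Explicit cluster-level opt-out.
		return false
	}
	return true
}

func main() {
	off := false
	fmt.Println(resolve(true, nil))  // inherited from class: true
	fmt.Println(resolve(true, &off)) // cluster opted out: false
	fmt.Println(resolve(false, nil)) // class never enabled it: false
}
```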

Testing and Verification

Added unit tests in pkg/postgresql/cluster/core/monitoring_unit_test.go for:

  • class/cluster observability enablement logic
  • PostgreSQL and PgBouncer metrics Service builders
  • PostgreSQL and PgBouncer ServiceMonitor builders

Related Issues

CPI-1853 - related JIRA ticket.

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

DmytroPI-dev force-pushed the postgres-operator-monitoring branch from a1b796f to 976ecd1 on April 2, 2026, 14:08
DmytroPI-dev changed the title from "Create ServiceMonitor and basic Grafana dashboard for metrics" to "Add PostgreSQL observability telemetry exposure via ServiceMonitors" on Apr 2, 2026
}

// Reconcile Connection Pooler.
poolerEnabled = mergedConfig.Spec.ConnectionPoolerEnabled != nil && *mergedConfig.Spec.ConnectionPoolerEnabled

nit: should we also use your isConnectionPoolerMetricsEnabled here, so we don't have the same logic to maintain in two places?

Author

This line, as I understand it, decides whether the poolers themselves should exist. By contrast, isConnectionPoolerMetricsEnabled(...) answers a different question: whether pooler metrics should be exposed, which depends on both:

  • if poolers are enabled
  • if observability is enabled for PgBouncer
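
That two-condition distinction can be sketched like this. The helper names mirror the ones discussed in the thread, but the signatures and parameter shapes are assumptions for illustration.

```go
package main

import "fmt"

// poolerEnabled mirrors the reviewed line: the poolers exist only when
// the *bool flag is non-nil and true.
func poolerEnabled(connectionPoolerEnabled *bool) bool {
	return connectionPoolerEnabled != nil && *connectionPoolerEnabled
}

// isConnectionPoolerMetricsEnabled answers the narrower question: metrics
// are exposed only when the pooler exists AND PgBouncer observability is on.
func isConnectionPoolerMetricsEnabled(connectionPoolerEnabled *bool, pgBouncerObservability bool) bool {
	return poolerEnabled(connectionPoolerEnabled) && pgBouncerObservability
}

func main() {
	on := true
	fmt.Println(poolerEnabled(&on))                           // pooler exists: true
	fmt.Println(isConnectionPoolerMetricsEnabled(&on, false)) // exists, but metrics off: false
	fmt.Println(isConnectionPoolerMetricsEnabled(nil, true))  // no pooler, so no metrics: false
}
```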

rc.emitPoolerReadyTransition(postgresCluster, oldConditions)
}

if err := reconcilePostgreSQLMetricsService(ctx, c, rc.Scheme, postgresCluster, isPostgreSQLMetricsEnabled(postgresCluster, clusterClass)); err != nil {

I'm just wondering whether we should place it after the ConfigMap is created and everything is ready, at the very end of the reconciliation function? That way we would make sure the whole setup is ready before we start scraping metrics. It also sounds more natural to me; what are your thoughts?

Author

I don't think so; it would not give us any benefit, because the metrics resources do not depend on the ConfigMap at all. In our case the ConfigMap is just connection metadata for consumers, and moving observability after it would make the reconcile flow less clear:

  • cluster and poolers are the producers of metrics
  • observability resources belong with those producers
  • ConfigMap is a separate output artifact

// metrics
postgresMetricsServiceSuffix = "-postgres-metrics"
postgresMetricsPortName = "metrics"
postgresMetricsPort = int32(9187)

Is this a standard port for metrics? I believe the Splunk operator's ServiceMonitor uses a different one; maybe we should stay consistent?

Author

The controller-manager uses https:8443, but I assume we should use 9127 and 9187 here, as per the exporter docs.
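
Assuming the exporter defaults referenced here (postgres_exporter on 9187, the PgBouncer exporter on 9127), a hypothetical per-component port lookup could look like the following; the function and type names are illustrative, not the PR's actual code.

```go
package main

import "fmt"

// servicePort is a simplified stand-in for a Kubernetes ServicePort entry.
type servicePort struct {
	Name string
	Port int32
}

// metricsPortFor maps a component to its conventional exporter port:
// 9187 for postgres_exporter, 9127 for the PgBouncer exporter.
func metricsPortFor(component string) servicePort {
	switch component {
	case "postgres":
		return servicePort{Name: "metrics", Port: 9187}
	case "pgbouncer":
		return servicePort{Name: "metrics", Port: 9127}
	default:
		// Unknown component: no metrics port.
		return servicePort{}
	}
}

func main() {
	fmt.Println(metricsPortFor("postgres").Port)  // 9187
	fmt.Println(metricsPortFor("pgbouncer").Port) // 9127
}
```

Centralizing the mapping in one place keeps the Service builders and ServiceMonitor builders from each hard-coding the same numbers.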

}

// Build desired CNPG Cluster spec.
desiredSpec := buildCNPGClusterSpec(mergedConfig, postgresSecretName)

buildCNPGClusterSpec() does not set Spec.Monitoring on the CNPG Cluster object, and the same goes for buildCNPGPooler(). Is that a deliberate decision or just an oversight?

Maybe that would give us some interesting metrics we could use:
https://cloudnative-pg.io/docs/1.29/cloudnative-pg.v1#monitoringconfiguration

Author

CNPG already exposes standard PostgreSQL metrics on port 9187 (metrics), so Cluster.spec.monitoring is not needed for basic scraping, I assume. It might be useful later for extra control such as custom queries, disabling default queries, query TTL, or TLS on the metrics endpoint.
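
If Spec.Monitoring is wired up later, the shape could be sketched as below. These are simplified stand-in types written for this sketch: the real MonitoringConfiguration and selector types live in the CNPG API package, and the field names here only follow the upstream docs.

```go
package main

import "fmt"

// ConfigMapKeySelector is a simplified stand-in for the CNPG/Kubernetes
// selector that points at one key inside a ConfigMap.
type ConfigMapKeySelector struct {
	Name string
	Key  string
}

// MonitoringConfiguration is a simplified stand-in for CNPG's
// Cluster.spec.monitoring block.
type MonitoringConfiguration struct {
	DisableDefaultQueries  *bool
	CustomQueriesConfigMap []ConfigMapKeySelector
}

// buildMonitoring returns nil when no custom queries are requested, so the
// cluster keeps relying on CNPG's built-in metrics on :9187; otherwise it
// points CNPG at a ConfigMap holding the extra queries.
func buildMonitoring(customQueriesCM string) *MonitoringConfiguration {
	if customQueriesCM == "" {
		return nil
	}
	return &MonitoringConfiguration{
		CustomQueriesConfigMap: []ConfigMapKeySelector{
			{Name: customQueriesCM, Key: "queries.yaml"},
		},
	}
}

func main() {
	fmt.Println(buildMonitoring("") == nil)                                // true
	fmt.Println(buildMonitoring("my-queries").CustomQueriesConfigMap[0].Name)
}
```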
