Add PostgreSQL observability telemetry exposure via ServiceMonitors #1808
DmytroPI-dev wants to merge 4 commits into feature/database-controllers
    }

    // Reconcile Connection Pooler.
    poolerEnabled = mergedConfig.Spec.ConnectionPoolerEnabled != nil && *mergedConfig.Spec.ConnectionPoolerEnabled
nit: should we also use your isConnectionPoolerMetricsEnabled here, so that we don't have to maintain the same logic in two places?
This line, as I understand it, decides whether the poolers themselves should exist. By contrast, isConnectionPoolerMetricsEnabled(...) answers a different question: whether pooler metrics should be exposed, which depends on both:
- if poolers are enabled
- if observability is enabled for PgBouncer
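To make the distinction concrete, here is a minimal sketch of the two checks. The `poolerSpec` type and its `PgBouncerMetricsEnabled` field are hypothetical stand-ins, not the operator's real API. Only `ConnectionPoolerEnabled` and the helper's name come from the diff, and the real helper takes different arguments.

```go
package main

import "fmt"

// poolerSpec is a hypothetical stand-in for the merged cluster configuration.
type poolerSpec struct {
	ConnectionPoolerEnabled *bool // field name taken from the diff
	PgBouncerMetricsEnabled *bool // assumed field name, for illustration only
}

// boolValue treats a nil pointer as false.
func boolValue(b *bool) bool { return b != nil && *b }

// poolerEnabled answers: should the poolers exist at all?
func poolerEnabled(s poolerSpec) bool {
	return boolValue(s.ConnectionPoolerEnabled)
}

// isConnectionPoolerMetricsEnabled answers: should pooler metrics be exposed?
// It needs both the poolers and PgBouncer observability to be enabled.
func isConnectionPoolerMetricsEnabled(s poolerSpec) bool {
	return poolerEnabled(s) && boolValue(s.PgBouncerMetricsEnabled)
}

func main() {
	on, off := true, false
	s := poolerSpec{ConnectionPoolerEnabled: &on, PgBouncerMetricsEnabled: &off}
	// Poolers exist, but their metrics are not exposed.
	fmt.Println(poolerEnabled(s), isConnectionPoolerMetricsEnabled(s)) // prints: true false
}
```

Under these assumptions the two checks diverge whenever poolers are enabled but PgBouncer observability is off, which is why reusing the metrics helper for the pooler decision would change behavior.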
    rc.emitPoolerReadyTransition(postgresCluster, oldConditions)
    }

    if err := reconcilePostgreSQLMetricsService(ctx, c, rc.Scheme, postgresCluster, isPostgreSQLMetricsEnabled(postgresCluster, clusterClass)); err != nil {
I'm wondering whether we should place this after the ConfigMap is created and everything is ready, at the very end of the reconciliation function. That way we would make sure the whole setup is ready before we start scraping metrics. It also sounds more natural to me. What are your thoughts?
I don't think so. It would not give us any benefit, because the metrics resources do not depend on the ConfigMap at all. In our case the ConfigMap is just connection metadata for consumers, and moving observability after it would make the reconcile flow less clear:
- cluster and poolers are the producers of metrics
- observability resources belong with those producers
- ConfigMap is a separate output artifact
    // metrics
    postgresMetricsServiceSuffix = "-postgres-metrics"
    postgresMetricsPortName      = "metrics"
    postgresMetricsPort          = int32(9187)
Is this a standard port for metrics? I believe the Splunk Operator ServiceMonitor uses a different one. Maybe we should stay consistent?
The controller-manager uses https on 8443, but I assume we should use 9127 (PgBouncer) and 9187 (PostgreSQL), as per the docs.
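As a rough illustration of how the two exporter ports could be kept in one place, the sketch below defines both constants and builds a simplified port entry from them. The `pgbouncerMetricsPort` name, the `metricsPort` struct, and `newMetricsPort` are assumptions made for this example. The real code would use the Kubernetes `ServicePort` type, and reusing the port name "metrics" for the pooler is also assumed.

```go
package main

import "fmt"

// Port numbers come from the review thread; 9187 also appears in the diff.
const (
	postgresMetricsPort  = int32(9187)
	pgbouncerMetricsPort = int32(9127) // constant name is hypothetical
)

// metricsPort is a simplified stand-in for a Kubernetes ServicePort.
type metricsPort struct {
	Name       string
	Port       int32
	TargetPort int32
}

// newMetricsPort exposes the exporter port unchanged on the Service.
func newMetricsPort(port int32) metricsPort {
	return metricsPort{Name: "metrics", Port: port, TargetPort: port}
}

func main() {
	fmt.Printf("%+v\n", newMetricsPort(postgresMetricsPort))
	fmt.Printf("%+v\n", newMetricsPort(pgbouncerMetricsPort))
}
```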
    }

    // Build desired CNPG Cluster spec.
    desiredSpec := buildCNPGClusterSpec(mergedConfig, postgresSecretName)
buildCNPGClusterSpec() does not set Spec.Monitoring on the CNPG Cluster object, and neither does buildCNPGPooler(). Is that a deliberate decision or an oversight?
Maybe that would give us some interesting metrics we could use:
https://cloudnative-pg.io/docs/1.29/cloudnative-pg.v1#monitoringconfiguration
CNPG already exposes standard PostgreSQL metrics on port 9187 (metrics), so I assume Cluster.spec.monitoring is not needed for basic scraping. It might be useful later for extra control, such as custom queries, disabling default queries, query TTL, or TLS on the metrics endpoint.
Description
Adds PostgreSQL observability telemetry exposure for `PostgresCluster` with operator-managed metrics `Service`s and Prometheus `ServiceMonitor`s for PostgreSQL and PgBouncer.
Key Changes
- `api/v4/postgresclusterclass_types.go`
  - Added class-level observability configuration for PostgreSQL and PgBouncer metrics.
- `api/v4/postgrescluster_types.go`
  - Added cluster-level disable-only observability overrides.
- `pkg/postgresql/cluster/core/cluster.go`
  - Wired PostgreSQL and PgBouncer metrics `Service` and `ServiceMonitor` reconciliation into the `PostgresCluster` flow.
  - Made `ServiceMonitor` presence required by failing reconciliation when the CRD is unavailable.
- `pkg/postgresql/cluster/core/monitoring.go`
  - Added feature resolution helpers.
  - Added builders and reconcilers for PostgreSQL/PgBouncer metrics `Service`s.
  - Added builders and reconcilers for PostgreSQL/PgBouncer `ServiceMonitor`s.
- `internal/controller/postgrescluster_controller.go`
  - Added RBAC for `monitoring.coreos.com/servicemonitors`.
- `cmd/main.go`
  - Registered Prometheus Operator `monitoring/v1` types in the manager scheme.
- `internal/controller/suite_test.go`
  - Registered Prometheus Operator `monitoring/v1` types in the test scheme.
- `pkg/postgresql/cluster/core/monitoring_unit_test.go`
  - Added unit tests for observability flag resolution and monitoring resource builders.
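One way to read "class-level configuration with cluster-level disable-only overrides" is sketched below. The function name and parameters are hypothetical and do not mirror the PR's actual types. The assumption is that the class decides whether metrics are on, and a cluster can only turn them off, never on.

```go
package main

import "fmt"

// resolveMetricsEnabled is a hypothetical resolution helper.
// classEnabled is the class-level setting; clusterDisabled is an optional
// cluster-level override that can only switch the feature off.
func resolveMetricsEnabled(classEnabled bool, clusterDisabled *bool) bool {
	if !classEnabled {
		// A cluster cannot enable what its class has not enabled.
		return false
	}
	// A nil override leaves the class-level decision untouched.
	return clusterDisabled == nil || !*clusterDisabled
}

func main() {
	disabled := true
	fmt.Println(resolveMetricsEnabled(true, nil))        // prints: true
	fmt.Println(resolveMetricsEnabled(true, &disabled))  // prints: false
	fmt.Println(resolveMetricsEnabled(false, nil))       // prints: false
}
```

A table-driven unit test over these four input combinations would be the natural way to cover flag resolution of this shape.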
Testing and Verification
Added unit tests in `pkg/postgresql/cluster/core/monitoring_unit_test.go` for:
- `Service` builders
- `ServiceMonitor` builders
Related Issues
CPI-1853 - related JIRA ticket.
PR Checklist