
remove observedGeneration from postgresDatabase #1812

Draft
DmytroPI-dev wants to merge 5 commits into feature/database-controllers from check-observed-generation-usage-database-CR



DmytroPI-dev commented Apr 2, 2026

Description

This PR aligns PostgresDatabase with the watch-driven reconciliation approach used for PostgresCluster. It removes the reconcile skip based on ObservedGeneration, reacts to drift in owned Secrets and ConfigMaps even when the spec is unchanged, repairs ConfigMap drift declaratively, and degrades the status instead of recreating previously provisioned managed secrets that were deleted.

Key Changes

  • postgresdatabase_controller.go: add explicit drift-trigger predicates for owned Secret, ConfigMap, and relevant CNPG Database changes. Conflicts are requeued without surfacing noisy reconcile errors.
  • database.go: remove the ObservedGeneration short-circuit and revalidate the stages on every reconcile. Add drift handling for managed user secrets, ConfigMap repair/re-adoption, retained-resource adoption, conflict-friendly database reconcile handling, and quieter ready/not-found paths.
  • events.go: add events for drift and retained-resource adoption.
  • cluster.go: reduce log noise on transient status-update conflicts.
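A minimal sketch of what "repairs ConfigMap drift declaratively" can mean: compute the desired data, compare it with what is stored, and overwrite on any difference. The configMap type, desiredConfigMapData, and the key names are hypothetical stand-ins, not the operator's real types or schema; real code would operate on corev1.ConfigMap through the controller-runtime client.

```go
package main

import "fmt"

// configMap is a hypothetical stand-in for corev1.ConfigMap, trimmed to the
// fields this sketch needs.
type configMap struct {
	Name string
	Data map[string]string
}

// desiredConfigMapData is a hypothetical builder for the connection details
// the controller publishes; the keys are illustrative only.
func desiredConfigMapData(host, dbName string) map[string]string {
	return map[string]string{
		"host":     host,
		"database": dbName,
	}
}

// dataEqual reports whether two string maps hold exactly the same entries.
func dataEqual(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// repairConfigMap replaces drifted data with the desired state and reports
// whether an update would have to be sent to the API server. Keys added by
// hand are dropped too, because the desired map replaces Data wholesale.
func repairConfigMap(existing *configMap, desired map[string]string) bool {
	if dataEqual(existing.Data, desired) {
		return false // no drift, nothing to write
	}
	existing.Data = desired
	return true
}

func main() {
	cm := &configMap{
		Name: "appdb-config",
		Data: map[string]string{"host": "edited-by-hand", "database": "appdb"},
	}
	changed := repairConfigMap(cm, desiredConfigMapData("pg-rw.default.svc", "appdb"))
	fmt.Println(changed, cm.Data["host"]) // true pg-rw.default.svc
}
```

Comparing before writing keeps the steady-state reconcile free of no-op updates, which matters once the controller reconciles on every watched change.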

Testing and Verification

  • Update controller tests in postgresdatabase_controller_test.go for:
    • configmap drift repair
    • deleted configmap recreation
    • deleted managed secret degradation without recreation
    • secret ownership re-adoption
    • adding a new database to an already provisioned resource
    • secondary-resource watch predicates
    • cluster-missing and role-conflict regressions
  • Add unit tests in database_unit_test.go for secret/configmap drift and adoption paths.
  • Verified behavior from controller logs/events during delete/recreate flows with both Delete and Retain policies.
  • Automated tests were updated. All tests passed.

Related Issues

CPI-1950

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

predicate.GenerationChangedPredicate{},
predicate.Funcs{
UpdateFunc: func(e event.UpdateEvent) bool {
oldObj, okOld := e.ObjectOld.(*cnpgv1.Database)
Collaborator

What's the difference between this predicate and GenerationChangedPredicate?

Author

DmytroPI-dev Apr 7, 2026

If my understanding is correct, GenerationChangedPredicate only triggers when metadata.generation changes, which for the CNPG Database happens on spec changes. A status-only update does not bump the generation, and neither does an owner-reference change on the owned CNPG Database, so with GenerationChangedPredicate alone neither would trigger reconciliation.
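To illustrate the difference: GenerationChangedPredicate in controller-runtime compares only metadata.generation, and the API server does not bump the generation for status-subresource writes or for metadata edits such as owner references. The sketch below models this with a hypothetical objectMeta type (the status fields are assumptions, not the actual CNPG Database schema) and a wider filter that also fires on status and owner-reference changes.

```go
package main

import "fmt"

// objectMeta is a hypothetical, trimmed-down view of an owned CNPG Database;
// the status fields are assumptions, not the real CNPG schema.
type objectMeta struct {
	Generation    int64
	OwnerUIDs     []string // UIDs taken from metadata.ownerReferences
	StatusApplied bool
	StatusMessage string
}

// generationChanged mirrors what GenerationChangedPredicate checks on update:
// only metadata.generation, which changes when the spec changes.
func generationChanged(oldObj, newObj objectMeta) bool {
	return oldObj.Generation != newObj.Generation
}

// sameOwners compares owner UIDs in order.
func sameOwners(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// driftRelevant widens the filter to status-only and owner-reference changes,
// neither of which bumps the generation.
func driftRelevant(oldObj, newObj objectMeta) bool {
	return generationChanged(oldObj, newObj) ||
		!sameOwners(oldObj.OwnerUIDs, newObj.OwnerUIDs) ||
		oldObj.StatusApplied != newObj.StatusApplied ||
		oldObj.StatusMessage != newObj.StatusMessage
}

func main() {
	before := objectMeta{Generation: 3, OwnerUIDs: []string{"uid-1"}}
	statusOnly := before
	statusOnly.StatusApplied = true
	fmt.Println(generationChanged(before, statusOnly), driftRelevant(before, statusOnly)) // false true
}
```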

}

func expectReadyStatus(current *enterprisev4.PostgresDatabase, generation int64, expectedDatabase enterprisev4.DatabaseInfo) {
func expectReadyStatus(current *enterprisev4.PostgresDatabase, expectedDatabase enterprisev4.DatabaseInfo) {
Collaborator

I'd still keep generation in status. Reason: we will know which version of the resource the status was emitted for.

Author

Restored it.

c := rc.Client
logger := log.FromContext(ctx).WithValues("postgresDatabase", postgresDB.Name)
ctx = log.IntoContext(ctx, logger)
defer func() {
Collaborator

Why do we want a defer here? We always handle errors explicitly, close to the called function, so what benefit does this bring?

Author

Removed the defer and added an explicit requeue on conflict.

err = nil
}()
logger.Info("Reconciling PostgresDatabase")
wasReady := postgresDB.Status.Phase != nil && *postgresDB.Status.Phase == string(readyDBPhase)
Collaborator

Could you explain what it does and how we use it?

Author

The idea was to avoid emitting EventPostgresDatabaseReady on every successful reconcile.
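A minimal sketch of how a wasReady flag can limit the event to transitions, assuming the event should fire only when the phase moves into Ready. The recorder type, the "Ready" phase value, and the "PostgresDatabaseReady" reason string are hypothetical; real code would use a record.EventRecorder from client-go and the constants defined in events.go.

```go
package main

import "fmt"

const (
	readyPhase = "Ready"                 // hypothetical phase value
	eventReady = "PostgresDatabaseReady" // hypothetical event reason
)

// recorder is a hypothetical stand-in for an event recorder that just
// collects reasons in memory.
type recorder struct{ events []string }

func (r *recorder) Event(reason string) { r.events = append(r.events, reason) }

// emitReadyOnTransition records the Ready event only when the resource was
// not already Ready before this reconcile.
func emitReadyOnTransition(r *recorder, previousPhase *string, nowReady bool) {
	wasReady := previousPhase != nil && *previousPhase == readyPhase
	if nowReady && !wasReady {
		r.Event(eventReady)
	}
}

func main() {
	r := &recorder{}
	emitReadyOnTransition(r, nil, true) // first successful reconcile: emits
	ready := readyPhase
	emitReadyOnTransition(r, &ready, true) // steady state: no new event
	fmt.Println(len(r.events)) // 1
}
```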

EventClusterNotReady = "ClusterNotReady"
EventRoleConflict = "RoleConflict"
EventUserSecretsFailed = "UserSecretsFailed"
EventUserSecretsDriftDetected = "UserSecretsDriftDetected"
Collaborator

TIL: we should start using roles instead of users

Author

Renamed.

}

func reconcileUserSecrets(ctx context.Context, c client.Client, scheme *runtime.Scheme, postgresDB *enterprisev4.PostgresDatabase) error {
func reconcileUserSecrets(ctx context.Context, c client.Client, scheme *runtime.Scheme, postgresDB *enterprisev4.PostgresDatabase, existingDatabases map[string]struct{}) error {
Collaborator

This function should not be concerned with checking which databases are already provisioned; it should stay clean.

Author

Removed the provisioning check.

}

func ensureSecret(ctx context.Context, c client.Client, scheme *runtime.Scheme, postgresDB *enterprisev4.PostgresDatabase, roleName, secretName string) error {
func ensureSecret(ctx context.Context, c client.Client, scheme *runtime.Scheme, postgresDB *enterprisev4.PostgresDatabase, roleName, secretName string, databaseAlreadyProvisioned bool) error {
Collaborator

This function should not be responsible for checking whether the database is already provisioned.

Author

Removed the provisioning check.
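One way to satisfy both review comments is to keep the provisioning policy in the caller and let the secret helpers deal with a single object. The sketch below is hypothetical (secretAction and decideSecretAction are not names from the PR); it encodes the behavior stated in the description: create a secret for a new database, and degrade the status rather than recreate a managed secret that was deleted after provisioning.

```go
package main

import "fmt"

// secretAction is a hypothetical result type that separates the decision
// from the API calls that carry it out.
type secretAction int

const (
	actionNone    secretAction = iota // secret exists, nothing to do
	actionCreate                      // first provisioning: create the secret
	actionDegrade                     // provisioned earlier but deleted: degrade status, do not recreate
)

// decideSecretAction holds the policy, so an ensureSecret-style helper does
// not need to know whether the database was already provisioned.
func decideSecretAction(secretExists, databaseProvisioned bool) secretAction {
	switch {
	case secretExists:
		return actionNone
	case databaseProvisioned:
		return actionDegrade
	default:
		return actionCreate
	}
}

func main() {
	fmt.Println(decideSecretAction(false, true) == actionDegrade) // true
}
```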

}
}

func validateManagedSecret(secret *corev1.Secret, roleName string) error {
Collaborator

I wonder why we need this in the first place. What would be the use case for, say, a username not matching roleName?

Author

To prevent adopting a secret that was changed manually, for example. Does this make sense?
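A minimal sketch of the kind of check the author describes, refusing adoption when a secret no longer looks like one the controller wrote. The secret type, the managed-by label and its value, and the username/password keys are assumptions for illustration; the PR's actual validateManagedSecret may check different fields.

```go
package main

import (
	"errors"
	"fmt"
)

// secret is a hypothetical stand-in for corev1.Secret with only the fields used here.
type secret struct {
	Labels map[string]string
	Data   map[string][]byte
}

// The label key and value are assumptions for illustration only.
const (
	managedByLabel = "app.kubernetes.io/managed-by"
	managedByValue = "postgres-operator"
)

var errNotAdoptable = errors.New("secret not adoptable")

// validateManagedSecret refuses to adopt a secret whose contents no longer
// match what the controller would have written, e.g. after a manual edit.
func validateManagedSecret(s *secret, roleName string) error {
	if s.Labels[managedByLabel] != managedByValue {
		return fmt.Errorf("%w: missing %s label", errNotAdoptable, managedByLabel)
	}
	if got := string(s.Data["username"]); got != roleName {
		return fmt.Errorf("%w: username %q does not match role %q", errNotAdoptable, got, roleName)
	}
	if len(s.Data["password"]) == 0 {
		return fmt.Errorf("%w: empty password", errNotAdoptable)
	}
	return nil
}

func main() {
	edited := &secret{
		Labels: map[string]string{managedByLabel: managedByValue},
		Data:   map[string][]byte{"username": []byte("someone_else"), "password": []byte("x")},
	}
	fmt.Println(errors.Is(validateManagedSecret(edited, "appdb_rw"), errNotAdoptable)) // true
}
```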

DmytroPI-dev force-pushed the check-observed-generation-usage-database-CR branch from 0b9acab to c333f1e on April 8, 2026 11:05
Expect(current.Status.Databases[0].AdminUserSecretRef).NotTo(BeNil())
Expect(current.Status.Databases[0].RWUserSecretRef).NotTo(BeNil())
Expect(current.Status.Databases[0].ConfigMapRef).NotTo(BeNil())
Expect(current.Status.ObservedGeneration).NotTo(BeNil())

Isn't that supposed to be removed from the general logic?

Author

Yes, we removed its use as a database readiness check, but kept it in status so we know which version of the resource the status was emitted for. See here.
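A sketch of the distinction being drawn, using hypothetical database/status types and a slice of stage functions in place of the real reconcile stages: every stage runs on every reconcile, and the generation is only stamped into status afterwards as a record of which spec version the status describes.

```go
package main

import "fmt"

// status and database are hypothetical stand-ins for the PostgresDatabase types.
type status struct{ ObservedGeneration int64 }

type database struct {
	Generation int64
	Status     status
}

// reconcile runs every stage even when the generation has not changed; the
// removed short-circuit is kept as a comment for contrast.
func reconcile(db *database, stages []func(*database) error) error {
	// Previous behavior (removed):
	// if db.Status.ObservedGeneration == db.Generation { return nil }
	for _, stage := range stages {
		if err := stage(db); err != nil {
			return err
		}
	}
	// Informational only: records which generation this status was computed for.
	db.Status.ObservedGeneration = db.Generation
	return nil
}

func main() {
	runs := 0
	countingStage := func(*database) error { runs++; return nil }
	db := &database{Generation: 2, Status: status{ObservedGeneration: 2}}
	_ = reconcile(db, []func(*database) error{countingStage})
	fmt.Println(runs, db.Status.ObservedGeneration) // 1 2
}
```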

case ClusterReady:
rc.emitOnConditionTransition(postgresDB, postgresDB.Status.Conditions, clusterReady, EventClusterValidated, "Referenced PostgresCluster is ready")
if err := updateStatus(clusterReady, metav1.ConditionTrue, reasonClusterAvailable, "Cluster is operational", provisioningDBPhase); err != nil {
if result, conflictErr, ok := requeueOnConflict(err, "persisting cluster ready status"); ok {

Do I understand correctly that this logic is a replacement for observedGeneration?

Author

If you are asking about this:

if result, conflictErr, ok := requeueOnConflict(err, "persisting cluster ready status"); ok {
	return result, conflictErr
}
  • this catches resource version conflicts and requeues immediately, to calm the reconciliation storm.

if result, conflictErr, ok := requeueOnConflict(err, "reconciling user secrets"); ok {
return result, conflictErr
}
var driftErr *secretReconcileError

Are we sure that it will always be drift related?
What I'm after is the variable name matching secretReconcileError, which is quite generic, even if the underlying struct is verbose and the drift shows up only in its values.

Author

DmytroPI-dev Apr 9, 2026

Right now we have reason: reasonSecretsDriftDetected (see here and here), but your point makes sense in general; I will rename it to secretErr.

updateStatus := func(conditionType conditionTypes, conditionStatus metav1.ConditionStatus, reason conditionReasons, message string, phase reconcileDBPhases) error {
return persistStatus(ctx, c, postgresDB, conditionType, conditionStatus, reason, message, phase)
}
requeueOnConflict := func(err error, action string) (ctrl.Result, error, bool) {
M4KIF Apr 9, 2026

More of a nit: won't creating a lambda here cause us to lose scope? I.e., if k8s conflict errors occur in several places, will the action log be sufficient to fully satisfy our business needs? Also, metrics on reconciliation misses (conflicts) could be added in the future, which could prove useful for system admins.

Author

Added typed conflict categories instead of the lambda.


Labels

None yet

Projects

None yet


3 participants